Today, scan-oriented commands (`scancount`, `diff`) are each given a single segment and do all their reading from that segment. That directs read traffic at the same linear keyspace area, and thus the same small set of partitions. To avoid creating overly hot partitions, we often rate limit a task to 1,500 read units per second.
An improvement would be to use shuffle scans, where each task is handed N segments and reads from them round-robin. That would spread the task's read traffic more fairly across more partitions.
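A minimal sketch of that round-robin read loop, assuming a hypothetical `read_page(segment, cursor)` call standing in for one paginated read against the table (it returns a page of items plus a continuation cursor, `None` once the segment is exhausted):

```python
def shuffle_scan(segments, read_page):
    """Read round-robin across a task's assigned segments.

    `segments` is the set of segment ids handed to this task.
    `read_page` is a hypothetical stand-in for one paginated read:
    (segment, cursor) -> (items, next_cursor), where a next_cursor
    of None means the segment is exhausted.
    """
    cursors = {seg: "start" for seg in segments}
    while cursors:
        # One page from each still-active segment per round, so the
        # task's read traffic rotates across partitions instead of
        # dwelling on one linear keyspace area.
        for seg in list(cursors):
            items, nxt = read_page(seg, cursors[seg])
            yield from items
            if nxt is None:
                del cursors[seg]  # segment fully scanned
            else:
                cursors[seg] = nxt
```

Rate limiting would wrap the `read_page` call; the point here is only that consecutive pages come from different segments.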
This would also help write commands (`update`), which today are driven by scans and would benefit from updating more partitions from the same task. We could then raise the 500-write-units-per-second limit as well.
Bottom line: pass each task not a single segment but a set of segments, read round-robin within the task.
Today:
1,000 segments, 1,000 tasks each with 1 segment, 50 executors, so 50 segments being hit
Better:
1,000 segments, 200 tasks each with 5 randomly selected segments, 50 executors, so 250 segments being hit
Now, this makes each task take longer, because each task is responsible for 5x as much table space. That can be solved by using 5,000 segments instead: back to 1,000 tasks, with 5 random segments per task. Each task still covers 0.1% of the table, but split into 5 disjoint chunks instead of 1 contiguous chunk.
In reality I might suggest sets of 10 segments per task, not actually 5.
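The assignment step could look something like this; `assign_segments` is a hypothetical helper, and the numbers match the 5,000-segment example above (shown with 5 segments per task, though 10 would work the same way). Shuffling before chunking keeps the sets disjoint while spreading each task's segments across the keyspace:

```python
import random

def assign_segments(total_segments, per_task, seed=0):
    """Partition segment ids into disjoint, randomly drawn sets.

    Every segment appears in exactly one task's set, so the scan
    still covers the whole table exactly once; the shuffle just
    makes each set non-contiguous in the keyspace.
    """
    ids = list(range(total_segments))
    random.Random(seed).shuffle(ids)
    return [ids[i:i + per_task] for i in range(0, len(ids), per_task)]

# 5,000 segments -> 1,000 tasks of 5 random segments each.
tasks = assign_segments(5000, 5)
```

With 50 executors, any 50 concurrently running tasks now touch up to 250 distinct segments instead of 50.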