Description
While working on a bulk update, we noticed that a table with 7 million items had 5 million items sharing the same PK value. The segment containing that PK value became the long pole in the execution-time tent.
When we artificially made it so no updates were detected as needed, this segment processed at about 300 RCUs, which is 2.4 MB/sec or about 5,000 items per second. Reading through it took about 19 minutes.
When updates were needed on every item, though, it only managed about 200 UpdateItem calls per second, probably because each UpdateItem call has a roughly 5 ms round trip. At that rate the segment took about 7 hours to process.
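For reference, a quick back-of-the-envelope check of those rates (a sketch only, assuming eventually consistent scans at roughly 8 KB/sec per RCU and ~5 million items in the hot segment):

```python
# Rough arithmetic behind the numbers above (assumptions: eventually
# consistent scan, ~5M items in the hot segment).
rcu = 300
scan_bytes_per_sec = rcu * 8_000             # ~2.4 MB/s
items = 5_000_000
scan_items_per_sec = 5_000                   # implies roughly 480-byte items
print(f"scan:   {scan_bytes_per_sec / 1e6:.1f} MB/s, "
      f"~{items / scan_items_per_sec / 60:.0f} min")   # close to the observed ~19 min

update_calls_per_sec = 200                   # ~5 ms round trip, one client thread
print(f"update: ~{items / update_calls_per_sec / 3600:.1f} hours")
```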
There is no BatchUpdateItem API call. Without one, scaling up means having more client threads making UpdateItem calls. Instead of having the same worker that does the scan also apply the updates right away, we could have each scanning worker return the "commands" to be performed, scatter those commands across the workers via 'repartition', and then run them in massive parallel across the workers.
If the items with the same PK were all in the same partition, this would speed things from 200/sec to 1,000/sec (assuming small items, limited only by per-partition write rates). If the PK value had been previously allocated across 10 partitions (as was the case in the above test), this would go from 200/sec (limited by how fast one client thread can loop) to 10,000/sec (limited by how fast the 10 partitions can perform writes).
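A minimal sketch of what that scatter step could look like in PySpark, assuming a boto3-based scan. The table name, the pk/sk key schema, the build_update_command logic, and the partition counts are all placeholders for illustration, not this tool's actual API:

```python
import boto3
from pyspark.sql import SparkSession

TABLE_NAME = "my-table"            # placeholder table name
NUM_UPDATE_PARTITIONS = 40         # enough tasks to spread the hot PK's updates around
TOTAL_SEGMENTS = 16                # scan parallelism

spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext

def build_update_command(item):
    """Hypothetical example: flag items missing a 'migrated' attribute.
    The real tool would decide here whether an item needs an update."""
    if "migrated" in item:
        return None
    return {
        "key": {"pk": item["pk"], "sk": item["sk"]},   # assumes a pk/sk key schema
        "expr": "SET migrated = :m",
        "values": {":m": True},
    }

def scan_segment(segment):
    """Scan one segment and yield update commands instead of applying them inline."""
    table = boto3.resource("dynamodb").Table(TABLE_NAME)
    kwargs = {"Segment": segment, "TotalSegments": TOTAL_SEGMENTS}
    while True:
        page = table.scan(**kwargs)
        for item in page["Items"]:
            cmd = build_update_command(item)
            if cmd is not None:
                yield cmd
        if "LastEvaluatedKey" not in page:
            break
        kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]

def apply_commands(commands):
    """Run this task's share of the UpdateItem calls."""
    table = boto3.resource("dynamodb").Table(TABLE_NAME)
    for cmd in commands:
        table.update_item(
            Key=cmd["key"],
            UpdateExpression=cmd["expr"],
            ExpressionAttributeValues=cmd["values"],
        )

commands = sc.parallelize(range(TOTAL_SEGMENTS), TOTAL_SEGMENTS).flatMap(scan_segment)

# Scatter the commands so updates for the hot PK land on many workers,
# then execute them in parallel.
commands.repartition(NUM_UPDATE_PARTITIONS).foreachPartition(apply_commands)
```

The repartition is what breaks the coupling between where an item was scanned and where its UpdateItem call runs, so a single hot PK no longer serializes behind one client thread's loop.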
You'd test it by generating a table where the majority of items share a single PK value and verifying that the update calls still parallelize well.
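A sketch of seeding such a test table, again with placeholder names and with the counts taken from the numbers above (in practice you'd shrink the counts or parallelize the seeding):

```python
import uuid
import boto3

TABLE_NAME = "bulk-update-test"   # placeholder table name
HOT_PK = "hot-pk"                 # the single PK value most items will share

table = boto3.resource("dynamodb").Table(TABLE_NAME)

def seed(total_items=7_000_000, hot_items=5_000_000):
    """Write a skewed data set: hot_items rows under one PK value,
    the rest spread over random PK values."""
    with table.batch_writer() as batch:
        for i in range(total_items):
            pk = HOT_PK if i < hot_items else str(uuid.uuid4())
            # ~400 bytes of filler, roughly matching the item size implied above
            batch.put_item(Item={"pk": pk, "sk": str(i), "payload": "x" * 400})

seed()
```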