Skip to content

Have update do a scan phase then update phase to spread writes #87

@hunterhacker

Description

@hunterhacker

Working with a bulk update, we noticed on a table with 7 million items there were 5 million with the same PK value. The segment having that PK value became the long pole in the execution time tent.

When we artificially made it so no updates were detected as being needed, this segment would process at about 300 RCUs, which was 2.4 MB/sec or about 5,000 items per second. It would take about 19 minutes to read through it.

When updates were needed on every item though, it only performed at about 200 UpdateItem calls per second, probably due to 5ms round trip times on UpdateItem calls. That rate required about 7 hours to process the segment.

There is no BatchUpdateItem API call. Without that, to scale up we need to have more client threads making the UpdateItem API call. We could change so instead of having the same worker that's doing the scan do the updates right away, we could have each worker doing scans return the "commands" to be performed, scatter those commands via 'repartition' to the various workers, then run those commands in mass parallel across the workers.

If the items with the same PK were all in the same partition, this would speed things from 200/sec to 1,000/sec (assuming small items, limited only by per-partition write rates). If the PK value had been previously allocated across 10 partitions (as was the case in the above test), this would go from 200/sec (limited by how fast one client thread can loop) to 10,000/sec (limited by how fast the 10 partitions can perform writes).

You'd test it by generating a table with a majority of items having a single PK value and making sure we still get good parallelization for the update calls.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bulk_executorAll bulk executor tasksenhancementNew feature or requesthelp wantedExtra attention is needed

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions