Conversation

@alexggh (Contributor) commented May 22, 2025

Discovered while profiling #6131 (comment) with the benchmark #8069: when running in the validation context, a big chunk of the time is spent inserting into and retrieving data from the BTreeMap/BTreeSet.

By switching to hashbrown HashMap/HashSet in the validation TrieCache and TrieRecorder, and in memory-db (paritytech/trie#221), read costs improve by around ~40% and write costs by about ~20%.
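In rough outline, the swap looks like the sketch below. `ValidationTrieCache`, `NodeHash`, and the methods are simplified stand-ins rather than the actual TrieCache/TrieRecorder definitions; hashbrown works without std, which is what makes it usable inside the validation context:

```rust
// Simplified sketch of the data-structure swap, assuming 32-byte trie
// node hashes as keys; not the real TrieCache/TrieRecorder definitions.
use hashbrown::{HashMap, HashSet};

type NodeHash = [u8; 32];

struct ValidationTrieCache<Node> {
    // O(1) average insert/lookup instead of O(log n) BTreeMap traversal.
    node_cache: HashMap<NodeHash, Node>,
    // Previously a BTreeSet; key ordering is not needed here.
    recorded_keys: HashSet<Vec<u8>>,
}

impl<Node> ValidationTrieCache<Node> {
    fn new() -> Self {
        Self { node_cache: HashMap::new(), recorded_keys: HashSet::new() }
    }

    fn get_or_insert_with(&mut self, hash: NodeHash, f: impl FnOnce() -> Node) -> &Node {
        self.node_cache.entry(hash).or_insert_with(f)
    }

    fn record_access(&mut self, key: &[u8]) {
        self.recorded_keys.insert(key.to_vec());
    }
}
```

The main thing given up is iteration order; these structures are keyed by hashes and never iterated in key order, so only the adversarial-input question (discussed below) remains.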

@bkchr bkchr added the T9-cumulus This PR/Issue is related to cumulus. label May 22, 2025
@skunert (Contributor) left a comment

Nice!

@athei (Member) left a comment

Should be safe since the keys are already derived via a cryptographic hash function from user data.

@alexggh (Contributor, Author) commented May 23, 2025

/cmd prdoc --audience node_dev --bump patch

@alexggh alexggh enabled auto-merge May 23, 2025 13:49
@alexggh alexggh added this pull request to the merge queue May 23, 2025
Merged via the queue into master with commit 85f5dc7 May 23, 2025
182 checks passed
@alexggh alexggh deleted the alexggh/switch_to_using_hashbrown branch May 23, 2025 15:29
@burdges commented May 23, 2025

If code runs natively, then HashMap uses secret system randomness, which makes it secure. We typically use BTreeMap instead of HashMap for user-controlled keys inside the runtime because there is no system randomness there, so adversaries could easily find hash collisions that cause slow probing.

A cryptographic hash function only provides collision resistance at cryptographic sizes, but if you only have say 2^32 slots in the HashMap, then one can easily find thousands of blake2_128 collisions at that size, which then causes slowdowns. If the adversary needs real accounts, then the elliptic-curve scalar multiplications slow them down slightly, but do not prevent the attack, and if dust accounts suffice then they can avoid even this by using a fast hash-to-curve.
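To make the scale concrete, a toy bucket-collision search could look like the sketch below (using the blake2 crate; `find_bucket_collisions` is illustrative and not tied to how the trie actually derives keys). The expected work to collect n keys in one bucket of a 2^k-slot table is roughly n·2^k hash evaluations, which is feasible at the sizes discussed here:

```rust
use blake2::digest::consts::U16;
use blake2::{Blake2b, Digest};
use std::collections::HashMap;

// 128-bit Blake2b, standing in for blake2_128.
type Blake2b128 = Blake2b<U16>;

/// Brute-force `n` inputs whose blake2_128 digests share the same low `k`
/// bits, i.e. keys that would all probe the same bucket of a 2^k-slot table.
fn find_bucket_collisions(k: u32, n: usize) -> Vec<u64> {
    let mask = (1u64 << k) - 1;
    let mut buckets: HashMap<u64, Vec<u64>> = HashMap::new();
    for i in 0u64.. {
        let digest = Blake2b128::digest(i.to_le_bytes());
        // Bucket index = low k bits of the digest.
        let idx = u64::from_le_bytes(digest[..8].try_into().unwrap()) & mask;
        let entry = buckets.entry(idx).or_default();
        entry.push(i);
        if entry.len() >= n {
            return entry.clone();
        }
    }
    unreachable!()
}
```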

cc @jakoblell

Alternative 1.

We could analyze this attack if the block hash were treated as "system" randomness for the HashMap. We do not have Fiat-Shamir of course, but then the blake2_128 gives us something: if we didn't have the blake2_128, then the adversary could compute the blockchain inside some contract and create dust accounts that collided. I think our blake2_128 means the adversary cannot run that improved attack without some auxiliary input external to the block, which is not allowed in our model.

If this works, then the adversary can only brute-force the whole block hash to pick a somewhat concentrated distribution, which likely reduces the size of attackable HashMaps, maybe or maybe not enough.

Alternative 2.

As a "blockchain" solution, we could develop a cuckoo table, say called PerfectMap, which mostly works like HashMap, except that it always runs "perfectly" on the validator or else it panics.

At the collator, PerfectMap records its creation order and instantiates its hasher using local system randomness r, but all key inserts and deletions get added to a replay log, which only runs on collators. Any time an insertion fails, PerfectMap simply resamples fresh system randomness r, enlarges itself if fuller than say 80%, and then replays the entire block-creation process so far from its replay log, possibly triggering even more restarts. PerfectMap::Drop logs the final size and final r, in PerfectMap creation order, into a perfectmap_log in the PoV data.

At the validator, PerfectMap pulls the next size and r from the perfectmap_log in the PoV data, allocates itself at that size, and instantiates its hasher using r. Any insertion failure simply panics, making the block invalid. Any runtime code other than PerfectMap that reads the perfectmap_log could easily cause panics, so contracts must not have access to perfectmap_log.

A cuckoo table seems easier to analyze since they already have capped probing. Also they save some memory. You could cap the probing in a HashMap too, but doing so costs some memory.
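The collator-side loop could look roughly like this sketch. Everything here is hypothetical: the `CuckooTable` trait, `rebuild`, and the `Op` log are assumed stand-ins for whatever a real design would use, not existing code:

```rust
use core::hash::Hash;

// Mutation log entry; only collators record these.
enum Op<K, V> {
    Insert(K, V),
    Remove(K),
}

/// Hypothetical interface of a capped-probing cuckoo table; the concrete
/// implementation is elided, this whole block is a design sketch.
trait CuckooTable<K, V> {
    fn try_insert(&mut self, key: K, value: V) -> Result<(), ()>;
    fn load_factor(&self) -> f64;
    fn grow(&mut self);
    /// Clear, re-seed the hasher with `seed`, and replay `log`.
    fn rebuild(&mut self, seed: u64, log: &[Op<K, V>]) -> Result<(), ()>;
}

struct PerfectMap<K, V, T: CuckooTable<K, V>> {
    table: T,
    replay_log: Vec<Op<K, V>>, // collator-only
    seed: u64,                 // current hasher randomness r
}

impl<K: Hash + Eq + Clone, V: Clone, T: CuckooTable<K, V>> PerfectMap<K, V, T> {
    /// Collator-side insert: on failure, resample r, grow past ~80% load,
    /// and replay the whole log. A validator would panic here instead.
    fn insert(&mut self, key: K, value: V, fresh_randomness: impl Fn() -> u64) {
        self.replay_log.push(Op::Insert(key.clone(), value.clone()));
        if self.table.try_insert(key, value).is_ok() {
            return;
        }
        loop {
            self.seed = fresh_randomness();
            if self.table.load_factor() > 0.8 {
                self.table.grow();
            }
            if self.table.rebuild(self.seed, &self.replay_log).is_ok() {
                return;
            }
        }
    }
}
```

On Drop, the final size and r would be appended, in creation order, to the perfectmap_log in the PoV; the validator allocates at that size, seeds with r, and treats any insertion failure as an invalid block.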

ordian added a commit that referenced this pull request May 27, 2025
* master: (99 commits)
  Snowbridge: Remove asset location check for compatibility (#8473)
  add poke_deposit extrinsic to pallet-bounties (#8382)
  litep2p/peerset: Reject non-reserved peers in the reserved-only mode (#8650)
  Charge deposit based on key length (#8648)
  [pallet-revive] make subscription task panic on error (#8587)
  tx/metrics: Add metrics for the RPC v2 `transactionWatch_v1_submitAndWatch` (#8345)
  Bridges: Fix - Improve try-state for pallet-xcm-bridge-hub (#8615)
  Introduce CreateBare, deprecated CreateInherent (#7597)
  Use hashbrown hashmap/hashset in validation context (#8606)
  ci: rm gitlab config (#8622)
  🔪 flaky and Zombienet tests (#8600)
  cumulus: adjust unincluded segment size metric buckets (#8617)
  Benchmark storage access on block validation (#8069)
  Revert 7934 es/remove tj changes (#8611)
  collator-protocol: add more collation observability (#8230)
  `fatxpool`: add fallback for ready at light (#8533)
  txpool: fix tx removal from unlocks set (#8500)
  XCMP weight metering: account for the MQ page position (#8344)
  fix epmb solution duplicate issue + add remote mining apparatus to epm (#8585)
  Fix generated address returned by Substrate RPC runtime call (#8504)
  ...
github-merge-queue bot pushed a commit that referenced this pull request May 27, 2025
Bump memory-db to pick up
#8606 and
paritytech/trie#221

Additionally, polkavm needs to be bumped to get rid of
https://github.com/paritytech/polkadot-sdk/actions/runs/15180236627/job/42688141374#step:5:1869

---------

Signed-off-by: Alexandru Gheorghe <[email protected]>
pgherveou pushed a commit that referenced this pull request Jun 11, 2025
pgherveou pushed a commit that referenced this pull request Jun 11, 2025
github-merge-queue bot pushed a commit that referenced this pull request Jul 11, 2025
…lidation context (#9127)

#8606 and
paritytech/trie#221 replaced the usage of
BTreeMap with HashMaps in the validation context. The keys are already
derived with a cryptographic hash function from user data, so users
should not be able to manipulate them.

To be on the safe side, this PR also modifies the TrieCache, TrieRecorder
and MemoryDB to use a hasher that, on top of the default generated
randomness, also adds randomness derived from the hashes of the
relay-chain and parachain blocks, which is not something users can
control or guess ahead of time.

---------

Signed-off-by: Alexandru Gheorghe <[email protected]>
Co-authored-by: Bastian Köcher <[email protected]>
Co-authored-by: cmd[bot] <41898282+github-actions[bot]@users.noreply.github.com>
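The seeding idea reads roughly like the sketch below. This is a hedged std illustration, not the PR's actual code: `BlockSeededState`, `block_entropy`, and `seeded_map` are made-up names, and the real change uses hashbrown in the no_std validation context.

```rust
use std::collections::hash_map::{DefaultHasher, RandomState};
use std::collections::HashMap;
use std::hash::{BuildHasher, Hasher};

/// Hypothetical BuildHasher: std's per-process randomness (RandomState)
/// plus entropy derived from block hashes users cannot predict.
#[derive(Clone)]
struct BlockSeededState {
    inner: RandomState,
    block_entropy: [u8; 64], // relay-chain hash || parachain block hash
}

impl BuildHasher for BlockSeededState {
    type Hasher = DefaultHasher;

    fn build_hasher(&self) -> DefaultHasher {
        let mut hasher = self.inner.build_hasher();
        // Fold the block-derived entropy into every freshly built hasher,
        // so bucket positions depend on data an attacker cannot choose.
        hasher.write(&self.block_entropy);
        hasher
    }
}

fn seeded_map(block_entropy: [u8; 64]) -> HashMap<Vec<u8>, Vec<u8>, BlockSeededState> {
    HashMap::with_hasher(BlockSeededState { inner: RandomState::new(), block_entropy })
}
```

Because the block entropy changes every block and is unknown until the block is authored, a precomputed bucket-collision set does not transfer from one block to the next.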
paritytech-release-backport-bot bot pushed a commit that referenced this pull request Jul 11, 2025
paritytech-release-backport-bot bot pushed a commit that referenced this pull request Jul 17, 2025
alvicsam pushed a commit that referenced this pull request Oct 17, 2025
alvicsam pushed a commit that referenced this pull request Oct 17, 2025
alvicsam pushed a commit that referenced this pull request Oct 17, 2025

Labels

T9-cumulus This PR/Issue is related to cumulus.

Development

Successfully merging this pull request may close these issues.

Blocks run with high TrieCache hits seems to be 20% faster to produce than validate

8 participants