Conversation

@alexggh (Contributor) commented May 22, 2025

Discovered while profiling #6131 (comment) with the benchmark #8069: when running in the validation context, a big chunk of the time is spent inserting into and retrieving data from the BTreeMap/BTreeSet.

By switching to hashbrown HashMap/HashSet in the validation TrieCache and TrieRecorder, and in memory-db (paritytech/trie#221), read costs improve by around ~40% and write costs by about ~20%.
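In rough outline, the swap looks like the sketch below. `ValidationTrieCache`, `NodeHash`, and the methods are simplified stand-ins rather than the actual TrieCache/TrieRecorder definitions; hashbrown works without std, which is what makes it usable inside the validation context:

```rust
// Simplified sketch of the data-structure swap, assuming 32-byte trie
// node hashes as keys; not the real TrieCache/TrieRecorder definitions.
use hashbrown::{HashMap, HashSet};

type NodeHash = [u8; 32];

struct ValidationTrieCache<Node> {
    // O(1) average insert/lookup instead of O(log n) BTreeMap traversal.
    node_cache: HashMap<NodeHash, Node>,
    // Previously a BTreeSet; key ordering is not needed here.
    recorded_keys: HashSet<Vec<u8>>,
}

impl<Node> ValidationTrieCache<Node> {
    fn new() -> Self {
        Self { node_cache: HashMap::new(), recorded_keys: HashSet::new() }
    }

    fn get_or_insert_with(&mut self, hash: NodeHash, f: impl FnOnce() -> Node) -> &Node {
        self.node_cache.entry(hash).or_insert_with(f)
    }

    fn record_access(&mut self, key: &[u8]) {
        self.recorded_keys.insert(key.to_vec());
    }
}
```

The main thing given up is iteration order; these structures are keyed by hashes and never iterated in key order, so only the adversarial-input question (discussed below) remains.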

@bkchr bkchr added the T9-cumulus This PR/Issue is related to cumulus. label May 22, 2025
@skunert (Contributor) left a comment

Nice!

@athei (Member) left a comment

Should be safe since the keys are already derived via a cryptographic hash function from user data.

@alexggh (Contributor, Author) commented May 23, 2025

/cmd prdoc --audience node_dev --bump patch

@alexggh alexggh enabled auto-merge May 23, 2025 13:49
@alexggh alexggh added this pull request to the merge queue May 23, 2025
Merged via the queue into master with commit 85f5dc7 May 23, 2025
182 checks passed
@alexggh alexggh deleted the alexggh/switch_to_using_hashbrown branch May 23, 2025 15:29
@burdges commented May 23, 2025

If code runs natively, then HashMap uses secret system randomness, which makes it secure. We typically use BTreeMap instead of HashMap for user-controlled keys inside the runtime because there is no system randomness there, so adversaries could easily find hash collisions that cause slow probing.

A cryptographic hash function only provides collision resistance at cryptographic sizes, but if you only have say 2^32 slots in the HashMap, then one can easily find thousands of blake2_128 collisions at that size, which then causes slowdowns. If the adversary needs real accounts, then the elliptic-curve scalar multiplications slow them down slightly, but do not prevent the attack, and if dust accounts suffice then they can avoid even this by using a fast hash-to-curve.
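To make the scale concrete, a toy bucket-collision search could look like the sketch below (using the blake2 crate; `find_bucket_collisions` is illustrative and not tied to how the trie actually derives keys). The expected work to collect n keys in one bucket of a 2^k-slot table is roughly n·2^k hash evaluations, which is feasible at the sizes discussed here:

```rust
use blake2::digest::consts::U16;
use blake2::{Blake2b, Digest};
use std::collections::HashMap;

// 128-bit Blake2b, standing in for blake2_128.
type Blake2b128 = Blake2b<U16>;

/// Brute-force `n` inputs whose blake2_128 digests share the same low `k`
/// bits, i.e. keys that would all probe the same bucket of a 2^k-slot table.
fn find_bucket_collisions(k: u32, n: usize) -> Vec<u64> {
    let mask = (1u64 << k) - 1;
    let mut buckets: HashMap<u64, Vec<u64>> = HashMap::new();
    for i in 0u64.. {
        let digest = Blake2b128::digest(i.to_le_bytes());
        // Bucket index = low k bits of the digest.
        let idx = u64::from_le_bytes(digest[..8].try_into().unwrap()) & mask;
        let entry = buckets.entry(idx).or_default();
        entry.push(i);
        if entry.len() >= n {
            return entry.clone();
        }
    }
    unreachable!()
}
```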

cc @jakoblell

Alternative 1.

We could analyze this attack if the block hash were treated as "system" randomness for the HashMap. We do not have Fiat-Shamir of course, but then the blake2_128 gives us something: if we didn't have the blake2_128, then the adversary could compute the blockchain inside some contract and create dust accounts that collided. I think our blake2_128 means the adversary cannot run that improved attack without some auxiliary input external to the block, which is not allowed in our model.

If this works, then the adversary can only brute-force the whole block hash to pick a somewhat concentrated distribution, which likely reduces the size of attackable HashMaps, maybe or maybe not enough.

Alternative 2.

As a "blockchain" solution, we could develop a cuckoo table, say called PerfectMap, which mostly works like HashMap, except that it always runs "perfectly" on the validator or else it panics.

At the collator, PerfectMap records its creation order and instantiates its hasher using local system randomness r, but all key inserts and deletions get added to a replay log, which only runs on collators. Any time an insertion fails, PerfectMap simply resamples fresh system randomness r, enlarges itself if fuller than say 80%, and then replays the entire block-creation process so far from its replay log, possibly triggering even more restarts. PerfectMap::Drop logs the final size and final r, in PerfectMap creation order, into a perfectmap_log in the PoV data.

At the validator, PerfectMap pulls the next size and r from the perfectmap_log in the PoV data, allocates itself at that size, and instantiates its hasher using r. Any insertion failure simply panics, making the block invalid. Any runtime code other than PerfectMap that reads the perfectmap_log could easily cause panics, so contracts must not have access to perfectmap_log.

A cuckoo table seems easier to analyze since they already have capped probing. Also they save some memory. You could cap the probing in a HashMap too, but doing so costs some memory.
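The collator-side loop could look roughly like this sketch. Everything here is hypothetical: the `CuckooTable` trait, `rebuild`, and the `Op` log are assumed stand-ins for whatever a real design would use, not existing code:

```rust
use core::hash::Hash;

// Mutation log entry; only collators record these.
enum Op<K, V> {
    Insert(K, V),
    Remove(K),
}

/// Hypothetical interface of a capped-probing cuckoo table; the concrete
/// implementation is elided, this whole block is a design sketch.
trait CuckooTable<K, V> {
    fn try_insert(&mut self, key: K, value: V) -> Result<(), ()>;
    fn load_factor(&self) -> f64;
    fn grow(&mut self);
    /// Clear, re-seed the hasher with `seed`, and replay `log`.
    fn rebuild(&mut self, seed: u64, log: &[Op<K, V>]) -> Result<(), ()>;
}

struct PerfectMap<K, V, T: CuckooTable<K, V>> {
    table: T,
    replay_log: Vec<Op<K, V>>, // collator-only
    seed: u64,                 // current hasher randomness r
}

impl<K: Hash + Eq + Clone, V: Clone, T: CuckooTable<K, V>> PerfectMap<K, V, T> {
    /// Collator-side insert: on failure, resample r, grow past ~80% load,
    /// and replay the whole log. A validator would panic here instead.
    fn insert(&mut self, key: K, value: V, fresh_randomness: impl Fn() -> u64) {
        self.replay_log.push(Op::Insert(key.clone(), value.clone()));
        if self.table.try_insert(key, value).is_ok() {
            return;
        }
        loop {
            self.seed = fresh_randomness();
            if self.table.load_factor() > 0.8 {
                self.table.grow();
            }
            if self.table.rebuild(self.seed, &self.replay_log).is_ok() {
                return;
            }
        }
    }
}
```

On Drop, the final size and r would be appended, in creation order, to the perfectmap_log in the PoV; the validator allocates at that size, seeds with r, and treats any insertion failure as an invalid block.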

ordian added a commit that referenced this pull request May 27, 2025
* master: (99 commits)
  Snowbridge: Remove asset location check for compatibility (#8473)
  add poke_deposit extrinsic to pallet-bounties (#8382)
  litep2p/peerset: Reject non-reserved peers in the reserved-only mode (#8650)
  Charge deposit based on key length (#8648)
  [pallet-revive] make subscription task panic on error (#8587)
  tx/metrics: Add metrics for the RPC v2 `transactionWatch_v1_submitAndWatch` (#8345)
  Bridges: Fix - Improve try-state for pallet-xcm-bridge-hub (#8615)
  Introduce CreateBare, deprecated CreateInherent (#7597)
  Use hashbrown hashmap/hashset in validation context (#8606)
  ci: rm gitlab config (#8622)
  🔪 flaky and Zombienet tests (#8600)
  cumulus: adjust unincluded segment size metric buckets (#8617)
  Benchmark storage access on block validation (#8069)
  Revert 7934 es/remove tj changes (#8611)
  collator-protocol: add more collation observability (#8230)
  `fatxpool`: add fallback for ready at light (#8533)
  txpool: fix tx removal from unlocks set (#8500)
  XCMP weight metering: account for the MQ page position (#8344)
  fix epmb solution duplicate issue + add remote mining apparatus to epm (#8585)
  Fix generated address returned by Substrate RPC runtime call (#8504)
  ...
github-merge-queue bot pushed a commit that referenced this pull request May 27, 2025
Bump memory-db to pick up
#8606 and
paritytech/trie#221

Additionally, polkavm needs to be bumped to get rid of
https://github.com/paritytech/polkadot-sdk/actions/runs/15180236627/job/42688141374#step:5:1869

---------

Signed-off-by: Alexandru Gheorghe <[email protected]>
pgherveou pushed a commit that referenced this pull request Jun 11, 2025
pgherveou pushed a commit that referenced this pull request Jun 11, 2025
github-merge-queue bot pushed a commit that referenced this pull request Jul 11, 2025
…lidation context (#9127)

#8606 and
paritytech/trie#221 replaced the usage of
BTreeMap with HashMaps in the validation context. The keys are already
derived with a cryptographic hash function from user data, so users
should not be able to manipulate them.

To be on the safe side, this PR also modifies the TrieCache, TrieRecorder
and MemoryDB to use a hasher that, on top of the default generated
randomness, also adds randomness derived from the hashes of the
relay-chain and parachain blocks, which is not something users can
control or guess ahead of time.

---------

Signed-off-by: Alexandru Gheorghe <[email protected]>
Co-authored-by: Bastian Köcher <[email protected]>
Co-authored-by: cmd[bot] <41898282+github-actions[bot]@users.noreply.github.com>
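The seeding idea reads roughly like the sketch below. This is a hedged std illustration, not the PR's actual code: `BlockSeededState`, `block_entropy`, and `seeded_map` are made-up names, and the real change uses hashbrown in the no_std validation context.

```rust
use std::collections::hash_map::{DefaultHasher, RandomState};
use std::collections::HashMap;
use std::hash::{BuildHasher, Hasher};

/// Hypothetical BuildHasher: std's per-process randomness (RandomState)
/// plus entropy derived from block hashes users cannot predict.
#[derive(Clone)]
struct BlockSeededState {
    inner: RandomState,
    block_entropy: [u8; 64], // relay-chain hash || parachain block hash
}

impl BuildHasher for BlockSeededState {
    type Hasher = DefaultHasher;

    fn build_hasher(&self) -> DefaultHasher {
        let mut hasher = self.inner.build_hasher();
        // Fold the block-derived entropy into every freshly built hasher,
        // so bucket positions depend on data an attacker cannot choose.
        hasher.write(&self.block_entropy);
        hasher
    }
}

fn seeded_map(block_entropy: [u8; 64]) -> HashMap<Vec<u8>, Vec<u8>, BlockSeededState> {
    HashMap::with_hasher(BlockSeededState { inner: RandomState::new(), block_entropy })
}
```

Because the block entropy changes every block and is unknown until the block is authored, a precomputed bucket-collision set does not transfer from one block to the next.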
paritytech-release-backport-bot bot pushed a commit that referenced this pull request Jul 11, 2025
paritytech-release-backport-bot bot pushed a commit that referenced this pull request Jul 17, 2025
alvicsam pushed a commit that referenced this pull request Oct 17, 2025
alvicsam pushed a commit that referenced this pull request Oct 17, 2025
alvicsam pushed a commit that referenced this pull request Oct 17, 2025

Labels

T9-cumulus This PR/Issue is related to cumulus.

Development

Successfully merging this pull request may close these issues.

Blocks run with high TrieCache hits seems to be 20% faster to produce than validate

8 participants