Use hashbrown hashmap/hashset in validation context #8606
Conversation
Signed-off-by: Alexandru Gheorghe <[email protected]>
Nice!
Should be safe since the keys are already derived via a cryptographic hash function from user data.
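For context, a minimal sketch of what that means on this code path (hypothetical function and simplified types, not the actual sp-trie recorder): the validation-context maps are keyed by trie node hashes, i.e. outputs of the runtime hasher such as blake2_256, so an attacker controls preimages but not the key bytes themselves.

```rust
// Simplified sketch (hypothetical, not the actual sp-trie recorder): the map
// key is the cryptographic hash of the encoded trie node, so the key bytes
// are not directly attacker-chosen.
use hashbrown::HashMap;
use sp_core::{hashing::blake2_256, H256};

fn record_node(recorded: &mut HashMap<H256, Vec<u8>>, encoded_node: &[u8]) {
    let key = H256::from(blake2_256(encoded_node));
    recorded.insert(key, encoded_node.to_vec());
}
```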
/cmd prdoc --audience node_dev --bump patch
…e_dev --bump patch'
Signed-off-by: Alexandru Gheorghe <[email protected]>
If code runs in native, then HashMap uses secret system randomness, which makes it secure. We typically use BTreeMap instead of HashMap for user-controlled keys inside the runtime because there is no system randomness, so adversaries could easily find hash collisions that cause slow probing. A cryptographic hash function only provides collision resistance at cryptographic sizes, but if you only have say 2^32 elements in the HashMap, then one can easily find thousands of blake2_128 collisions of that size, which then causes slowdowns. If the adversary needs real accounts, then the elliptic curve scalar multiplications slow them down slightly, but do not prevent the attack, and if dust accounts suffice then they can avoid this by using a fast hash-to-curve. cc @jakoblell

Alternative 1. We could analyze this attack if the block hash were treated as "system" randomness for the HashMap. We do not have Fiat-Shamir of course, but then the blake2_128 gives us something: if we didn't have the blake2_128, then the adversary could compute the blockchain inside some contract and create dust accounts that collided. I think our blake2_128 means the adversary cannot run that improved attack without some auxiliary input external to the block, which is not allowed in our model. If this works, then the adversary can only brute-force the whole block hash to pick a somewhat concentrated distribution, which likely reduces the size of attackable HashMaps, maybe or maybe not enough.

Alternative 2. As a "blockchain" solution, we could develop a cuckoo table, say called PerfectMap, which mostly works like HashMap, except it always runs "perfectly" on the validator or else it panics. At the collator, PerfectMap records its creation order and instantiates its hasher using local system randomness. At the validator, PerfectMap pulls the next recorded size and hasher in that creation order. A cuckoo table seems easier to analyze since they already have capped probing. Also they save some memory. You could cap the probing in a HashMap too, but doing so costs some memory.
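To make the first point concrete, here is an illustrative sketch (not code from this PR; FxHasher is just a stand-in for any deterministic, publicly known hasher, and the bucket selection is simplified): with no secret randomness, an adversary can search offline for keys that pile into one bucket, whereas std's HashMap defeats that search by seeding SipHash from process-local randomness.

```rust
// Illustrative only: a fixed, publicly known hasher lets anyone precompute
// keys that share a bucket; RandomState's secret per-process keys do not.
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hash, Hasher};
use rustc_hash::FxHasher;

// Simplified bucket selection: real tables differ in detail, but the bucket
// is still a deterministic function of the (unseeded) hash.
fn bucket(key: u64, capacity: usize) -> usize {
    let mut h = FxHasher::default(); // deterministic, publicly known
    key.hash(&mut h);
    (h.finish() as usize) % capacity
}

fn main() {
    let capacity = 1 << 12;
    // Offline search for keys that all land in bucket 0.
    let colliding: Vec<u64> = (0..1_000_000u64)
        .filter(|k| bucket(*k, capacity) == 0)
        .take(100)
        .collect();
    println!("found {} keys sharing one bucket", colliding.len());

    // With per-process secret randomness the same offline search is useless,
    // because the attacker cannot evaluate the hash function in advance.
    let _seeded = RandomState::new().hash_one(42u64);
}
```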
    
* master: (99 commits)
  Snowbridge: Remove asset location check for compatibility (#8473)
  add poke_deposit extrinsic to pallet-bounties (#8382)
  litep2p/peerset: Reject non-reserved peers in the reserved-only mode (#8650)
  Charge deposit based on key length (#8648)
  [pallet-revive] make subscription task panic on error (#8587)
  tx/metrics: Add metrics for the RPC v2 `transactionWatch_v1_submitAndWatch` (#8345)
  Bridges: Fix - Improve try-state for pallet-xcm-bridge-hub (#8615)
  Introduce CreateBare, deprecated CreateInherent (#7597)
  Use hashbrown hashmap/hashset in validation context (#8606)
  ci: rm gitlab config (#8622)
  🔪 flaky and Zombienet tests (#8600)
  cumulus: adjust unincluded segment size metric buckets (#8617)
  Benchmark storage access on block validation (#8069)
  Revert 7934 es/remove tj changes (#8611)
  collator-protocol: add more collation observability (#8230)
  `fatxpool`: add fallback for ready at light (#8533)
  txpool: fix tx removal from unlocks set (#8500)
  XCMP weight metering: account for the MQ page position (#8344)
  fix epmb solution duplicate issue + add remote mining apparatus to epm (#8585)
  Fix generated address returned by Substrate RPC runtime call (#8504)
  ...
Bump memory-db to pick up #8606 and paritytech/trie#221. Additionally, polkavm needs to be bumped to get rid of https://github.com/paritytech/polkadot-sdk/actions/runs/15180236627/job/42688141374#step:5:1869 --------- Signed-off-by: Alexandru Gheorghe <[email protected]>
…lidation context (#9127) #8606 and paritytech/trie#221 replaced the usage of BTreeMap with HashMaps in the validation context. The keys are already derived with a cryptographic hash function from user data, so users should not be able to manipulate them. To be on the safe side, this PR also modifies the TrieCache, TrieRecorder and MemoryDB to use a hasher that, on top of the default generated randomness, also adds randomness derived from the hashes of the relay-chain and parachain blocks, which is not something users can control or guess ahead of time. --------- Signed-off-by: Alexandru Gheorghe <[email protected]> Co-authored-by: Bastian Köcher <[email protected]> Co-authored-by: cmd[bot] <41898282+github-actions[bot]@users.noreply.github.com>
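A minimal sketch of that hasher idea, under the assumption that the per-block seed can simply be absorbed into each hasher instance (the names and exact wiring below are hypothetical, not the real cumulus/sp-trie implementation):

```rust
// Hypothetical sketch: a BuildHasher that folds the relay-chain and parachain
// block hashes into every hasher instance, on top of hashbrown's default
// per-instance randomness.
use core::hash::{BuildHasher, Hasher};
use hashbrown::{hash_map::DefaultHashBuilder, HashMap};

struct BlockSeededState {
    inner: DefaultHashBuilder, // default randomness of the hasher
    extra_seed: [u8; 64],      // relay block hash ++ parachain block hash
}

impl BuildHasher for BlockSeededState {
    type Hasher = <DefaultHashBuilder as BuildHasher>::Hasher;

    fn build_hasher(&self) -> Self::Hasher {
        let mut hasher = self.inner.build_hasher();
        // Absorb the per-block seed so every hash depends on data the user
        // cannot control or predict ahead of time.
        hasher.write(&self.extra_seed);
        hasher
    }
}

fn new_validation_map(
    relay_hash: [u8; 32],
    para_hash: [u8; 32],
) -> HashMap<[u8; 32], Vec<u8>, BlockSeededState> {
    let mut extra_seed = [0u8; 64];
    extra_seed[..32].copy_from_slice(&relay_hash);
    extra_seed[32..].copy_from_slice(&para_hash);
    HashMap::with_hasher(BlockSeededState { inner: DefaultHashBuilder::default(), extra_seed })
}
```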
While profiling #6131 (comment) with the benchmark from #8069, it was discovered that, when running in validation, a big chunk of the time is spent inserting into and retrieving data from the BTreeMap/BTreeSet.
By switching to hashbrown HashMap/HashSet in the validation TrieCache and TrieRecorder, and in memory-db (paritytech/trie#221), read costs improve by around ~40% and write costs by about ~20%.
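For readers skimming the diff, a simplified before/after sketch of the change (hypothetical types, not the actual sp-trie/memory-db definitions): the recorder and cache key their entries by node hash and do not need ordered iteration on the hot path, so the BTreeMap's O(log n) comparisons of 32-byte keys become hashed lookups.

```rust
// Simplified before/after sketch; not the real sp-trie types.
use std::collections::BTreeMap;
use hashbrown::HashMap;

type NodeHash = [u8; 32];

// Before: every insert/lookup walks a tree, comparing 32-byte keys at each level.
struct RecorderBefore {
    recorded: BTreeMap<NodeHash, Vec<u8>>,
}

// After: hashed lookups; the keys are already uniformly distributed hash
// outputs, and no ordered iteration is required here.
struct RecorderAfter {
    recorded: HashMap<NodeHash, Vec<u8>>,
}

fn main() {
    let mut before = RecorderBefore { recorded: BTreeMap::new() };
    let mut after = RecorderAfter { recorded: HashMap::new() };
    before.recorded.insert([0u8; 32], vec![1, 2, 3]);
    after.recorded.insert([0u8; 32], vec![1, 2, 3]);
    assert_eq!(before.recorded.get(&[0u8; 32]), after.recorded.get(&[0u8; 32]));
}
```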