@jpivarski
Member

This was a dumb mistake, pointed out by Andrew Wightman (Notre Dame), who had a file of 61944 histograms.

Since names are not unique identifiers for objects in TDirectories, uproot.TDirectory.__getitem__ was iterating through the whole list of keys, looking for matches. Each lookup is O(n) in the number of keys, so looking up n objects this way is O(n²).
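To illustrate where the O(n²) comes from, here is a minimal sketch (not uproot's actual code) of a directory whose `__getitem__` scans every key on each lookup. The class name `LinearDirectory`, the `(name, cycle)` tuple layout, the rule of returning the highest cycle for a repeated name, and the `comparisons` counter are all hypothetical, chosen only for illustration.

```python
class LinearDirectory:
    """Hypothetical stand-in for a TDirectory with a plain list of keys."""

    def __init__(self, keys):
        # keys: list of (name, cycle) pairs; names may repeat with different cycles
        self._keys = list(keys)
        self.comparisons = 0  # counts key inspections, for illustration only

    def __getitem__(self, name):
        # O(n) scan: every lookup walks the entire key list, because a
        # later key with the same name might have a higher cycle number.
        best = None
        for key_name, cycle in self._keys:
            self.comparisons += 1
            if key_name == name and (best is None or cycle > best[1]):
                best = (key_name, cycle)
        if best is None:
            raise KeyError(name)
        return best


keys = [(f"h{i}", 1) for i in range(1000)]
directory = LinearDirectory(keys)
for name, _ in keys:
    directory[name]
print(directory.comparisons)
```

Looking up each of 1000 unique names once costs 1000 × 1000 = 1,000,000 key inspections in this sketch, because every lookup walks the full list.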

However, names are almost unique identifiers for objects in TDirectories, so I added uproot.TDirectory._keys_lookup, a hashmap from names to lists of matching indexes in uproot.TDirectory._keys. For a given name, the list to search through is much shorter, usually just 1 item.
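A minimal sketch of the same idea, assuming the hypothetical `(name, cycle)` key layout and highest-cycle rule from a toy model rather than uproot's real internals. Only the attribute names `_keys` and `_keys_lookup` come from this PR; the class `IndexedDirectory` and the `comparisons` counter are made up for illustration.

```python
from collections import defaultdict


class IndexedDirectory:
    """Hypothetical directory with a name -> [indexes] hashmap over its keys."""

    def __init__(self, keys):
        # keys: list of (name, cycle) pairs; names may repeat with different cycles
        self._keys = list(keys)
        lookup = defaultdict(list)
        for index, (name, _cycle) in enumerate(self._keys):
            lookup[name].append(index)  # usually a one-element list
        # convert to a plain dict so failed lookups do not insert empty lists
        self._keys_lookup = dict(lookup)
        self.comparisons = 0  # counts key inspections, for illustration only

    def __getitem__(self, name):
        indexes = self._keys_lookup.get(name)
        if indexes is None:
            raise KeyError(name)
        # only the keys that share this name are inspected
        best = None
        for index in indexes:
            self.comparisons += 1
            key = self._keys[index]
            if best is None or key[1] > best[1]:
                best = key
        return best


keys = [(f"h{i}", 1) for i in range(1000)]
directory = IndexedDirectory(keys)
for name, _ in keys:
    directory[name]
print(directory.comparisons)
```

With unique names, the same 1000 lookups cost 1000 key inspections in total (one per lookup), plus a single O(n) pass to build the dict, so reading n objects is O(n).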

This reduces the time needed to read the 61944 histograms from 206 seconds to 54 seconds. Most importantly, the per-histogram cost is flat: the first and the last 1000 histograms both take 0.85 seconds, whereas before it was 0.85 seconds for the first 1000 histograms and 6.2 seconds for the last 1000. The time complexity to read n histograms is now O(n).

@jpivarski
Member Author

Note to self: remember to cherry-pick the merge commit from main to main-v4.

@jpivarski jpivarski merged commit 809cf8e into main Jul 2, 2022
@jpivarski jpivarski deleted the jpivarski/iterate-over-objects-in-TDirectory-in-linear-time branch July 2, 2022 18:18
jpivarski added a commit that referenced this pull request Jul 2, 2022
* Iterate over objects in TDirectory in linear time.

* Remove the debug_counter.

(cherry picked from commit 809cf8e)
jpivarski added a commit that referenced this pull request Jul 2, 2022
* Iterate over objects in TDirectory in linear time.

* Remove the debug_counter.

(cherry picked from commit 809cf8e)
Moelf pushed a commit to Moelf/uproot5 that referenced this pull request Aug 1, 2022
* Iterate over objects in TDirectory in linear time.

* Remove the debug_counter.