-
Notifications
You must be signed in to change notification settings - Fork 84
Description
Our users observed a significant performance regression (10x and more) and it turns out to be introduced somewhere between 4.1.2 and 4.1.3 (release on 29th September 2021). The awkward
version is the latest 1.8.0
.
I went through the release notes in https://github.com/scikit-hep/uproot4/releases/tag/4.1.3 and I first reverted #441 which also needed the objectarray1d
function from uproot._util
(see https://github.com/scikit-hep/uproot4/pull/442/files) but it did not help.
I also went through the other PRs but they seemed unrelated, so I started a git bisect
with
Good: 382bb62f47a05f6111d3787e3235f9e56d9e113e
(4.1.2)
Bad: 03cb08ccb1f47ce0b082117f41a112af1ab403ee
(4.1.3)
and identified the 007b100 from @nsmith- which introduced the performance regression:
░ tamasgal@silentbox-(2):uproot4 (git)-[4.1.3~8|bisect]- py-system
░ 13:30:56 > git bisect good
007b10014003a74d0e89e0166e21a6218de0d445 is the first bad commit
commit 007b10014003a74d0e89e0166e21a6218de0d445
Author: Nicholas Smith <[email protected]>
Date: Mon Sep 20 14:09:59 2021 -0500
Implement regular array support for TClonesArray (#442)
* Implement regular array support for TClonesArray
* Add a small test
* Fix issue with incidentally regular TRefArray lists
* Cannot trust dict order
src/uproot/_util.py | 13 +++++++++
src/uproot/containers.py | 45 ++++++++++++++++++++++---------
src/uproot/interpretation/identify.py | 4 +--
src/uproot/interpretation/library.py | 20 +++++++++++---
src/uproot/interpretation/numerical.py | 11 ++++++++
src/uproot/interpretation/objects.py | 48 +++++++++++++++++++++++++++------
tests/test_0442-regular-TClonesArray.py | 40 +++++++++++++++++++++++++++
7 files changed, 155 insertions(+), 26 deletions(-)
create mode 100644 tests/test_0442-regular-TClonesArray.py
I investigated our codes and found the exact call which causes the problem: it's indeed the reading of doubly nested arrays.
With uproot 4.1.2
, the following code is almost instantaneous whereas with uproot 4.1.3
it takes about 1.5 minutes (for a 1.5GB file):
>>> import uproot
>>> f = uproot.open("doubly_nested_regression.root")
>>> f["E/Evt/trks/trks.rec_stages"].array()
<Array [[[1, 2, 5, 3, 5, 4], ... 1], [1], [1]]] type='158262 * var * var * int32'>
Here is the file http://131.188.161.12:30002/doubly_nested_regression.root
@jpivarski you probably remember that you introduced some special hack to squeeze a lot out of doubly-nested structures, see scikit-hep/awkward#968 and https://github.com/scikit-hep/awkward-1.0/blob/main/src/awkward/_connect/_uproot.py#L35 so my first thought was that these became ineffective with the changes introduced in 007b100 but I am just speculating ; )