
@echoix echoix commented Apr 21, 2024

I had an issue where profiling my project with mkcheck generated a graph JSON file of about 40 MB, and even the simplest of the Python tool's operations, like "list", took over 8 min 27 s to complete.

I profiled the complete Python invocation using scalene, and found out that it wasn't a CPU-only bottleneck, but a memory one in parse_graph(). The inputs and outputs are sets, and are filled in through the loop with the | (union) operator, which returns a new set with elements from the set and all others. The complete object was copied over to a new one at each iteration. This is why the scalene profiling showed a peak memory allocation of 149 GB for the single line

inputs = inputs | proc_in

(The 149 GB probably wasn't allocated all at once, but accumulated across the repeated assignments throughout the loop iterations.)
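A tiny illustration (not the mkcheck tool code itself) of why that line is so expensive: `|` always allocates a brand-new set and copies every accumulated element into it, so the name is rebound to a fresh object on each iteration.

```python
inputs = {1, 2, 3}
proc_in = {4, 5}

before = id(inputs)
inputs = inputs | proc_in    # copies all of {1, 2, 3} into a fresh set
assert id(inputs) != before  # a brand-new object was allocated
assert inputs == {1, 2, 3, 4, 5}
```

Over thousands of loop iterations, copying the ever-growing set each time makes the total work (and the transient allocations) quadratic in the number of elements.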

The solution is to use the update() method, which updates the set in place, adding elements from all others. The same object is reused and simply gains the new elements. Also, since the proc_in variable wasn't used anywhere else, I inlined the call, removing an extra set instantiation.
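A hedged sketch of the before/after shape of the loop; `procs` and the `"input"` field are hypothetical stand-ins for the real parse_graph() structures, not the actual mkcheck code:

```python
# Before: a new set per iteration, plus an extra temporary via proc_in.
def parse_inputs_slow(procs):
    inputs = set()
    for proc in procs:
        proc_in = set(proc["input"])
        inputs = inputs | proc_in      # full copy of `inputs` each time
    return inputs

# After: update() mutates in place; the temporary set() call is inlined away.
def parse_inputs_fast(procs):
    inputs = set()
    for proc in procs:
        inputs.update(proc["input"])   # only the new elements are inserted
    return inputs

procs = [{"input": [i, i + 1]} for i in range(100)]
assert parse_inputs_slow(procs) == parse_inputs_fast(procs)
```

Both functions produce the same set; only the allocation behavior differs.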

After this, I made a small change that doesn't help as much: using json.load(f) instead of json.loads(f.read()). The function json.loads() parses a string (here, the one returned by f.read()), while json.load() reads JSON directly from a file object.
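The two calls are equivalent in result; the sketch below uses an in-memory StringIO as a stand-in for the real graph file:

```python
import io
import json

data = '{"graph": {"nodes": 3}}'

# Old style: read the entire file into one string, then parse it.
parsed_from_string = json.loads(io.StringIO(data).read())

# New style: hand the file object straight to the parser.
parsed_from_file = json.load(io.StringIO(data))

assert parsed_from_string == parsed_from_file
```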

With these changes, the runtime went from 8 min 27 s down to 26 seconds, which is far more manageable.

