Skip to content

Very slow equality checks for certain sentences #1

@blyxxyz

Description

@blyxxyz

Sentences where many nodes have multiple parents can be slow to compare to each other.

This happens when two sentences are identical or almost identical, but exist as separate objects. Each equality check naively compares the children of nodes, but that means nodes can be checked a lot of times.

For example, using examples/socialchoice.py:

$ python3 -m timeit -s 'from socialchoice import lin; s = lin(range(10)); t = lin(range(10))' 's == t'
1 loop, best of 5: 7.47 sec per loop

There's no straightforward way to solve this, because the equality checks use frozenset equality. The problem is solved in other methods with a temporary cache around a function defined inside the method, but comparing frozensets just invokes the __eq__ methods of the children again.

Two potential solutions:

  • Enable a global cache when __eq__ is called, and clear the cache when the call to __eq__ that enabled the cache is finished. This probably interacts badly with threads, but only by potentially keeping objects in memory longer than necessary or discarding the cache too early, without outright incorrect behavior. It also adds at least a little overhead to typical __eq__ calls.
  • Stop using frozenset equality, and implement all of the equality checks from scratch. This could make performance much worse for typical cases.

The problem only occurs when the objects are constructed separately. If an object is compared to itself Python is smart enough to notice and skip the lengthier check.

Solving it might not be worth it, unless a concrete case where it occurs pops up.

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions