@codeflash-ai codeflash-ai bot commented Oct 24, 2025

📄 124% (1.24x) speedup for tarjan in stanza/models/common/chuliu_edmonds.py

⏱️ Runtime : 70.0 milliseconds → 31.3 milliseconds (best of 167 runs)

📝 Explanation and details

The optimization achieves a 124% speedup (70.0 ms down to 31.3 ms) by replacing expensive repeated np.where() calls with a precomputed dependency lookup map.

Key optimization:

  • Precomputed dependency map: Instead of calling np.where(np.equal(tree, i))[0] for each node during the DFS traversal (which scans the entire array each time), the optimized version builds a dependents_map once at the beginning where dependents_map[i] contains all nodes that have i as their head.
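As a rough illustration of the two lookup strategies described above, here is a minimal sketch. The helper names dependents_by_scan and build_dependents_map are hypothetical and are not taken from the PR diff or from stanza; only the np.where(np.equal(tree, i))[0] expression and the dependents_map name come from the description.

```python
import numpy as np

def dependents_by_scan(tree, i):
    # Scan-style lookup: compares every entry of the head array against i,
    # so each call costs O(N).
    return np.where(np.equal(tree, i))[0]

def build_dependents_map(tree):
    # Hypothetical helper: a single pass over the head array that groups
    # every node under its head. dependents_map[h] lists the nodes whose
    # head is h, in ascending node order.
    dependents_map = [[] for _ in range(len(tree))]
    for node, head in enumerate(tree):
        dependents_map[int(head)].append(node)
    return dependents_map

# Head array: node 1 -> 3, node 2 -> 1, node 3 -> 2, node 4 -> 3, node 0 is the root.
tree = np.array([0, 3, 1, 2, 3])
dependents_map = build_dependents_map(tree)

# Both strategies agree for every node; only the per-lookup cost differs.
for i in range(len(tree)):
    assert dependents_by_scan(tree, i).tolist() == dependents_map[i]
```

Because both variants return dependents in ascending node order, a DFS driven by either one should visit nodes in the same order.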

Why this is faster:

  • The original code performed an O(N) scan of the entire tree array for every node visited during DFS traversal
  • The line profiler shows strong_connect(i) took 95% of runtime in the original (132ms out of 140ms total)
  • The optimized version reduces this to 80.9% of a much smaller total runtime (77ms out of 96ms total)
  • This transforms the dependency lookup from an O(N) scan per node into an amortized O(1) lookup, after a one-time O(N) pass to build the map
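The arithmetic behind this can be sketched with a simple cost model. This is an assumption-laden estimate rather than a measurement: it assumes the DFS visits each node exactly once, and scan_cost and map_cost are hypothetical names used only here.

```python
def scan_cost(n):
    # Each of the n visited nodes triggers a full scan of n head entries.
    return n * n

def map_cost(n):
    # One pass over n entries to build the map, plus reading the per-node
    # dependents lists, which hold n entries in total.
    return n + n

for n in (10, 100, 1000):
    print(n, scan_cost(n), map_cost(n))
# 10 100 20
# 100 10000 200
# 1000 1000000 2000
```

The model counts array entries touched, not wall-clock time, so it only indicates how the gap grows with graph size.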

Performance characteristics:

  • Most effective for larger graphs where the O(N²) behavior of repeated scans becomes dominant
  • Particularly beneficial for dense graphs or graphs with many nodes having the same head
  • The precomputation overhead (visible in profiler as ~11% of total time) is quickly amortized as graph size increases
  • Test cases show consistent speedups across all scenarios, from simple cycles to large-scale graphs with 1000+ nodes

The optimization maintains identical behavior and output while dramatically reducing the algorithmic complexity of the dependency finding step.
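One way to spot-check the "identical output" claim is to run the same traversal with both lookup strategies and compare the results. The sketch below is self-contained and uses a stand-in Tarjan-style routine. It is not stanza's tarjan and not the PR's code, and find_cycles, scan_lookup and map_lookup are hypothetical names.

```python
import numpy as np

def find_cycles(tree, dependents_of):
    # Tarjan-style SCC search over a head array. dependents_of(i) returns
    # the nodes whose head is i. Returns one boolean mask per SCC of size > 1.
    n = len(tree)
    indices = [-1] * n
    lowlinks = [-1] * n
    onstack = [False] * n
    stack, cycles = [], []
    counter = [0]

    def strong_connect(i):
        indices[i] = lowlinks[i] = counter[0]
        counter[0] += 1
        stack.append(i)
        onstack[i] = True
        for j in dependents_of(i):
            j = int(j)
            if indices[j] == -1:
                strong_connect(j)
                lowlinks[i] = min(lowlinks[i], lowlinks[j])
            elif onstack[j]:
                lowlinks[i] = min(lowlinks[i], indices[j])
        if lowlinks[i] == indices[i]:
            # i is the root of a component: pop it off the stack.
            mask = np.zeros(n, dtype=bool)
            while True:
                j = stack.pop()
                onstack[j] = False
                mask[j] = True
                if j == i:
                    break
            if mask.sum() > 1:
                cycles.append(mask)

    for i in range(n):
        if indices[i] == -1:
            strong_connect(i)
    return cycles

def scan_lookup(tree):
    # Full scan of the head array on every call.
    return lambda i: np.where(np.equal(tree, i))[0]

def map_lookup(tree):
    # Map built once, then indexed on every call.
    dependents_map = [[] for _ in range(len(tree))]
    for node, head in enumerate(tree):
        dependents_map[int(head)].append(node)
    return lambda i: dependents_map[i]

# Compare both variants on random head arrays.
rng = np.random.default_rng(0)
for _ in range(200):
    tree = rng.integers(0, 20, size=20)
    a = find_cycles(tree, scan_lookup(tree))
    b = find_cycles(tree, map_lookup(tree))
    assert len(a) == len(b)
    assert all(np.array_equal(x, y) for x, y in zip(a, b))
```

The comparison relies on both lookups yielding dependents in the same order. If the map were built in a different order, the set of cycles should still match, but the order of the returned masks might not.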

Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 4 Passed |
| 🌀 Generated Regression Tests | 48 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests and Runtime
# imports
import numpy as np
import pytest
from stanza.models.common.chuliu_edmonds import tarjan

# unit tests

# ----------- BASIC TEST CASES ------------

def test_simple_tree_no_cycle():
    # Single root, no cycles
    tree = np.array([0, 0, 0])
    codeflash_output = tarjan(tree); cycles = codeflash_output

def test_simple_tree_with_branching():
    # Tree with branching, no cycles
    tree = np.array([0, 1, 1, 2, 2])
    codeflash_output = tarjan(tree); cycles = codeflash_output

def test_single_cycle():
    # Simple cycle: 1->2, 2->3, 3->1
    tree = np.array([0, 3, 1, 2])
    codeflash_output = tarjan(tree); cycles = codeflash_output
    cycle = cycles[0]
    expected = np.array([False, True, True, True])

def test_multiple_disjoint_cycles():
    # Two cycles: 1-2-1 and 3-4-3
    tree = np.array([0,2,1,4,3])
    codeflash_output = tarjan(tree); cycles = codeflash_output
    masks = [np.array([False,True,True,False,False]), np.array([False,False,False,True,True])]

def test_cycle_with_tail():
    # 1->2->3->1, 4->3 (tail into cycle)
    tree = np.array([0,3,1,2,3])
    codeflash_output = tarjan(tree); cycles = codeflash_output
    cycle = cycles[0]
    expected = np.array([False,True,True,True,False])

def test_no_cycle_with_self_loops():
    # Self-loop at root, rest are tree
    tree = np.array([0,0,1,2])
    codeflash_output = tarjan(tree); cycles = codeflash_output

def test_cycle_with_self_loop():
    # Node 1 points to itself, rest are tree
    tree = np.array([0,1,1,2])
    codeflash_output = tarjan(tree); cycles = codeflash_output

def test_cycle_with_multiple_entries():
    # 1->2->3->1, 4->2, 5->3 (multiple entries to cycle)
    tree = np.array([0,3,1,2,2,3])
    codeflash_output = tarjan(tree); cycles = codeflash_output
    expected = np.array([False,True,True,True,False,False])

def test_root_cycle():
    # Root participates in a cycle: 0->1, 1->0
    tree = np.array([1,0])
    codeflash_output = tarjan(tree); cycles = codeflash_output
    expected = np.array([True,True])

# ----------- EDGE TEST CASES ------------

def test_empty_graph():
    # Empty graph
    tree = np.array([])
    codeflash_output = tarjan(tree); cycles = codeflash_output

def test_single_node_root():
    # One node, root
    tree = np.array([0])
    codeflash_output = tarjan(tree); cycles = codeflash_output

def test_single_node_self_loop():
    # One node, points to itself
    tree = np.array([0])
    codeflash_output = tarjan(tree); cycles = codeflash_output

def test_two_node_cycle():
    # Two nodes, each point to each other
    tree = np.array([1,0])
    codeflash_output = tarjan(tree); cycles = codeflash_output
    expected = np.array([True,True])

def test_disconnected_components():
    # Two disconnected trees
    tree = np.array([0,0,3,3,3])
    codeflash_output = tarjan(tree); cycles = codeflash_output

def test_all_nodes_cycle():
    # All nodes in a single cycle
    n = 5
    tree = np.array([(i+1)%n for i in range(n)])
    codeflash_output = tarjan(tree); cycles = codeflash_output
    expected = np.ones(n, dtype=bool)

def test_cycle_with_isolated_node():
    # Cycle plus isolated node
    tree = np.array([0,2,1,3])
    codeflash_output = tarjan(tree); cycles = codeflash_output
    expected = np.array([False,True,True,False])

def test_cycle_with_self_loop_and_other_cycle():
    # Node 1 self-loop, 2-3-4-2 cycle
    tree = np.array([0,1,4,2,3])
    codeflash_output = tarjan(tree); cycles = codeflash_output
    expected = np.array([False,False,True,True,True])

def test_cycle_with_duplicate_heads():
    # Multiple nodes point to same head, forming a cycle
    tree = np.array([0,2,1,2])
    codeflash_output = tarjan(tree); cycles = codeflash_output
    expected = np.array([False,True,True,False])

# ----------- LARGE SCALE TEST CASES ------------

def test_large_tree_no_cycle():
    # Large tree, no cycles
    n = 1000
    tree = np.zeros(n, dtype=int)
    tree[0] = 0
    for i in range(1, n):
        tree[i] = i-1
    codeflash_output = tarjan(tree); cycles = codeflash_output

def test_large_single_cycle():
    # Large cycle with all nodes
    n = 1000
    tree = np.array([(i+1)%n for i in range(n)])
    codeflash_output = tarjan(tree); cycles = codeflash_output
    expected = np.ones(n, dtype=bool)

def test_large_forest_with_small_cycles():
    # Large chain of nodes plus one small cycle at the end
    n = 1000
    tree = np.zeros(n, dtype=int)
    tree[0] = 0
    for i in range(1, n):
        tree[i] = i-1
    # Add one small 3-node cycle at the end
    tree[n-3] = n-2
    tree[n-2] = n-1
    tree[n-1] = n-3
    codeflash_output = tarjan(tree); cycles = codeflash_output
    expected = np.zeros(n, dtype=bool)
    expected[n-3:n] = True

def test_large_multiple_disjoint_cycles():
    # Large graph with multiple disjoint cycles
    n = 1000
    tree = np.zeros(n, dtype=int)
    # 10 cycles of 10 nodes each
    for c in range(10):
        start = c*10
        for i in range(10):
            tree[start+i] = start+(i+1)%10
    # Rest are trees
    for i in range(100, n):
        tree[i] = i-1
    codeflash_output = tarjan(tree); cycles = codeflash_output
    for c in range(10):
        start = c*10
        expected = np.zeros(n, dtype=bool)
        expected[start:start+10] = True

def test_large_graph_with_no_cycles():
    # Large disconnected graph, no cycles
    n = 1000
    tree = np.zeros(n, dtype=int)
    for i in range(n):
        tree[i] = i//100 * 100  # Each 100 nodes point to their own root
    codeflash_output = tarjan(tree); cycles = codeflash_output
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
# imports
import numpy as np
import pytest
from stanza.models.common.chuliu_edmonds import tarjan

# unit tests

# -------------------- BASIC TEST CASES --------------------

def test_empty_graph():
    # Test with an empty graph (no nodes)
    tree = np.array([], dtype=int)
    codeflash_output = tarjan(tree); result = codeflash_output

def test_single_node_no_cycle():
    # Test with a single node pointing to itself (root)
    tree = np.array([0])
    codeflash_output = tarjan(tree); result = codeflash_output

def test_two_nodes_no_cycle():
    # Node 0 is root, node 1 points to root
    tree = np.array([0, 0])
    codeflash_output = tarjan(tree); result = codeflash_output

def test_two_nodes_with_cycle():
    # Node 0 points to 1, node 1 points to 0 (cycle)
    tree = np.array([1, 0])
    codeflash_output = tarjan(tree); result = codeflash_output

def test_three_node_tree_no_cycle():
    # Standard tree: 0 is root, 1 and 2 point to 0
    tree = np.array([0, 0, 0])
    codeflash_output = tarjan(tree); result = codeflash_output

def test_three_node_cycle():
    # 0 -> 1, 1 -> 2, 2 -> 0
    tree = np.array([1, 2, 0])
    codeflash_output = tarjan(tree); result = codeflash_output

def test_multiple_disjoint_cycles():
    # 0->1, 1->0 (cycle), 2->3, 3->2 (cycle)
    tree = np.array([1, 0, 3, 2])
    codeflash_output = tarjan(tree); result = codeflash_output
    # Each cycle should be a pair of True values
    cycles = [tuple(cycle) for cycle in result]

def test_no_cycles_complex_tree():
    # Example from docstring: [0 4 4 4 0]
    tree = np.array([0, 4, 4, 4, 0])
    codeflash_output = tarjan(tree); result = codeflash_output

def test_cycle_in_middle_of_tree():
    # 0->0 (root), 1->2, 2->3, 3->1 (cycle), 4->0
    tree = np.array([0, 2, 3, 1, 0])
    codeflash_output = tarjan(tree); result = codeflash_output
    expected = np.array([False, True, True, True, False])

# -------------------- EDGE TEST CASES --------------------

def test_cycle_self_loop():
    # Node 0 points to itself, which is not a cycle (only one node)
    tree = np.array([0])
    codeflash_output = tarjan(tree); result = codeflash_output

def test_two_node_self_loops():
    # Both nodes point to themselves, no cycle
    tree = np.array([0, 1])
    codeflash_output = tarjan(tree); result = codeflash_output

def test_cycle_with_self_loop_and_others():
    # 0->0 (self), 1->2, 2->1 (cycle)
    tree = np.array([0, 2, 1])
    codeflash_output = tarjan(tree); result = codeflash_output
    expected = np.array([False, True, True])

def test_all_nodes_cycle():
    # All nodes in one cycle: 0->1, 1->2, 2->3, 3->0
    tree = np.array([1, 2, 3, 0])
    codeflash_output = tarjan(tree); result = codeflash_output

def test_multiple_nested_cycles():
    # 0->1, 1->2, 2->0 (cycle 1), 3->4, 4->3 (cycle 2)
    tree = np.array([1, 2, 0, 4, 3])
    codeflash_output = tarjan(tree); result = codeflash_output
    cycles = [tuple(cycle) for cycle in result]

def test_cycle_with_isolated_nodes():
    # 0->0 (root), 1->2, 2->1 (cycle), 3->3 (isolated self-loop)
    tree = np.array([0, 2, 1, 3])
    codeflash_output = tarjan(tree); result = codeflash_output
    expected = np.array([False, True, True, False])

def test_no_cycles_with_multiple_roots():
    # 0->0, 1->1, 2->2 (all self-loops, no cycles)
    tree = np.array([0, 1, 2])
    codeflash_output = tarjan(tree); result = codeflash_output

def test_cycle_with_non_consecutive_nodes():
    # 0->2, 1->0, 2->1 (cycle among 0,1,2), 3->3 (no cycle)
    tree = np.array([2, 0, 1, 3])
    codeflash_output = tarjan(tree); result = codeflash_output
    expected = np.array([True, True, True, False])

def test_cycle_with_unreachable_nodes():
    # 0->1, 1->2, 2->0 (cycle), 3->4, 4->5, 5->5 (self-loop)
    tree = np.array([1, 2, 0, 4, 5, 5])
    codeflash_output = tarjan(tree); result = codeflash_output
    expected = np.array([True, True, True, False, False, False])

def test_cycle_with_multiple_entries():
    # 0->0 (root), 1->2, 2->3, 3->1 (cycle), 4->2 (enters cycle)
    tree = np.array([0, 2, 3, 1, 2])
    codeflash_output = tarjan(tree); result = codeflash_output
    expected = np.array([False, True, True, True, False])

# -------------------- LARGE SCALE TEST CASES --------------------

def test_large_tree_no_cycle():
    # Large tree: 0 is root, all others point to 0
    N = 1000
    tree = np.zeros(N, dtype=int)
    codeflash_output = tarjan(tree); result = codeflash_output

def test_large_single_cycle():
    # Large cycle: 0->1, 1->2, ..., N-1->0
    N = 1000
    tree = np.roll(np.arange(N), -1)
    codeflash_output = tarjan(tree); result = codeflash_output

def test_large_two_cycles():
    # Two cycles of size N//2 each
    N = 1000
    tree = np.empty(N, dtype=int)
    tree[:N//2] = np.roll(np.arange(N//2), -1)
    tree[N//2:] = N//2 + np.roll(np.arange(N//2), -1)
    codeflash_output = tarjan(tree); result = codeflash_output
    # Each cycle should have N//2 True values
    sizes = sorted([np.sum(cycle) for cycle in result])

def test_large_tree_with_small_cycle():
    # Mostly tree, but with a small cycle at the end
    N = 1000
    tree = np.zeros(N, dtype=int)
    # Create a cycle among last 3 nodes
    tree[-3:] = [N-2, N-1, N-3]
    codeflash_output = tarjan(tree); result = codeflash_output
    expected = np.zeros(N, dtype=bool)
    expected[-3:] = True

def test_large_sparse_cycles():
    # Several small cycles scattered in a large graph
    N = 1000
    tree = np.zeros(N, dtype=int)
    cycles = []
    for start in range(10, 100, 10):
        # Each cycle of 3 nodes: i->i+1, i+1->i+2, i+2->i
        tree[start] = start+1
        tree[start+1] = start+2
        tree[start+2] = start
        cycle = np.zeros(N, dtype=bool)
        cycle[start:start+3] = True
        cycles.append(cycle)
    codeflash_output = tarjan(tree); result = codeflash_output
    # Each cycle should have 3 True values and match the expected pattern
    for cycle in result:
        pass

def test_large_graph_with_self_loops():
    # All nodes point to themselves (no cycles)
    N = 1000
    tree = np.arange(N)
    codeflash_output = tarjan(tree); result = codeflash_output
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, run git checkout codeflash/optimize-tarjan-mh4g4eqi, commit your edits, and push.

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 October 24, 2025 06:04
@codeflash-ai codeflash-ai bot added the ⚡️ codeflash Optimization PR opened by Codeflash AI label Oct 24, 2025