@codeflash-ai codeflash-ai bot commented Nov 1, 2025

📄 10% (0.10x) speedup for split_dict_by_key_prefix in plotly/shapeannotation.py

⏱️ Runtime: 780 microseconds → 712 microseconds (best of 207 runs)

📝 Explanation and details

The optimization reduces runtime from 780 to 712 microseconds, a speedup of about 9-10%, through two key changes:

1. Efficient dictionary iteration: Changed from `for k in d.keys(): ... d[k]` to `for k, v in d.items()`. The original loop iterates over the keys and then performs a separate hash lookup, `d[k]`, to fetch each value. The optimized loop receives the key and value together from `d.items()`, so that per-item lookup is eliminated.

2. Faster dictionary initialization: Replaced `dict()` constructor calls with the literal `{}`. In CPython the literal compiles to a single bytecode instruction that builds the dictionary, whereas `dict()` must look up the name `dict` and then make a function call.
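
The two changes above can be sketched side by side. This is a reconstruction from the description and the generated tests, not the code in `plotly/shapeannotation.py`; the `_original` and `_optimized` suffixes are hypothetical names used only to keep both variants in one file, and the return order is assumed from how the tests unpack the result.

```python
def split_dict_by_key_prefix_original(d, prefix):
    # Assumed shape of the pre-optimization code: dict() constructors,
    # iteration over keys, and a separate d[k] lookup for each value.
    no_prefix = dict()
    with_prefix = dict()
    for k in d.keys():
        if k.startswith(prefix):
            with_prefix[k] = d[k]
        else:
            no_prefix[k] = d[k]
    return no_prefix, with_prefix


def split_dict_by_key_prefix_optimized(d, prefix):
    # Assumed shape of the optimized code: dict literals and items(),
    # which yields each key together with its value.
    no_prefix = {}
    with_prefix = {}
    for k, v in d.items():
        if k.startswith(prefix):
            with_prefix[k] = v
        else:
            no_prefix[k] = v
    return no_prefix, with_prefix


sample = {"foo1": 1, "foo2": 2, "bar1": 3, "baz": 4}

# Both variants should produce identical output for the same input.
assert split_dict_by_key_prefix_original(sample, "foo") == \
    split_dict_by_key_prefix_optimized(sample, "foo")

print(split_dict_by_key_prefix_optimized(sample, "foo"))
# prints: ({'bar1': 3, 'baz': 4}, {'foo1': 1, 'foo2': 2})
```

Under these assumptions the two variants differ only in how they obtain each value and how they create the result dictionaries, so behavior, including the exceptions raised for non-string keys, should be unchanged.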

Performance impact by test case:

  • Large dictionaries benefit most: 13.6% speedup on 1000-element mixed dictionaries, 10.1% on 1000-element no-match cases
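  • Small dictionaries see modest gains: most basic test cases improve by roughly 3-11%, and one (test_all_keys_with_prefix) is effectively unchanged at 0.287% slower
  • Edge cases also improve: tests that expect an exception show speedups of roughly 1.5-14%, presumably because the setup work done before the exception is raised is cheaper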

The line profiler shows that the share of total runtime spent on the value-assignment lines (those that previously used `d[k]`) falls from 34.5% to 31.7%. Iteration overhead rises slightly because of tuple unpacking, but the net effect is a speedup.
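
To check these effects on a given machine, a small `timeit` comparison of the underlying primitives is one option. This is a rough sketch, not the Codeflash benchmark harness; the statement labels, the 1000-element dictionary, and the `repeat`/`number` values are arbitrary choices for illustration, and absolute timings will vary by interpreter version and hardware.

```python
import timeit

# Setup code run before each timing: a 1000-element dict, similar in size
# to the large test cases in this PR.
SETUP = "d = {f'key{i}': i for i in range(1000)}"

# The primitives that the two optimizations swap out.
STATEMENTS = {
    "keys() then d[k]": "for k in d.keys():\n    v = d[k]",
    "items()": "for k, v in d.items():\n    pass",
    "dict()": "dict()",
    "{}": "{}",
}


def best_time(stmt, repeat=3, number=200):
    # Best per-execution time in seconds across several repeats.
    runs = timeit.repeat(stmt, setup=SETUP, repeat=repeat, number=number)
    return min(runs) / number


results = {label: best_time(stmt) for label, stmt in STATEMENTS.items()}

for label, seconds in results.items():
    print(f"{label:>18}: {seconds * 1e9:10.1f} ns")
```

Taking the minimum over repeats follows the same "best of N runs" idea as the runtime reported above, since the fastest run is the one least disturbed by other system activity.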

Correctness verification report:

| Test | Status |
| --- | --- |
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | ✅ 45 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests and Runtime
import pytest  # used for our unit tests
from plotly.shapeannotation import split_dict_by_key_prefix

# unit tests

# ----------- BASIC TEST CASES -----------

def test_basic_no_prefix_match():
    # No keys start with the prefix
    d = {'a': 1, 'b': 2, 'c': 3}
    prefix = 'x'
    no_prefix, with_prefix = split_dict_by_key_prefix(d, prefix) # 1.22μs -> 1.17μs (4.79% faster)

def test_basic_all_prefix_match():
    # All keys start with the prefix
    d = {'pre_one': 1, 'pre_two': 2}
    prefix = 'pre_'
    no_prefix, with_prefix = split_dict_by_key_prefix(d, prefix) # 1.04μs -> 973ns (6.99% faster)

def test_basic_some_prefix_match():
    # Some keys start with the prefix, others do not
    d = {'foo': 1, 'bar': 2, 'pre_baz': 3, 'pre_qux': 4}
    prefix = 'pre_'
    no_prefix, with_prefix = split_dict_by_key_prefix(d, prefix) # 1.26μs -> 1.20μs (5.08% faster)

def test_basic_empty_dict():
    # Empty dictionary should return two empty dicts
    d = {}
    prefix = 'doesnotmatter'
    no_prefix, with_prefix = split_dict_by_key_prefix(d, prefix) # 652ns -> 603ns (8.13% faster)

def test_basic_empty_prefix():
    # Empty prefix matches all keys
    d = {'a': 1, 'b': 2}
    prefix = ''
    no_prefix, with_prefix = split_dict_by_key_prefix(d, prefix) # 1.02μs -> 927ns (9.82% faster)

def test_basic_single_key_match():
    # Only one key, which matches
    d = {'abc': 42}
    prefix = 'a'
    no_prefix, with_prefix = split_dict_by_key_prefix(d, prefix) # 892ns -> 823ns (8.38% faster)

def test_basic_single_key_no_match():
    # Only one key, which does not match
    d = {'xyz': 99}
    prefix = 'a'
    no_prefix, with_prefix = split_dict_by_key_prefix(d, prefix) # 901ns -> 853ns (5.63% faster)

# ----------- EDGE TEST CASES -----------

def test_edge_prefix_is_full_key():
    # Prefix is exactly equal to one key
    d = {'apple': 1, 'banana': 2, 'app': 3}
    prefix = 'apple'
    no_prefix, with_prefix = split_dict_by_key_prefix(d, prefix) # 1.16μs -> 1.13μs (2.48% faster)

def test_edge_prefix_is_longer_than_any_key():
    # Prefix longer than any key, so no matches
    d = {'a': 1, 'b': 2}
    prefix = 'abcdef'
    no_prefix, with_prefix = split_dict_by_key_prefix(d, prefix) # 988ns -> 953ns (3.67% faster)

def test_edge_key_is_empty_string():
    # Key is empty string, prefix is also empty string
    d = {'': 123, 'foo': 456}
    prefix = ''
    no_prefix, with_prefix = split_dict_by_key_prefix(d, prefix) # 984ns -> 929ns (5.92% faster)

def test_edge_key_is_empty_string_nonempty_prefix():
    # Key is empty string, prefix is nonempty
    d = {'': 123, 'foo': 456}
    prefix = 'bar'
    no_prefix, with_prefix = split_dict_by_key_prefix(d, prefix) # 1.03μs -> 984ns (4.78% faster)

def test_edge_prefix_is_special_character():
    # Prefix is a special character
    d = {'#foo': 1, 'bar': 2, '#baz': 3}
    prefix = '#'
    no_prefix, with_prefix = split_dict_by_key_prefix(d, prefix) # 1.19μs -> 1.08μs (10.0% faster)

def test_edge_key_is_nonstring():
    # Keys are non-strings; should raise AttributeError
    d = {1: 'one', 2: 'two'}
    prefix = '1'
    with pytest.raises(AttributeError):
        split_dict_by_key_prefix(d, prefix) # 1.50μs -> 1.37μs (9.26% faster)

def test_edge_prefix_is_none():
    # Prefix is None; should raise TypeError
    d = {'foo': 1, 'bar': 2}
    prefix = None
    with pytest.raises(TypeError):
        split_dict_by_key_prefix(d, prefix) # 1.43μs -> 1.34μs (6.48% faster)

def test_edge_key_is_none():
    # Key is None; should raise AttributeError
    d = {None: 1, 'foo': 2}
    prefix = 'f'
    with pytest.raises(AttributeError):
        split_dict_by_key_prefix(d, prefix) # 1.45μs -> 1.36μs (6.68% faster)

def test_edge_key_is_tuple():
    # Key is tuple; should raise AttributeError
    d = {(1, 2): 'a', 'foo': 'b'}
    prefix = 'f'
    with pytest.raises(AttributeError):
        split_dict_by_key_prefix(d, prefix) # 1.63μs -> 1.46μs (11.5% faster)

def test_edge_prefix_is_integer():
    # Prefix is integer; should raise TypeError
    d = {'foo': 1, 'bar': 2}
    prefix = 1
    with pytest.raises(TypeError):
        split_dict_by_key_prefix(d, prefix) # 1.39μs -> 1.37μs (1.46% faster)


def test_edge_prefix_is_bytes():
    # Prefix is bytes; should raise TypeError
    d = {'foo': 1, 'bar': 2}
    prefix = b'f'
    with pytest.raises(TypeError):
        split_dict_by_key_prefix(d, prefix) # 1.74μs -> 1.53μs (13.9% faster)

# ----------- LARGE SCALE TEST CASES -----------

def test_large_all_match():
    # All keys match the prefix
    d = {f'pre_{i}': i for i in range(1000)}
    prefix = 'pre_'
    no_prefix, with_prefix = split_dict_by_key_prefix(d, prefix) # 78.5μs -> 72.9μs (7.69% faster)

def test_large_none_match():
    # No keys match the prefix
    d = {f'item_{i}': i for i in range(1000)}
    prefix = 'pre_'
    no_prefix, with_prefix = split_dict_by_key_prefix(d, prefix) # 77.8μs -> 70.7μs (10.1% faster)

def test_large_half_match():
    # Half keys match the prefix, half do not
    d = {}
    for i in range(500):
        d[f'pre_{i}'] = i
    for i in range(500, 1000):
        d[f'item_{i}'] = i
    prefix = 'pre_'
    no_prefix, with_prefix = split_dict_by_key_prefix(d, prefix) # 82.3μs -> 72.5μs (13.6% faster)
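    expected_with_prefix = {f'pre_{i}': i for i in range(500)}
    expected_no_prefix = {f'item_{i}': i for i in range(500, 1000)}
    # Compare against the expected split so the computed values are actually checked
    assert with_prefix == expected_with_prefix
    assert no_prefix == expected_no_prefix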

def test_large_randomized_keys():
    # Randomized keys, some match, some don't
    import random
    random.seed(42)
    keys = [f'pre_{i}' if random.random() < 0.5 else f'item_{i}' for i in range(1000)]
    d = {k: i for i, k in enumerate(keys)}
    prefix = 'pre_'
    no_prefix, with_prefix = split_dict_by_key_prefix(d, prefix) # 76.1μs -> 71.4μs (6.67% faster)

def test_large_empty_prefix():
    # Empty prefix matches all keys, even for large dict
    d = {f'key_{i}': i for i in range(1000)}
    prefix = ''
    no_prefix, with_prefix = split_dict_by_key_prefix(d, prefix) # 75.0μs -> 68.4μs (9.72% faster)

def test_large_empty_dict():
    # Large scale, but empty dict
    d = {}
    prefix = 'pre_'
    no_prefix, with_prefix = split_dict_by_key_prefix(d, prefix) # 686ns -> 669ns (2.54% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
import pytest  # used for our unit tests
from plotly.shapeannotation import split_dict_by_key_prefix

# unit tests

# ------------------------
# Basic Test Cases
# ------------------------

def test_basic_split():
    # Basic case: some keys with prefix, some without
    d = {'foo1': 1, 'foo2': 2, 'bar1': 3, 'baz': 4}
    prefix = 'foo'
    no_prefix, with_prefix = split_dict_by_key_prefix(d, prefix) # 1.33μs -> 1.27μs (4.32% faster)

def test_no_keys_with_prefix():
    # No keys start with the prefix
    d = {'bar1': 3, 'baz': 4}
    prefix = 'foo'
    no_prefix, with_prefix = split_dict_by_key_prefix(d, prefix) # 1.03μs -> 943ns (8.91% faster)

def test_all_keys_with_prefix():
    # All keys start with the prefix
    d = {'foo1': 1, 'foo2': 2, 'foo3': 3}
    prefix = 'foo'
    no_prefix, with_prefix = split_dict_by_key_prefix(d, prefix) # 1.04μs -> 1.04μs (0.287% slower)

def test_empty_dict():
    # Empty dictionary input
    d = {}
    prefix = 'foo'
    no_prefix, with_prefix = split_dict_by_key_prefix(d, prefix) # 653ns -> 615ns (6.18% faster)

def test_empty_prefix():
    # Empty prefix should match all keys (since all start with '')
    d = {'a': 1, 'b': 2}
    prefix = ''
    no_prefix, with_prefix = split_dict_by_key_prefix(d, prefix) # 1.01μs -> 978ns (2.86% faster)

def test_single_key_with_prefix():
    # Single key with prefix
    d = {'foo': 42}
    prefix = 'foo'
    no_prefix, with_prefix = split_dict_by_key_prefix(d, prefix) # 890ns -> 849ns (4.83% faster)

def test_single_key_without_prefix():
    # Single key without prefix
    d = {'bar': 99}
    prefix = 'foo'
    no_prefix, with_prefix = split_dict_by_key_prefix(d, prefix) # 901ns -> 814ns (10.7% faster)

# ------------------------
# Edge Test Cases
# ------------------------

def test_prefix_is_full_key():
    # Prefix is exactly a key
    d = {'foo': 1, 'foobar': 2, 'barfoo': 3}
    prefix = 'foo'
    no_prefix, with_prefix = split_dict_by_key_prefix(d, prefix) # 1.12μs -> 1.09μs (2.29% faster)

def test_prefix_is_longer_than_any_key():
    # Prefix longer than any key
    d = {'a': 1, 'b': 2}
    prefix = 'longprefix'
    no_prefix, with_prefix = split_dict_by_key_prefix(d, prefix) # 1.00μs -> 945ns (5.93% faster)

def test_prefix_is_special_characters():
    # Prefix is special characters
    d = {'!foo': 1, '@foo': 2, 'foo': 3, 'bar': 4}
    prefix = '!'
    no_prefix, with_prefix = split_dict_by_key_prefix(d, prefix) # 1.22μs -> 1.15μs (6.81% faster)

def test_keys_are_empty_strings():
    # Keys are empty strings
    d = {'': 1, 'foo': 2}
    prefix = ''
    no_prefix, with_prefix = split_dict_by_key_prefix(d, prefix) # 1.00μs -> 936ns (7.16% faster)


def test_keys_are_not_strings():
    # Keys are not strings (should raise AttributeError)
    d = {1: 'a', 2: 'b', 'foo': 'c'}
    prefix = 'f'
    # Only string keys can be checked with startswith
    with pytest.raises(AttributeError):
        split_dict_by_key_prefix(d, prefix) # 1.76μs -> 1.67μs (5.59% faster)


def test_prefix_is_whitespace():
    # Prefix is whitespace
    d = {' foo': 1, ' bar': 2, 'baz': 3}
    prefix = ' '
    no_prefix, with_prefix = split_dict_by_key_prefix(d, prefix) # 1.47μs -> 1.30μs (13.3% faster)

def test_unicode_prefix_and_keys():
    # Unicode keys and prefix
    d = {'αfoo': 1, 'βbar': 2, 'αbaz': 3}
    prefix = 'α'
    no_prefix, with_prefix = split_dict_by_key_prefix(d, prefix) # 1.29μs -> 1.25μs (3.85% faster)

def test_case_sensitivity():
    # Prefix is case-sensitive
    d = {'Foo': 1, 'fooBar': 2, 'foobar': 3}
    prefix = 'foo'
    no_prefix, with_prefix = split_dict_by_key_prefix(d, prefix) # 1.18μs -> 1.10μs (7.09% faster)

def test_prefix_is_substring_of_key():
    # Prefix is a substring but not at the start
    d = {'barfoo': 1, 'bazfoo': 2, 'foo': 3}
    prefix = 'foo'
    no_prefix, with_prefix = split_dict_by_key_prefix(d, prefix) # 1.22μs -> 1.10μs (10.7% faster)

# ------------------------
# Large Scale Test Cases
# ------------------------

def test_large_dict_mixed_prefix():
    # Large dictionary, mix of prefixed and non-prefixed keys
    d = {}
    for i in range(500):
        d[f'foo{i}'] = i
    for i in range(500):
        d[f'bar{i}'] = i + 500
    prefix = 'foo'
    no_prefix, with_prefix = split_dict_by_key_prefix(d, prefix) # 81.9μs -> 72.2μs (13.4% faster)
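    for i in range(500):
        # Each key should land in the matching dict with its original value
        assert with_prefix[f'foo{i}'] == i
        assert no_prefix[f'bar{i}'] == i + 500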

def test_large_dict_all_prefix():
    # Large dictionary, all keys start with prefix
    d = {f'foo{i}': i for i in range(1000)}
    prefix = 'foo'
    no_prefix, with_prefix = split_dict_by_key_prefix(d, prefix) # 77.2μs -> 70.7μs (9.11% faster)

def test_large_dict_none_with_prefix():
    # Large dictionary, no keys start with prefix
    d = {f'bar{i}': i for i in range(1000)}
    prefix = 'foo'
    no_prefix, with_prefix = split_dict_by_key_prefix(d, prefix) # 76.2μs -> 69.9μs (8.98% faster)

def test_large_dict_empty_prefix():
    # Large dictionary, empty prefix (all keys should match)
    d = {f'key{i}': i for i in range(1000)}
    prefix = ''
    no_prefix, with_prefix = split_dict_by_key_prefix(d, prefix) # 75.1μs -> 68.6μs (9.49% faster)

def test_large_dict_randomized_keys():
    # Large dictionary with random keys, some with prefix, some without
    import random
    import string
    random.seed(0)
    d = {}
    prefix = 'pre'
    for i in range(500):
        # 50% chance to have prefix
        if random.random() < 0.5:
            key = 'pre' + ''.join(random.choices(string.ascii_letters, k=5))
        else:
            key = ''.join(random.choices(string.ascii_letters, k=8))
        d[key] = i
    no_prefix, with_prefix = split_dict_by_key_prefix(d, prefix) # 39.6μs -> 36.5μs (8.28% faster)
    # All keys in with_prefix start with 'pre'
    for k in with_prefix.keys():
        assert k.startswith(prefix)
    # All keys in no_prefix do not start with 'pre'
    for k in no_prefix.keys():
        assert not k.startswith(prefix)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, run `git checkout codeflash/optimize-split_dict_by_key_prefix-mhgeffhu`, make your changes, and push.

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 November 1, 2025 14:50
@codeflash-ai codeflash-ai bot added the "⚡️ codeflash" (Optimization PR opened by Codeflash AI) and "🎯 Quality: High" (Optimization Quality according to Codeflash) labels Nov 1, 2025