@codeflash-ai codeflash-ai bot commented Nov 1, 2025

📄 30% (0.30x) speedup for endpts_to_intervals in plotly/figure_factory/_scatterplot.py

⏱️ Runtime : 922 microseconds → 710 microseconds (best of 111 runs)
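
The headline figure and the per-test annotations further down appear consistent with a simple ratio: original runtime divided by optimized runtime, minus one. The sketch below assumes that formula. It is inferred from the numbers on this page, not taken from Codeflash documentation, and `relative_speedup` is a hypothetical helper name.

```python
def relative_speedup(original: float, optimized: float) -> float:
    """Return original/optimized - 1 (assumed formula; positive means faster)."""
    return original / optimized - 1.0


# Headline runtimes from this report, in microseconds.
headline = relative_speedup(922, 710)
print(f"{headline:.0%}")  # prints "30%"

# A small-input case from the generated tests: 3.20 us -> 3.87 us.
small = relative_speedup(3.20, 3.87)
print(f"{abs(small):.1%} {'faster' if small > 0 else 'slower'}")  # prints "17.3% slower"
```

Under this reading, a negative value corresponds to the "slower" annotations and a positive value to the "faster" ones.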

📝 Explanation and details

The optimized code achieves a roughly 30% speedup (922 μs to 710 μs) through three key optimizations:

**1. Combined Single-Pass Validation**
The original code made two separate passes through the input: one to check for strings and another to verify increasing order. The optimized version combines both validations into a single loop using `enumerate()`, reducing the number of iterations from 2×n to 1×n for the validation phase.
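
The optimized source is not shown in this report, so the following is only a minimal sketch of what a merged validation loop could look like. `validate_endpoints` is a hypothetical name, and `ValueError` stands in for `exceptions.PlotlyError` so the snippet runs without plotly installed.

```python
def validate_endpoints(endpts):
    """Reject strings and non-increasing values in one loop (illustrative sketch)."""
    for k, item in enumerate(endpts):
        # String check comes first so a string is never used in a comparison.
        if isinstance(item, str):
            raise ValueError("endpoints must be numbers, not strings")
        # Compare with the previous element, which already passed the string check.
        if k > 0 and endpts[k - 1] >= item:
            raise ValueError("endpoints must be strictly increasing")


validate_endpoints([0, 5, 10])  # passes silently

for bad in ([1, "b", 3], [1, 1, 2]):
    try:
        validate_endpoints(bad)
    except ValueError as err:
        print(f"{bad!r}: {err}")
```

One consequence of merging the loops in this way: an input such as `[3, 2, "a"]` fails the ordering check before the string is reached, whereas a two-pass version would report the string first. Both still raise, so only the error message could differ.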

**2. Simplified Type Checking**
Changed `isinstance(endpts, (tuple)) or isinstance(endpts, (list))` to the more efficient `isinstance(endpts, (list, tuple))`, eliminating redundant function calls and logical operations.
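
As a small illustration of the combined check, `isinstance` accepts a tuple of types as its second argument. The helper name `is_list_or_tuple` is hypothetical and exists only for this sketch.

```python
def is_list_or_tuple(obj) -> bool:
    """Single isinstance call with a tuple of accepted types."""
    return isinstance(obj, (list, tuple))


# Parentheses without a comma do not create a tuple, so `(tuple)` is just `tuple`.
print((tuple) is tuple)             # prints True

print(is_list_or_tuple([1, 6]))     # prints True
print(is_list_or_tuple((2, 4, 8)))  # prints True
print(is_list_or_tuple(range(3)))   # prints False
```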

**3. Efficient Interval Construction**
Replaced the original approach of creating empty lists and appending elements individually with:
- Direct list initialization: `intervals = [[float("-inf"), endpts[0]]]`
- List comprehension for middle intervals: `[[endpts[k], endpts[k + 1]] for k in range(length - 1)]`
- Using `intervals.extend()` instead of individual appends
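
Put together, the three construction steps above might look like the sketch below. The first two statements reuse the expressions quoted in the list; the final `append` of the upper open-ended interval is an assumption based on the tests' description of the output, and `build_intervals` is a hypothetical name. Validation is left out.

```python
def build_intervals(endpts):
    """Turn validated, increasing endpoints into [low, high] pairs (sketch)."""
    length = len(endpts)
    # Start with the open-ended interval below the first endpoint.
    intervals = [[float("-inf"), endpts[0]]]
    # Add one interval per adjacent pair of endpoints.
    intervals.extend([[endpts[k], endpts[k + 1]] for k in range(length - 1)])
    # Assumed final step: the open-ended interval above the last endpoint.
    intervals.append([endpts[-1], float("inf")])
    return intervals


print(build_intervals([1, 6]))  # prints [[-inf, 1], [1, 6], [6, inf]]
```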

**Performance Characteristics by Test Case:**
- **Small inputs (1-3 elements in the generated tests)**: Consistently slower, by roughly 10-25%, because the fixed per-call overhead outweighs the savings at this size
- **Large inputs (500-1000 elements)**: About 40-44% faster on valid inputs due to reduced loop iterations and more efficient list operations, and up to 66% faster when a non-increasing value is found mid-list
- **Early validation failures**: Mixed results. String detection is roughly 21-48% slower, attributed to `enumerate()` overhead, while non-increasing sequence detection is 2-66% faster due to single-pass validation

The optimizations pay off mainly on larger inputs, where fewer passes over the data and cheaper list construction outweigh the fixed overhead. This is a constant-factor improvement: the function remains O(n) in the number of endpoints.

Correctness verification report:

| Test | Status |
| --- | --- |
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 46 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 94.4% |
🌀 Generated Regression Tests and Runtime
import pytest  # used for our unit tests
# function to test
from plotly import exceptions
from plotly.figure_factory._scatterplot import endpts_to_intervals

# unit tests

# ----------- Basic Test Cases -----------

def test_basic_two_endpoints():
    # Basic case: two increasing numbers
    codeflash_output = endpts_to_intervals([1, 6]) # 3.20μs -> 3.87μs (17.3% slower)

def test_basic_three_endpoints():
    # Three increasing numbers
    codeflash_output = endpts_to_intervals([0, 5, 10]) # 2.94μs -> 3.51μs (16.3% slower)

def test_basic_tuple_input():
    # Accepts tuples as well as lists
    codeflash_output = endpts_to_intervals((2, 4, 8)) # 2.62μs -> 3.49μs (24.8% slower)

def test_basic_float_endpoints():
    # Works with floats
    codeflash_output = endpts_to_intervals([1.5, 2.5, 3.5]) # 2.64μs -> 3.11μs (15.1% slower)

def test_basic_negative_endpoints():
    # Negative numbers
    codeflash_output = endpts_to_intervals([-5, 0, 5]) # 2.71μs -> 3.10μs (12.5% slower)

# ----------- Edge Test Cases -----------


def test_single_endpoint():
    # A single endpoint should return two intervals: (-inf, x), (x, inf)
    codeflash_output = endpts_to_intervals([7]) # 2.81μs -> 3.77μs (25.3% slower)


def test_string_in_endpoints():
    # Rejects any string in the sequence
    with pytest.raises(exceptions.PlotlyError):
        endpts_to_intervals(['a', 2, 3]) # 1.44μs -> 1.89μs (24.1% slower)
    with pytest.raises(exceptions.PlotlyError):
        endpts_to_intervals([1, 'b', 3]) # 939ns -> 1.19μs (20.8% slower)
    with pytest.raises(exceptions.PlotlyError):
        endpts_to_intervals([1, 2, 'c']) # 642ns -> 855ns (24.9% slower)

def test_non_increasing_sequence():
    # Rejects non-increasing numbers
    with pytest.raises(exceptions.PlotlyError):
        endpts_to_intervals([3, 2, 5]) # 2.20μs -> 1.80μs (22.0% faster)
    with pytest.raises(exceptions.PlotlyError):
        endpts_to_intervals([1, 1, 2]) # 1.07μs -> 1.01μs (5.44% faster)
    with pytest.raises(exceptions.PlotlyError):
        endpts_to_intervals([1, 2, 2]) # 917ns -> 790ns (16.1% faster)
    with pytest.raises(exceptions.PlotlyError):
        endpts_to_intervals([5, 3, 1]) # 677ns -> 541ns (25.1% faster)

def test_inf_and_nan_in_endpoints():
    # Accepts inf in endpoints, but sequence must be strictly increasing
    codeflash_output = endpts_to_intervals([1, float("inf")])
    # NaN is not a string. Every ordering comparison with NaN evaluates to False,
    # so whether this raises depends on how the ordering check is written.
    with pytest.raises(exceptions.PlotlyError):
        endpts_to_intervals([1, float("nan"), 3])


def test_really_large_and_small_numbers():
    # Large and small numbers
    codeflash_output = endpts_to_intervals([-1e100, 0, 1e100]) # 3.92μs -> 4.59μs (14.5% slower)

# ----------- Large Scale Test Cases -----------

def test_large_number_of_endpoints():
    # 1000 strictly increasing endpoints
    endpoints = list(range(1000))
    codeflash_output = endpts_to_intervals(endpoints); intervals = codeflash_output # 154μs -> 109μs (41.3% faster)

def test_large_float_endpoints():
    # 500 increasing floats
    endpoints = [i * 0.1 for i in range(500)]
    codeflash_output = endpts_to_intervals(endpoints); intervals = codeflash_output # 72.7μs -> 52.1μs (39.5% faster)

def test_large_negative_to_positive():
    # 1000 endpoints from -500 to 499
    endpoints = list(range(-500, 500))
    codeflash_output = endpts_to_intervals(endpoints); intervals = codeflash_output # 148μs -> 104μs (42.4% faster)

def test_large_non_increasing():
    # Large non-increasing sequence should raise
    endpoints = list(range(500)) + [499]  # last value repeats, so not strictly increasing
    with pytest.raises(exceptions.PlotlyError):
        endpts_to_intervals(endpoints) # 31.6μs -> 26.5μs (19.4% faster)

def test_large_input_with_string():
    # Large input with a string in the middle
    endpoints = list(range(500))
    endpoints[250] = 'bad'
    with pytest.raises(exceptions.PlotlyError):
        endpts_to_intervals(endpoints) # 7.61μs -> 13.5μs (43.8% slower)

# ----------- Miscellaneous/Mutation-sensitive Test Cases -----------

def test_mutation_sensitive_order():
    # Changing order should fail
    codeflash_output = endpts_to_intervals([1, 2, 3]) # 3.10μs -> 3.56μs (12.9% slower)
    with pytest.raises(exceptions.PlotlyError):
        endpts_to_intervals([3, 2, 1]) # 1.34μs -> 1.31μs (2.37% faster)

def test_mutation_sensitive_endpoint_values():
    # Changing any endpoint value changes output
    codeflash_output = endpts_to_intervals([1, 2, 3]) # 2.80μs -> 3.34μs (16.1% slower)

def test_mutation_sensitive_type():
    # Changing from list to tuple does not change output
    codeflash_output = endpts_to_intervals([1, 2, 3]) # 2.57μs -> 3.14μs (18.2% slower)

def test_mutation_sensitive_inf_handling():
    # Output with inf endpoint
    codeflash_output = endpts_to_intervals([1, float("inf")]) # 2.48μs -> 3.07μs (19.0% slower)
    # Changing inf to a large number changes output
    codeflash_output = endpts_to_intervals([1, 1e10]) # 1.20μs -> 1.34μs (10.3% slower)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
import pytest  # used for our unit tests
# function to test
from plotly import exceptions
from plotly.figure_factory._scatterplot import endpts_to_intervals

# unit tests

# --------------------- Basic Test Cases ---------------------

def test_basic_two_elements():
    # Test with two increasing numbers
    codeflash_output = endpts_to_intervals([1, 6]); result = codeflash_output # 2.21μs -> 2.88μs (23.5% slower)

def test_basic_three_elements():
    # Test with three increasing numbers
    codeflash_output = endpts_to_intervals([0, 5, 10]); result = codeflash_output # 2.59μs -> 3.11μs (16.8% slower)

def test_basic_tuple_input():
    # Test with tuple input
    codeflash_output = endpts_to_intervals((2, 4, 8)); result = codeflash_output # 2.66μs -> 3.36μs (20.7% slower)

def test_basic_float_elements():
    # Test with floats
    codeflash_output = endpts_to_intervals([1.1, 2.2, 3.3]); result = codeflash_output # 2.66μs -> 2.98μs (10.9% slower)

def test_basic_negative_and_positive():
    # Test with negative and positive numbers
    codeflash_output = endpts_to_intervals([-10, 0, 10]); result = codeflash_output # 2.54μs -> 3.15μs (19.4% slower)

# --------------------- Edge Test Cases ---------------------


def test_edge_single_element():
    # List with one element
    codeflash_output = endpts_to_intervals([5]); result = codeflash_output # 2.76μs -> 3.64μs (24.3% slower)


def test_edge_contains_string():
    # Input contains a string
    with pytest.raises(exceptions.PlotlyError):
        endpts_to_intervals([1, 'a', 3]) # 1.58μs -> 2.11μs (25.2% slower)

def test_edge_contains_string_tuple():
    # Tuple contains a string
    with pytest.raises(exceptions.PlotlyError):
        endpts_to_intervals((1, 2, 'b')) # 1.32μs -> 2.16μs (38.8% slower)

def test_edge_non_increasing():
    # Numbers not strictly increasing
    with pytest.raises(exceptions.PlotlyError):
        endpts_to_intervals([1, 1, 2]) # 2.27μs -> 1.93μs (17.3% faster)
    with pytest.raises(exceptions.PlotlyError):
        endpts_to_intervals([3, 2, 4]) # 1.09μs -> 1.04μs (5.12% faster)
    with pytest.raises(exceptions.PlotlyError):
        endpts_to_intervals([1, 3, 2]) # 926ns -> 770ns (20.3% faster)


def test_edge_bool_in_list():
    # bool is a subclass of int, so booleans are accepted (False < True < 2)
    codeflash_output = endpts_to_intervals([False, True, 2]); result = codeflash_output # 3.62μs -> 4.42μs (18.1% slower)

def test_edge_duplicate_numbers():
    # Duplicate numbers (not strictly increasing)
    with pytest.raises(exceptions.PlotlyError):
        endpts_to_intervals([1, 2, 2, 3]) # 2.13μs -> 2.06μs (3.45% faster)

def test_edge_large_negative_and_positive():
    # Very large negative and positive numbers
    codeflash_output = endpts_to_intervals([-1e308, 0, 1e308]); result = codeflash_output # 3.05μs -> 3.90μs (21.9% slower)

# --------------------- Large Scale Test Cases ---------------------

def test_large_scale_1000_elements():
    # Test with 1000 strictly increasing integers
    data = list(range(1000))
    codeflash_output = endpts_to_intervals(data); result = codeflash_output # 153μs -> 108μs (41.8% faster)

def test_large_scale_floats():
    # Test with 500 increasing floats
    data = [i * 0.5 for i in range(500)]
    codeflash_output = endpts_to_intervals(data); result = codeflash_output # 74.2μs -> 52.4μs (41.5% faster)

def test_large_scale_negative_to_positive():
    # Test with large range from negative to positive
    data = list(range(-500, 500))
    codeflash_output = endpts_to_intervals(data); result = codeflash_output # 148μs -> 103μs (43.5% faster)

def test_large_scale_non_increasing_middle():
    # Test with non-increasing value in the middle of a large list
    data = list(range(500)) + [499] + list(range(501, 1000))
    with pytest.raises(exceptions.PlotlyError):
        endpts_to_intervals(data) # 44.1μs -> 26.6μs (66.1% faster)

def test_large_scale_string_in_middle():
    # Test with a string in the middle of a large list
    data = list(range(500)) + ['bad'] + list(range(501, 1000))
    with pytest.raises(exceptions.PlotlyError):
        endpts_to_intervals(data) # 13.8μs -> 26.5μs (47.8% slower)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
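
The closing comment says `codeflash_output` is compared between the original and the optimized code. How Codeflash performs that comparison is not shown here. The sketch below is one plausible approach, assuming "same behavior" means equal return values or the same exception type. All names are hypothetical, and the two builders are stand-ins for the original and optimized construction styles described earlier, not the actual plotly code.

```python
def outcome(func, arg):
    """Capture what a call does: ("ok", value) or ("raised", exception type)."""
    try:
        return ("ok", func(arg))
    except Exception as exc:
        return ("raised", type(exc))


def intervals_append(endpts):
    """Stand-in for the original: build the list with individual appends."""
    intervals = [[float("-inf"), endpts[0]]]
    for k in range(len(endpts) - 1):
        intervals.append([endpts[k], endpts[k + 1]])
    intervals.append([endpts[-1], float("inf")])
    return intervals


def intervals_extend(endpts):
    """Stand-in for the optimized code: build the middle with a comprehension."""
    intervals = [[float("-inf"), endpts[0]]]
    intervals.extend([[endpts[k], endpts[k + 1]] for k in range(len(endpts) - 1)])
    intervals.append([endpts[-1], float("inf")])
    return intervals


# An empty list makes both stand-ins raise IndexError, which also counts as a match.
for case in ([1, 6], (2, 4, 8), list(range(1000)), []):
    assert outcome(intervals_append, case) == outcome(intervals_extend, case)
print("all cases match")
```

Comparing exception types as well as return values matters for this function, since many of the generated tests exercise the error paths.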

To edit these changes, run `git checkout codeflash/optimize-endpts_to_intervals-mhgbaaba` and push.

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 November 1, 2025 13:22
@codeflash-ai codeflash-ai bot added the "⚡️ codeflash" (Optimization PR opened by Codeflash AI) and "🎯 Quality: Medium" (Optimization Quality according to Codeflash) labels Nov 1, 2025