@codeflash-ai codeflash-ai bot commented Nov 1, 2025

📄 2,342% (23.42x) speedup for figure_rst in plotly/io/_sg_scraper.py

⏱️ Runtime : 15.7 milliseconds → 644 microseconds (best of 115 runs)

📝 Explanation and details

The optimized code achieves a 2,342% speedup by replacing an eager list comprehension with a lazy generator, since only the first element is ever used.

Key optimization:

  • Replaced list comprehension with generator: Instead of creating a full list with [... for figure_path in figure_list], the code now uses a generator expression (... for figure_path in figure_list) and extracts only the first item with next().

Why this is faster:

  • The original code processes all figures in figure_list upfront, even though only the first one is ever used. This is evident from the line profiler showing 99.7% of time spent building the complete figure_paths list.
  • The optimized version processes figures lazily - it only computes the path transformation for the first figure and stops immediately.
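
The two approaches described above can be sketched as follows. This is a minimal illustration, not the code from `plotly/io/_sg_scraper.py`: the names `_to_rst_path`, `figure_rst_eager`, and `figure_rst_lazy` are hypothetical, and the path transformation is an assumption inferred from the tests (a path relative to `sources_dir`, reduced to its basename under `images/`).

```python
import os

SINGLE_HTML = """
.. raw:: html
    :file: %s
"""


def _to_rst_path(figure_path, sources_dir):
    # Hypothetical helper: path relative to sources_dir, using forward slashes.
    return os.path.relpath(figure_path, sources_dir).replace(os.sep, "/").lstrip("/")


def figure_rst_eager(figure_list, sources_dir):
    # Eager: transforms every path in the list, then uses only the first one.
    figure_paths = [_to_rst_path(p, sources_dir) for p in figure_list]
    if not figure_paths:
        return ""
    return SINGLE_HTML % os.path.join("images", os.path.basename(figure_paths[0]))


def figure_rst_lazy(figure_list, sources_dir):
    # Lazy: the generator runs the transformation once, for the first path only.
    figure_paths = (_to_rst_path(p, sources_dir) for p in figure_list)
    try:
        first = next(figure_paths)
    except StopIteration:
        # An empty list exhausts the generator immediately.
        return ""
    return SINGLE_HTML % os.path.join("images", os.path.basename(first))
```

In this sketch, both functions return the same string for any list of valid paths. One difference a reviewer may want to keep in mind: the lazy version never touches entries after the first, so an invalid entry later in the list no longer raises.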

Performance impact by test case:

  • Massive gains for large lists: Tests with 500 to 1,000 figures show 13,419% to 29,717% speedups because the optimization skips every item after the first. Even the 2- and 3-figure tests run 32.8% and 68.6% faster.
  • Small overhead for single items: Tests with a single figure run about 0.1% to 7.6% slower (under 2 μs per call), attributed to the try/except overhead. This is negligible compared to the gains for larger lists.
  • Empty lists: 28.7% to 29.6% slower in relative terms, but under 0.4 μs in absolute terms (e.g. 878ns -> 1.25μs), due to exception handling versus a simple boolean check.

This optimization is particularly effective for the function's actual usage pattern where only the first figure matters, making it scale O(1) instead of O(n) with respect to list size.
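
The O(1) versus O(n) claim can be checked without timing anything by counting how often the transformation runs. The sketch below is illustrative only: `count_transform_calls` and its inner `transform` are hypothetical stand-ins, and `next()` is called with a default value for brevity rather than the try/except mentioned above.

```python
def count_transform_calls(n, lazy):
    """Return how many times the transformation runs for a list of n paths."""
    calls = 0

    def transform(path):
        nonlocal calls
        calls += 1
        return path.upper()  # stand-in for the real path transformation

    paths = [f"/docs/fig_{i}.png" for i in range(n)]
    if lazy:
        # Generator: the transformation runs only until the first item is produced.
        next((transform(p) for p in paths), None)
    else:
        # List comprehension: the transformation runs for every item up front.
        transformed = [transform(p) for p in paths]
        _ = transformed[0] if transformed else None
    return calls


for n in (1, 10, 1000):
    print(n, count_transform_calls(n, lazy=False), count_transform_calls(n, lazy=True))
```

The eager count grows with the list length, while the lazy count stays at one for any non-empty list.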

Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | ✅ 45 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests and Runtime
import os

# imports
import pytest
from plotly.io._sg_scraper import figure_rst

SINGLE_HTML = """
.. raw:: html
    :file: %s
"""

# unit tests

# --- Basic Test Cases ---

def test_empty_figure_list_returns_empty_string():
    # No figures should result in empty RST
    codeflash_output = figure_rst([], "/some/path") # 878ns -> 1.25μs (29.6% slower)

def test_single_figure_relative_path():
    # Single figure, already relative to sources_dir
    sources_dir = "/home/user/docs"
    fig_path = "/home/user/docs/figs/plot1.png"
    expected = SINGLE_HTML % "images/plot1.png"
    codeflash_output = figure_rst([fig_path], sources_dir) # 13.0μs -> 13.3μs (2.76% slower)

def test_single_figure_absolute_path():
    # Single figure, absolute path, sources_dir is root
    sources_dir = "/"
    fig_path = "/figures/figureA.png"
    expected = SINGLE_HTML % "images/figureA.png"
    codeflash_output = figure_rst([fig_path], sources_dir) # 11.1μs -> 11.6μs (3.69% slower)

def test_multiple_figures_only_first_used():
    # Only the first figure should be used in output
    sources_dir = "/docs"
    fig1 = "/docs/imgs/a.png"
    fig2 = "/docs/imgs/b.png"
    expected = SINGLE_HTML % "images/a.png"
    codeflash_output = figure_rst([fig1, fig2], sources_dir) # 15.4μs -> 11.6μs (32.8% faster)

def test_figure_path_with_different_separators():
    # Test that os.sep is handled, even if input uses backslashes (Windows)
    sources_dir = r"C:\docs"
    fig_path = r"C:\docs\plots\myfig.png"
    # The output should always use forward slashes
    expected = SINGLE_HTML % "images/myfig.png"
    codeflash_output = figure_rst([fig_path], sources_dir) # 17.6μs -> 17.9μs (1.45% slower)

def test_figure_path_with_leading_slash():
    # Figure path with leading slash should not affect basename extraction
    sources_dir = "/docs"
    fig_path = "/docs/figs/someplot.png"
    expected = SINGLE_HTML % "images/someplot.png"
    codeflash_output = figure_rst([fig_path], sources_dir) # 11.3μs -> 11.4μs (1.32% slower)

def test_figure_path_with_trailing_slash_in_sources_dir():
    # sources_dir with trailing slash should not affect result
    sources_dir = "/docs/"
    fig_path = "/docs/figs/abc.png"
    expected = SINGLE_HTML % "images/abc.png"
    codeflash_output = figure_rst([fig_path], sources_dir) # 11.2μs -> 11.4μs (1.66% slower)

# --- Edge Test Cases ---

def test_figure_path_is_sources_dir():
    # Figure path is exactly the sources_dir (not a file, but test for robustness)
    sources_dir = "/docs"
    fig_path = "/docs"
    expected = SINGLE_HTML % "images/docs"
    codeflash_output = figure_rst([fig_path], sources_dir) # 9.64μs -> 10.2μs (5.89% slower)

def test_figure_path_outside_sources_dir():
    # Figure path is outside sources_dir (relpath will use ..)
    sources_dir = "/docs"
    fig_path = "/otherdir/plot.png"
    expected = SINGLE_HTML % "images/plot.png"
    codeflash_output = figure_rst([fig_path], sources_dir) # 11.8μs -> 11.9μs (0.656% slower)

def test_figure_path_with_dot_and_dotdot():
    # Figure path contains '.' and '..'
    sources_dir = "/docs/source"
    fig_path = "/docs/source/../source/figs/./plot.png"
    # Normalize path
    norm_path = os.path.normpath(fig_path)
    expected = SINGLE_HTML % "images/plot.png"
    codeflash_output = figure_rst([norm_path], sources_dir) # 10.9μs -> 11.1μs (1.18% slower)

def test_figure_path_with_unicode_characters():
    # Unicode in file name
    sources_dir = "/home/üser/docs"
    fig_path = "/home/üser/docs/figs/πlotΩ.png"
    expected = SINGLE_HTML % "images/πlotΩ.png"
    codeflash_output = figure_rst([fig_path], sources_dir) # 15.1μs -> 15.5μs (2.77% slower)

def test_sources_dir_and_figure_path_are_same_file():
    # sources_dir and figure_path are the same file (edge)
    sources_dir = "/docs/fig.png"
    fig_path = "/docs/fig.png"
    expected = SINGLE_HTML % "images/fig.png"
    codeflash_output = figure_rst([fig_path], sources_dir) # 10.6μs -> 11.0μs (3.09% slower)

def test_sources_dir_is_empty_string():
    # sources_dir is empty string (edge)
    fig_path = "/img.png"
    # with an empty sources_dir, relpath resolves against the current working directory
    expected = SINGLE_HTML % "images/img.png"
    codeflash_output = figure_rst([fig_path], "") # 16.1μs -> 16.5μs (2.43% slower)



def test_figure_path_is_dot():
    # Figure path is '.' (edge)
    sources_dir = "/docs"
    fig_path = "."
    expected = SINGLE_HTML % "images/."
    codeflash_output = figure_rst([fig_path], sources_dir) # 18.7μs -> 20.3μs (7.62% slower)

def test_sources_dir_is_dot():
    # sources_dir is '.' (edge)
    fig_path = "foo/bar.png"
    expected = SINGLE_HTML % "images/bar.png"
    codeflash_output = figure_rst([fig_path], ".") # 17.2μs -> 17.8μs (3.41% slower)


def test_figure_list_is_none():
    # None is not a valid list, should raise TypeError
    sources_dir = "/docs"
    with pytest.raises(TypeError):
        figure_rst(None, sources_dir) # 1.27μs -> 1.29μs (1.55% slower)

def test_figure_list_contains_non_string():
    # List contains non-string (e.g. int), should raise TypeError
    sources_dir = "/docs"
    with pytest.raises(TypeError):
        figure_rst([123], sources_dir) # 2.94μs -> 3.38μs (13.2% slower)

# --- Large Scale Test Cases ---

def test_large_number_of_figures_only_first_used():
    # Only the first figure should be used even if list is large
    sources_dir = "/docs"
    fig_paths = [f"/docs/imgs/fig_{i}.png" for i in range(1000)]
    expected = SINGLE_HTML % "images/fig_0.png"
    codeflash_output = figure_rst(fig_paths, sources_dir) # 3.28ms -> 15.1μs (21556% faster)

def test_large_sources_dir_path():
    # Very long sources_dir path
    sources_dir = "/" + "/".join(["longdir"] * 100)
    fig_path = sources_dir + "/fig.png"
    expected = SINGLE_HTML % "images/fig.png"
    codeflash_output = figure_rst([fig_path], sources_dir) # 40.6μs -> 41.9μs (3.14% slower)

def test_large_figure_path():
    # Very long figure path
    sources_dir = "/docs"
    fig_path = "/docs/" + "/".join(["deep"] * 200) + "/final.png"
    expected = SINGLE_HTML % "images/final.png"
    codeflash_output = figure_rst([fig_path], sources_dir) # 66.0μs -> 66.7μs (1.04% slower)

def test_large_number_of_figures_with_varied_names():
    # Large number of figures with unique names, only first used
    sources_dir = "/docs"
    fig_paths = [f"/docs/imgs/fig_{i}_special.png" for i in range(999)]
    expected = SINGLE_HTML % "images/fig_0_special.png"
    codeflash_output = figure_rst(fig_paths, sources_dir) # 3.29ms -> 12.4μs (26326% faster)

def test_large_figure_list_with_nonexistent_files():
    # Paths do not need to exist; function should not check existence
    sources_dir = "/docs"
    fig_paths = [f"/docs/fake/fig_{i}.png" for i in range(500)]
    expected = SINGLE_HTML % "images/fig_0.png"
    codeflash_output = figure_rst(fig_paths, sources_dir) # 1.63ms -> 12.0μs (13419% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
import os

# imports
import pytest  # used for our unit tests
from plotly.io._sg_scraper import figure_rst

SINGLE_HTML = """
.. raw:: html
    :file: %s
"""

# unit tests

# ------------------------
# Basic Test Cases
# ------------------------

def test_single_figure_simple_path():
    # Test with a single figure, simple absolute path, simple sources_dir
    figure = "/home/user/project/docs/_build/fig1.png"
    sources_dir = "/home/user/project/docs/_build"
    expected_rel = "fig1.png"
    expected_basename = "fig1.png"
    expected_rst = SINGLE_HTML % os.path.join("images", expected_basename)
    codeflash_output = figure_rst([figure], sources_dir); result = codeflash_output # 13.0μs -> 13.3μs (1.81% slower)

def test_single_figure_nested_path():
    # Test with a single figure in a nested subdirectory
    figure = "/docs/source/_build/images/gallery/fig2.png"
    sources_dir = "/docs/source/_build"
    expected_basename = "fig2.png"
    expected_rst = SINGLE_HTML % os.path.join("images", expected_basename)
    codeflash_output = figure_rst([figure], sources_dir); result = codeflash_output # 12.2μs -> 12.7μs (4.46% slower)

def test_multiple_figures_returns_first_only():
    # Test with more than one figure, should only use the first
    figures = [
        "/docs/source/_build/figA.png",
        "/docs/source/_build/figB.png",
        "/docs/source/_build/figC.png"
    ]
    sources_dir = "/docs/source/_build"
    expected_basename = "figA.png"
    expected_rst = SINGLE_HTML % os.path.join("images", expected_basename)
    codeflash_output = figure_rst(figures, sources_dir); result = codeflash_output # 19.4μs -> 11.5μs (68.6% faster)

def test_empty_figure_list_returns_empty_string():
    # Test with empty figure list
    codeflash_output = figure_rst([], "/some/path"); result = codeflash_output # 818ns -> 1.15μs (28.7% slower)

def test_sources_dir_with_trailing_slash():
    # Test sources_dir with trailing slash
    figure = "/foo/bar/baz/fig1.png"
    sources_dir = "/foo/bar/baz/"
    expected_basename = "fig1.png"
    expected_rst = SINGLE_HTML % os.path.join("images", expected_basename)
    codeflash_output = figure_rst([figure], sources_dir); result = codeflash_output # 11.8μs -> 12.3μs (3.60% slower)

def test_sources_dir_without_trailing_slash():
    # Test sources_dir without trailing slash
    figure = "/foo/bar/baz/fig1.png"
    sources_dir = "/foo/bar/baz"
    expected_basename = "fig1.png"
    expected_rst = SINGLE_HTML % os.path.join("images", expected_basename)
    codeflash_output = figure_rst([figure], sources_dir); result = codeflash_output # 11.0μs -> 11.5μs (4.05% slower)

def test_relative_paths_in_figures():
    # Test with figure path relative to sources_dir
    sources_dir = os.path.abspath("docs/source")
    figure = os.path.join(sources_dir, "gallery", "fig.png")
    expected_basename = "fig.png"
    expected_rst = SINGLE_HTML % os.path.join("images", expected_basename)
    codeflash_output = figure_rst([figure], sources_dir); result = codeflash_output # 11.0μs -> 11.4μs (3.72% slower)

# ------------------------
# Edge Test Cases
# ------------------------

def test_figure_path_is_sources_dir():
    # Figure path is the same as sources_dir (nonsensical, but test for robustness)
    sources_dir = "/foo/bar"
    figure = "/foo/bar"
    # os.path.basename("/foo/bar") == "bar"
    expected_basename = "bar"
    expected_rst = SINGLE_HTML % os.path.join("images", expected_basename)
    codeflash_output = figure_rst([figure], sources_dir); result = codeflash_output # 9.76μs -> 10.1μs (3.32% slower)

def test_figure_path_has_trailing_slash():
    # Figure path is a directory with trailing slash (should treat as basename '')
    sources_dir = "/foo/bar"
    figure = "/foo/bar/"
    expected_basename = ""  # os.path.basename("/foo/bar/") == ""
    expected_rst = SINGLE_HTML % os.path.join("images", expected_basename)
    codeflash_output = figure_rst([figure], sources_dir); result = codeflash_output # 10.0μs -> 10.3μs (2.67% slower)

def test_sources_dir_is_root():
    # sources_dir is filesystem root
    figure = "/fig.png"
    sources_dir = "/"
    expected_basename = "fig.png"
    expected_rst = SINGLE_HTML % os.path.join("images", expected_basename)
    codeflash_output = figure_rst([figure], sources_dir); result = codeflash_output # 9.96μs -> 10.2μs (2.85% slower)

def test_figure_path_not_under_sources_dir():
    # Figure path is not under sources_dir
    figure = "/tmp/fig.png"
    sources_dir = "/docs/source"
    expected_basename = "fig.png"
    expected_rst = SINGLE_HTML % os.path.join("images", expected_basename)
    codeflash_output = figure_rst([figure], sources_dir); result = codeflash_output # 11.5μs -> 12.0μs (3.67% slower)

def test_figure_path_with_dot_segments():
    # Figure path contains '.' or '..'
    sources_dir = "/docs/source"
    figure = "/docs/source/gallery/../gallery2/fig.png"
    expected_basename = "fig.png"
    expected_rst = SINGLE_HTML % os.path.join("images", expected_basename)
    codeflash_output = figure_rst([figure], sources_dir); result = codeflash_output # 11.9μs -> 11.9μs (0.142% slower)

def test_figure_path_with_windows_separator():
    # Simulate Windows-style path separator
    sources_dir = r"C:\docs\source"
    figure = r"C:\docs\source\gallery\fig.png"
    expected_basename = "fig.png"
    expected_rst = SINGLE_HTML % os.path.join("images", expected_basename)
    codeflash_output = figure_rst([figure], sources_dir); result = codeflash_output # 16.4μs -> 16.3μs (1.00% faster)

def test_figure_path_with_unicode_characters():
    # Unicode characters in filename
    sources_dir = "/docs/source"
    figure = "/docs/source/galería/fiğüřę.png"
    expected_basename = "fiğüřę.png"
    expected_rst = SINGLE_HTML % os.path.join("images", expected_basename)
    codeflash_output = figure_rst([figure], sources_dir); result = codeflash_output # 13.4μs -> 13.8μs (3.25% slower)

def test_figure_path_with_spaces():
    # Spaces in file path
    sources_dir = "/docs/source"
    figure = "/docs/source/gallery/my figure.png"
    expected_basename = "my figure.png"
    expected_rst = SINGLE_HTML % os.path.join("images", expected_basename)
    codeflash_output = figure_rst([figure], sources_dir); result = codeflash_output # 11.0μs -> 11.3μs (2.73% slower)


def test_sources_dir_is_empty_figure_is_normal():
    # sources_dir is empty, figure is absolute path
    figure = "/foo/bar/fig.png"
    sources_dir = ""
    expected_basename = "fig.png"
    expected_rst = SINGLE_HTML % os.path.join("images", expected_basename)
    codeflash_output = figure_rst([figure], sources_dir); result = codeflash_output # 19.5μs -> 19.9μs (2.07% slower)

def test_figure_list_is_none():
    # Passing None instead of a list should raise TypeError
    with pytest.raises(TypeError):
        figure_rst(None, "/foo/bar") # 1.12μs -> 1.14μs (1.67% slower)


def test_figure_list_contains_non_string():
    # Figure list contains a non-string entry
    sources_dir = "/foo/bar"
    figures = [123, "/foo/bar/fig.png"]
    with pytest.raises(TypeError):
        figure_rst(figures, sources_dir) # 2.76μs -> 3.45μs (19.8% slower)


def test_large_number_of_figures():
    # Test with 1000 figures, only the first should be used
    sources_dir = "/foo/bar"
    figures = [f"/foo/bar/fig{i}.png" for i in range(1000)]
    expected_basename = "fig0.png"
    expected_rst = SINGLE_HTML % os.path.join("images", expected_basename)
    codeflash_output = figure_rst(figures, sources_dir); result = codeflash_output # 3.40ms -> 14.3μs (23598% faster)

def test_large_path_strings():
    # Test with very long path strings
    sources_dir = "/foo/bar"
    long_name = "a" * 200 + ".png"
    figure = f"/foo/bar/{long_name}"
    expected_basename = long_name
    expected_rst = SINGLE_HTML % os.path.join("images", expected_basename)
    codeflash_output = figure_rst([figure], sources_dir); result = codeflash_output # 11.6μs -> 12.5μs (7.50% slower)

def test_large_sources_dir():
    # Test with a very long sources_dir
    sources_dir = "/" + "/".join(["verylong"] * 100)
    figure = os.path.join(sources_dir, "fig.png")
    expected_basename = "fig.png"
    expected_rst = SINGLE_HTML % os.path.join("images", expected_basename)
    codeflash_output = figure_rst([figure], sources_dir); result = codeflash_output # 38.9μs -> 39.6μs (1.68% slower)

def test_large_number_of_nested_directories():
    # Figure path with many nested directories
    sources_dir = "/foo/bar"
    nested_dirs = "/".join([f"dir{i}" for i in range(50)])
    figure = f"/foo/bar/{nested_dirs}/fig.png"
    expected_basename = "fig.png"
    expected_rst = SINGLE_HTML % os.path.join("images", expected_basename)
    codeflash_output = figure_rst([figure], sources_dir); result = codeflash_output # 25.0μs -> 25.5μs (1.90% slower)

def test_large_number_of_figures_with_varied_paths():
    # 1000 figures, only the first is used, with varied paths
    sources_dir = "/foo/bar"
    figures = [f"/foo/bar/gallery{i}/fig{i}.png" for i in range(1000)]
    expected_basename = "fig0.png"
    expected_rst = SINGLE_HTML % os.path.join("images", expected_basename)
    codeflash_output = figure_rst(figures, sources_dir); result = codeflash_output # 3.55ms -> 11.9μs (29717% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes git checkout codeflash/optimize-figure_rst-mhghmshv and push.

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 November 1, 2025 16:20
@codeflash-ai codeflash-ai bot added ⚡️ codeflash Optimization PR opened by Codeflash AI 🎯 Quality: High Optimization Quality according to Codeflash labels Nov 1, 2025