@codeflash-ai codeflash-ai bot commented Oct 25, 2025

📄 149% (1.49x) speedup for _is_tmp_file in marimo/_server/recents.py

⏱️ Runtime: 2.28 milliseconds → 915 microseconds (best of 166 runs)

📝 Explanation and details

The optimization replaces the any() generator expression with a direct call to str.startswith() using a tuple argument. Instead of iterating through each folder in _IGNORED_FOLDERS and calling startswith() individually, the optimized version passes the entire tuple to startswith(), which can check all prefixes in a single operation.

Key changes:

  • Eliminated generator expression overhead: Removed the any() function and generator that required Python to iterate through each folder name
  • Single native string operation: str.startswith() with a tuple argument is implemented in C and performs all prefix checks in one call
  • Reduced function call overhead: Instead of potentially 2 separate startswith() calls (one for each folder), there's now just 1
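A minimal sketch of the change described above. The function names `is_tmp_file_original` and `is_tmp_file_optimized` are hypothetical stand-ins for illustration, and the body of the original is reconstructed from this description rather than copied from `marimo/_server/recents.py`. The value of `_IGNORED_FOLDERS` is the one shown in the generated tests.

```python
from __future__ import annotations

# Value as it appears in the generated regression tests.
_IGNORED_FOLDERS = ("/tmp", "/var")


def is_tmp_file_original(filename: str) -> bool:
    # Assumed shape of the original: one startswith() call per folder,
    # driven by a Python-level generator inside any().
    return any(filename.startswith(folder) for folder in _IGNORED_FOLDERS)


def is_tmp_file_optimized(filename: str) -> bool:
    # str.startswith() accepts a tuple of prefixes and returns True
    # if the string starts with any one of them.
    return filename.startswith(_IGNORED_FOLDERS)
```

The prefix argument has to be a `str` or a tuple of `str`; passing a list raises `TypeError`, so this form relies on `_IGNORED_FOLDERS` staying a tuple.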

Why it's faster:
The original code uses any() with a generator that calls filename.startswith() for each folder until a match is found. This involves Python bytecode interpretation overhead for each iteration. The optimized version leverages the fact that str.startswith() natively accepts a tuple of prefixes and performs all checks efficiently in C code.
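One way to observe this overhead locally is a small `timeit` comparison. This is only a sketch under assumptions: `with_any`, `with_tuple`, and `bench` are hypothetical helper names, the `with_any` body is the assumed shape of the original, and absolute numbers will differ by machine and Python version from the ones reported in this PR.

```python
import timeit

_IGNORED_FOLDERS = ("/tmp", "/var")


def with_any(filename: str) -> bool:
    # Assumed original form: Python-level iteration over the prefixes.
    return any(filename.startswith(folder) for folder in _IGNORED_FOLDERS)


def with_tuple(filename: str) -> bool:
    # Optimized form: a single startswith() call with a tuple of prefixes.
    return filename.startswith(_IGNORED_FOLDERS)


def bench(func, path: str, number: int = 50_000, repeat: int = 3) -> float:
    # Best of `repeat` timing runs, converted to nanoseconds per call.
    best = min(timeit.repeat(lambda: func(path), number=number, repeat=repeat))
    return best / number * 1e9


if __name__ == "__main__":
    for path in ("/tmp/file.txt", "/var/log.txt", "/home/user/file.txt"):
        print(
            f"{path!r}: any() {bench(with_any, path):.0f} ns, "
            f"tuple {bench(with_tuple, path):.0f} ns"
        )
```

The lambda wrapper adds the same fixed cost to both measurements, so the comparison is more meaningful as a difference than as an exact ratio.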

Performance characteristics:

  • Best-case scenarios: Files starting with /tmp see up to ~180% speedup, since the first prefix matches and no Python-level iteration is involved
  • Consistent gains: All annotated test calls show roughly 115-204% improvements, indicating the optimization benefits both matching and non-matching cases
  • Scales well: Large-scale tests with 500-1000 files maintain 136-170% speedups, showing the optimization doesn't degrade with volume

The 149% overall speedup demonstrates that eliminating the Python-level iteration overhead provides substantial performance gains across all input patterns.

Correctness verification report:

| Test | Status |
| --- | --- |
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 5826 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 1 Passed |
| 📊 Tests Coverage | 100.0% |
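The regression tests check that the original and optimized code return the same output. A minimal sketch of that kind of equivalence check follows, assuming the original has the `any()` form described above; `original`, `optimized`, `SAMPLE_PATHS`, and `find_mismatches` are hypothetical names, and the sample paths are borrowed from the generated tests.

```python
from __future__ import annotations

_IGNORED_FOLDERS = ("/tmp", "/var")


def original(filename: str) -> bool:
    # Assumed original form, reconstructed from the PR description.
    return any(filename.startswith(folder) for folder in _IGNORED_FOLDERS)


def optimized(filename: str) -> bool:
    # Optimized form: tuple of prefixes passed to startswith().
    return filename.startswith(_IGNORED_FOLDERS)


# A few inputs taken from the generated regression tests.
SAMPLE_PATHS = [
    "/tmp/file.txt",
    "/var/log.txt",
    "/home/user/file.txt",
    "tmp/file.txt",
    "",
    "/tmp",
    "//tmp/file.txt",
    "C:\\tmp\\file.txt",
    "/tmpfile.txt",
    "/various/file.txt",
]


def find_mismatches(paths: list[str]) -> list[str]:
    # Return every path on which the two versions disagree.
    return [p for p in paths if original(p) != optimized(p)]
```

Both forms do plain string-prefix matching, so inputs such as "/tmpfile.txt" and "/various/file.txt" count as matches in either version; the optimization leaves that behavior unchanged.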
🌀 Generated Regression Tests and Runtime
from __future__ import annotations

# imports
import pytest
from marimo._server.recents import _is_tmp_file

# function to test
# Copyright 2024 Marimo. All rights reserved.


_IGNORED_FOLDERS = ("/tmp", "/var")
from marimo._server.recents import _is_tmp_file

# unit tests

# -------------------------------
# 1. Basic Test Cases
# -------------------------------

def test_basic_tmp_file_in_tmp():
    # Should return True for a file directly in /tmp
    codeflash_output = _is_tmp_file("/tmp/file.txt") # 1.16μs -> 418ns (178% faster)

def test_basic_tmp_file_in_var():
    # Should return True for a file directly in /var
    codeflash_output = _is_tmp_file("/var/log.txt") # 1.33μs -> 465ns (186% faster)

def test_basic_non_tmp_file():
    # Should return False for a file not in /tmp or /var
    codeflash_output = _is_tmp_file("/home/user/file.txt") # 1.19μs -> 476ns (150% faster)

def test_basic_relative_path():
    # Should return False for a relative path not starting with /tmp or /var
    codeflash_output = _is_tmp_file("tmp/file.txt") # 1.11μs -> 481ns (131% faster)
    codeflash_output = _is_tmp_file("var/log.txt") # 587ns -> 233ns (152% faster)

def test_basic_subfolder_in_tmp():
    # Should return True for a file in a subfolder of /tmp
    codeflash_output = _is_tmp_file("/tmp/somefolder/file.txt") # 1.15μs -> 419ns (174% faster)

def test_basic_subfolder_in_var():
    # Should return True for a file in a subfolder of /var
    codeflash_output = _is_tmp_file("/var/lib/data.db") # 1.23μs -> 484ns (154% faster)

# -------------------------------
# 2. Edge Test Cases
# -------------------------------

def test_edge_empty_string():
    # Should return False for empty string
    codeflash_output = _is_tmp_file("") # 1.14μs -> 439ns (159% faster)

def test_edge_only_folder_name():
    # Should return True for exactly "/tmp" or "/var"
    codeflash_output = _is_tmp_file("/tmp") # 1.11μs -> 469ns (137% faster)
    codeflash_output = _is_tmp_file("/var") # 686ns -> 282ns (143% faster)

def test_edge_similar_prefixes():
    # These paths share the "/tmp" or "/var" string prefix, so a plain startswith() check returns True
    codeflash_output = _is_tmp_file("/tmpfile.txt") # 1.07μs -> 376ns (185% faster)
    codeflash_output = _is_tmp_file("/various/file.txt") # 721ns -> 255ns (183% faster)

def test_edge_case_sensitivity():
    # Should be case-sensitive: "/Tmp" and "/Var" should not match
    codeflash_output = _is_tmp_file("/Tmp/file.txt") # 1.13μs -> 431ns (162% faster)
    codeflash_output = _is_tmp_file("/Var/log.txt") # 617ns -> 287ns (115% faster)

def test_edge_trailing_slash():
    # Should return True for "/tmp/" and "/var/"
    codeflash_output = _is_tmp_file("/tmp/") # 1.01μs -> 396ns (156% faster)
    codeflash_output = _is_tmp_file("/var/") # 690ns -> 253ns (173% faster)

def test_edge_double_slash():
    # Should return False for paths starting with double slash
    codeflash_output = _is_tmp_file("//tmp/file.txt") # 992ns -> 415ns (139% faster)
    codeflash_output = _is_tmp_file("//var/log.txt") # 573ns -> 237ns (142% faster)

def test_edge_windows_paths():
    # Should return False for Windows style paths
    codeflash_output = _is_tmp_file("C:\\tmp\\file.txt") # 999ns -> 366ns (173% faster)
    codeflash_output = _is_tmp_file("C:/tmp/file.txt") # 545ns -> 211ns (158% faster)

def test_edge_hidden_file_in_tmp():
    # Should return True for hidden files in /tmp or /var
    codeflash_output = _is_tmp_file("/tmp/.hidden") # 1.09μs -> 402ns (171% faster)
    codeflash_output = _is_tmp_file("/var/.hidden") # 696ns -> 273ns (155% faster)

def test_edge_path_with_dot_dot():
    # Should return False for paths that do not start with /tmp or /var even if they contain them later
    codeflash_output = _is_tmp_file("/home/user/../tmp/file.txt") # 986ns -> 421ns (134% faster)

def test_edge_path_with_unicode():
    # Should handle unicode paths
    codeflash_output = _is_tmp_file("/tmp/файл.txt") # 1.19μs -> 480ns (148% faster)
    codeflash_output = _is_tmp_file("/var/数据.txt") # 724ns -> 246ns (194% faster)
    codeflash_output = _is_tmp_file("/home/数据.txt") # 522ns -> 188ns (178% faster)

def test_edge_path_with_special_characters():
    # Should handle special characters in path
    codeflash_output = _is_tmp_file("/tmp/!@#$%^&*().txt") # 1.07μs -> 433ns (147% faster)
    codeflash_output = _is_tmp_file("/var/!@#$%^&*().txt") # 697ns -> 290ns (140% faster)

def test_edge_path_with_newline_and_whitespace():
    # Should treat whitespace and newlines as part of the path
    codeflash_output = _is_tmp_file("/tmp/file with spaces.txt") # 1.07μs -> 410ns (161% faster)
    codeflash_output = _is_tmp_file("/var/file\n.txt") # 736ns -> 249ns (196% faster)
    codeflash_output = _is_tmp_file(" /tmp/file.txt") # 547ns -> 194ns (182% faster)

def test_edge_path_with_symlink_like():
    # Should not match symlink-like names unless they start with /tmp or /var
    codeflash_output = _is_tmp_file("/symlink_to_tmp/file.txt") # 1.01μs -> 449ns (124% faster)

def test_edge_path_with_env_var_style():
    # Should not match if path starts with $TMP or $VAR
    codeflash_output = _is_tmp_file("$TMP/file.txt") # 1.08μs -> 402ns (168% faster)
    codeflash_output = _is_tmp_file("$VAR/log.txt") # 594ns -> 234ns (154% faster)

# -------------------------------
# 3. Large Scale Test Cases
# -------------------------------

def test_large_scale_many_tmp_files():
    # Create a list of 1000 files in /tmp and ensure all are detected
    files = [f"/tmp/file_{i}.txt" for i in range(1000)]
    for f in files:
        codeflash_output = _is_tmp_file(f) # 354μs -> 148μs (139% faster)

def test_large_scale_many_var_files():
    # Create a list of 1000 files in /var and ensure all are detected
    files = [f"/var/log_{i}.log" for i in range(1000)]
    for f in files:
        codeflash_output = _is_tmp_file(f) # 413μs -> 153μs (170% faster)

def test_large_scale_many_non_tmp_files():
    # Create a list of 1000 files in /home and ensure none are detected
    files = [f"/home/user/file_{i}.txt" for i in range(1000)]
    for f in files:
        codeflash_output = _is_tmp_file(f) # 373μs -> 150μs (148% faster)


def test_large_scale_long_file_names():
    # Test with very long file names in /tmp and /var
    long_name = "a" * 200
    codeflash_output = _is_tmp_file(f"/tmp/{long_name}.txt") # 1.54μs -> 613ns (151% faster)
    codeflash_output = _is_tmp_file(f"/var/{long_name}.log") # 743ns -> 260ns (186% faster)
    codeflash_output = _is_tmp_file(f"/home/{long_name}.txt") # 512ns -> 185ns (177% faster)

def test_large_scale_various_depths():
    # Test with deep subdirectories under /tmp and /var
    for depth in range(1, 20):
        path = "/tmp" + "/subdir" * depth + "/file.txt"
        codeflash_output = _is_tmp_file(path) # 8.24μs -> 3.55μs (132% faster)
        path = "/var" + "/subdir" * depth + "/file.txt"
        codeflash_output = _is_tmp_file(path)
        path = "/home" + "/subdir" * depth + "/file.txt" # 8.64μs -> 3.27μs (164% faster)
        codeflash_output = _is_tmp_file(path)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
from __future__ import annotations

# imports
import pytest
from marimo._server.recents import _is_tmp_file

# function to test
# Copyright 2024 Marimo. All rights reserved.


_IGNORED_FOLDERS = ("/tmp", "/var")
from marimo._server.recents import _is_tmp_file

# unit tests

# --------------------
# Basic Test Cases
# --------------------

def test_tmp_file_in_tmp_folder():
    # File directly in /tmp
    codeflash_output = _is_tmp_file("/tmp/test.txt") # 1.02μs -> 417ns (144% faster)

def test_tmp_file_in_var_folder():
    # File directly in /var
    codeflash_output = _is_tmp_file("/var/log.txt") # 1.24μs -> 457ns (170% faster)

def test_tmp_file_in_tmp_subfolder():
    # File in a subfolder of /tmp
    codeflash_output = _is_tmp_file("/tmp/mydir/file.py") # 1.04μs -> 441ns (136% faster)

def test_tmp_file_in_var_subfolder():
    # File in a subfolder of /var
    codeflash_output = _is_tmp_file("/var/log/mylog.log") # 1.27μs -> 449ns (182% faster)

def test_non_tmp_file_root():
    # File in root, not in ignored folders
    codeflash_output = _is_tmp_file("/home/user/file.txt") # 1.15μs -> 512ns (125% faster)

def test_non_tmp_file_relative():
    # Relative path, not starting with /tmp or /var
    codeflash_output = _is_tmp_file("mydir/file.txt") # 1.14μs -> 480ns (138% faster)

def test_non_tmp_file_other_folder():
    # File in /usr, not in ignored folders
    codeflash_output = _is_tmp_file("/usr/bin/python") # 1.16μs -> 519ns (124% faster)

# --------------------
# Edge Test Cases
# --------------------

def test_empty_string():
    # Empty filename should not match
    codeflash_output = _is_tmp_file("") # 1.09μs -> 493ns (122% faster)

def test_only_slash():
    # Just a slash is not a tmp file
    codeflash_output = _is_tmp_file("/") # 1.08μs -> 449ns (142% faster)

def test_tmp_in_middle_of_path():
    # /tmp is not at the start
    codeflash_output = _is_tmp_file("/home/tmp/file.txt") # 1.14μs -> 470ns (142% faster)

def test_var_in_middle_of_path():
    # /var is not at the start
    codeflash_output = _is_tmp_file("/home/var/file.txt") # 1.06μs -> 438ns (141% faster)

def test_partial_match_tmp():
    # Starts with /tmpX; a plain prefix check still matches on "/tmp"
    codeflash_output = _is_tmp_file("/tmpX/file.txt") # 1.25μs -> 475ns (163% faster)

def test_partial_match_var():
    # Starts with /various; a plain prefix check still matches on "/var"
    codeflash_output = _is_tmp_file("/various/file.txt") # 1.30μs -> 490ns (166% faster)

def test_case_sensitivity():
    # Should be case sensitive
    codeflash_output = _is_tmp_file("/TMP/file.txt") # 1.11μs -> 444ns (150% faster)
    codeflash_output = _is_tmp_file("/VAR/file.txt") # 583ns -> 225ns (159% faster)

def test_tricky_prefix():
    # Path that contains /tmp but not at start
    codeflash_output = _is_tmp_file("/not/tmp/file.txt") # 1.04μs -> 403ns (158% faster)

def test_path_with_double_slash():
    # Double slash at start, not matching /tmp or /var
    codeflash_output = _is_tmp_file("//tmp/file.txt") # 1.07μs -> 352ns (204% faster)

def test_windows_style_path():
    # Windows style path should not match
    codeflash_output = _is_tmp_file("C:\\tmp\\file.txt") # 1.06μs -> 462ns (129% faster)

def test_dot_slash_prefix():
    # ./tmp should not match
    codeflash_output = _is_tmp_file("./tmp/file.txt") # 1.10μs -> 399ns (175% faster)

def test_relative_path_to_tmp():
    # Relative path to tmp
    codeflash_output = _is_tmp_file("tmp/file.txt") # 973ns -> 408ns (138% faster)

def test_only_folder_name():
    # Just the folder name, not starting with /
    codeflash_output = _is_tmp_file("tmp") # 1.07μs -> 444ns (140% faster)
    codeflash_output = _is_tmp_file("var") # 603ns -> 245ns (146% faster)

def test_tmp_file_with_trailing_slash():
    # Path is exactly /tmp/ or /var/
    codeflash_output = _is_tmp_file("/tmp/") # 1.21μs -> 466ns (159% faster)
    codeflash_output = _is_tmp_file("/var/") # 738ns -> 279ns (165% faster)

def test_tmp_file_with_extra_slash():
    # Path is /tmp//file.txt (double slash after /tmp)
    codeflash_output = _is_tmp_file("/tmp//file.txt") # 1.07μs -> 421ns (153% faster)

def test_var_file_with_extra_slash():
    # Path is /var//log.txt (double slash after /var)
    codeflash_output = _is_tmp_file("/var//log.txt") # 1.22μs -> 419ns (191% faster)

def test_file_named_tmp():
    # File named /tmpfile.txt; it starts with "/tmp", so a plain prefix check matches
    codeflash_output = _is_tmp_file("/tmpfile.txt") # 1.09μs -> 397ns (175% faster)

def test_file_named_var():
    # File named /varfile.txt; it starts with "/var", so a plain prefix check matches
    codeflash_output = _is_tmp_file("/varfile.txt") # 1.25μs -> 451ns (177% faster)

def test_unicode_characters():
    # Unicode in filename, but still starts with /tmp
    codeflash_output = _is_tmp_file("/tmp/файл.txt") # 1.17μs -> 510ns (129% faster)
    # Unicode, but not starting with /tmp or /var
    codeflash_output = _is_tmp_file("/дом/tmp/файл.txt") # 771ns -> 290ns (166% faster)

# --------------------
# Large Scale Test Cases
# --------------------

def test_large_number_of_tmp_files():
    # Test with many files in /tmp
    for i in range(500):
        codeflash_output = _is_tmp_file(f"/tmp/file_{i}.txt") # 178μs -> 75.7μs (136% faster)

def test_large_number_of_var_files():
    # Test with many files in /var
    for i in range(500):
        codeflash_output = _is_tmp_file(f"/var/log_{i}.log") # 208μs -> 78.1μs (166% faster)

def test_large_number_of_non_tmp_files():
    # Test with many files not in /tmp or /var
    for i in range(500):
        codeflash_output = _is_tmp_file(f"/home/user/file_{i}.txt") # 187μs -> 77.1μs (144% faster)

def test_large_mixed_set():
    # Mix of tmp, var, and others
    for i in range(200):
        codeflash_output = _is_tmp_file(f"/tmp/test_{i}.txt") # 74.8μs -> 33.6μs (123% faster)
        codeflash_output = _is_tmp_file(f"/var/test_{i}.txt")
        codeflash_output = _is_tmp_file(f"/opt/test_{i}.txt") # 85.0μs -> 32.1μs (165% faster)
        codeflash_output = _is_tmp_file(f"/usr/test_{i}.txt")

def test_long_pathnames():
    # Very long pathnames starting with /tmp
    base = "/tmp/" + "a" * 900
    codeflash_output = _is_tmp_file(base) # 1.17μs -> 469ns (149% faster)
    # Very long pathnames not starting with /tmp or /var
    base2 = "/home/" + "b" * 900
    codeflash_output = _is_tmp_file(base2) # 657ns -> 273ns (141% faster)

def test_large_number_of_edge_case_paths():
    # Many edge cases in a loop
    for i in range(100):
        # /tmpX also starts with "/tmp", so a plain prefix check matches
        codeflash_output = _is_tmp_file(f"/tmpX/f{i}") # 37.2μs -> 15.8μs (136% faster)
        # /various also starts with "/var", so a plain prefix check matches
        codeflash_output = _is_tmp_file(f"/various/f{i}")
        # /tmp/ should match
        codeflash_output = _is_tmp_file(f"/tmp/f{i}") # 42.9μs -> 16.1μs (166% faster)
        # /var/ should match
        codeflash_output = _is_tmp_file(f"/var/f{i}")
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
from marimo._server.recents import _is_tmp_file

def test__is_tmp_file():
    _is_tmp_file('')
🔎 Concolic Coverage Tests and Runtime
| Test File::Test Function | Original ⏱️ | Optimized ⏱️ | Speedup |
| --- | --- | --- | --- |
| codeflash_concolic_4al8aq2a/tmprksu_t7p/test_concolic_coverage.py::test__is_tmp_file | 1.21μs | 495ns | 145%✅ |

To edit these changes, run git checkout codeflash/optimize-_is_tmp_file-mh5qbutt and push.

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 October 25, 2025 03:37
@codeflash-ai codeflash-ai bot added the "⚡️ codeflash" (Optimization PR opened by Codeflash AI) and "🎯 Quality: High" (Optimization Quality according to Codeflash) labels Oct 25, 2025