@codeflash-ai codeflash-ai bot commented Oct 25, 2025

📄 19% (0.19x) speedup for is_github_src in marimo/_cli/file_path.py

⏱️ Runtime: 43.5 milliseconds → 36.6 milliseconds (best of 79 runs)
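
The headline figure can be reproduced from the two runtimes. The sketch below assumes the speedup is defined as time saved divided by the optimized runtime. That definition is not stated in this report, but it is consistent with the 0.19x shown above.

```python
# Runtimes reported above, in milliseconds.
original_ms = 43.5
optimized_ms = 36.6

# Assumed definition: time saved, relative to the optimized runtime.
speedup = (original_ms - optimized_ms) / optimized_ms

print(f"{speedup:.1%} ({speedup:.2f}x)")  # prints 18.9% (0.19x)
```

Under this assumption the value is about 18.9%, which rounds to 19%.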

📝 Explanation and details

The optimization achieves a speedup of roughly 19% by eliminating a redundant URL parsing operation in the is_github_src function.

Key optimization: The original code called urllib.parse.urlparse(url) twice - once to get the hostname and again to get the path. The optimized version parses the URL only once and stores the result in a parsed variable, then accesses both .hostname and .path from the cached ParseResult object.
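
A minimal sketch of the change described above, written from the description rather than copied from the marimo source. The `is_url` helper here is a hypothetical stand-in (the real check may accept other schemes), and the `_before` / `_after` names are only for illustration.

```python
from urllib.parse import urlparse

GITHUB_HOSTS = ("github.com", "raw.githubusercontent.com")


def is_url(value: str) -> bool:
    """Hypothetical stand-in for marimo's is_url; the real check may differ."""
    return value.startswith(("http://", "https://"))


def is_github_src_before(url: str, ext: str) -> bool:
    """Parses the URL twice, matching the original as described."""
    if not is_url(url):
        return False
    hostname = urlparse(url).hostname
    if hostname != "github.com" and hostname != "raw.githubusercontent.com":
        return False
    return urlparse(url).path.endswith(ext)


def is_github_src_after(url: str, ext: str) -> bool:
    """Parses once and reads both attributes from the same ParseResult."""
    if not is_url(url):
        return False
    parsed = urlparse(url)
    if parsed.hostname not in GITHUB_HOSTS:
        return False
    return parsed.path.endswith(ext)


# Both versions should agree on every input.
for sample in (
    "https://github.com/user/repo/file.py",
    "https://gitlab.com/user/repo/file.py",
    "not_a_url",
):
    assert is_github_src_before(sample, ".py") == is_github_src_after(sample, ".py")
```

Because only the number of parses changes, the two functions return the same result for any input; the loop at the end spot-checks that on a few URLs.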

Why this improves performance: URL parsing involves tokenization, validation, and object creation. Removing the duplicate parse eliminates the line that the line profiler showed as the most expensive operation, the second urlparse call, at approximately 59.7% of the function's profiled time. The measured end-to-end gain is smaller, at about 19%.
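
To get a rough feel for the cost of the extra parse on a given machine, one option is a small `timeit` comparison. This is an illustrative sketch, not the benchmark Codeflash ran; the sample URL, the function names, and the iteration count are arbitrary choices.

```python
import timeit
from urllib.parse import urlparse

URL = "https://github.com/user/repo/file.py"  # sample input, chosen for illustration


def parse_twice(url: str) -> tuple:
    # Mirrors the original shape: one parse per attribute.
    return urlparse(url).hostname, urlparse(url).path


def parse_once(url: str) -> tuple:
    # Mirrors the optimized shape: one parse, two attribute reads.
    parsed = urlparse(url)
    return parsed.hostname, parsed.path


twice_s = timeit.timeit(lambda: parse_twice(URL), number=20_000)
once_s = timeit.timeit(lambda: parse_once(URL), number=20_000)
print(f"parse twice: {twice_s:.4f}s, parse once: {once_s:.4f}s")
```

Absolute numbers will vary by interpreter version and hardware, and urllib.parse may cache some of its parsing work internally, so repeated calls on the same string can understate the cost seen with distinct URLs.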

Additional minor change: The hostname comparison was changed from hostname != "github.com" and hostname != "raw.githubusercontent.com" to hostname not in ("github.com", "raw.githubusercontent.com"). The two forms are equivalent; the membership test is more concise, and any performance difference is likely negligible next to the parsing cost.
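
The equivalence of the two comparison forms can be checked directly. The helper names below are hypothetical, and `None` is included among the samples because `urlparse(...).hostname` is `None` when a URL has no host.

```python
GITHUB_HOSTS = ("github.com", "raw.githubusercontent.com")


def rejected_chained(hostname) -> bool:
    # Original form: two inequality tests joined by `and`.
    return hostname != "github.com" and hostname != "raw.githubusercontent.com"


def rejected_membership(hostname) -> bool:
    # Rewritten form: one membership test against a tuple.
    return hostname not in GITHUB_HOSTS


samples = [
    "github.com",
    "raw.githubusercontent.com",
    "gitlab.com",
    "docs.github.com",
    "",
    None,
]
assert all(rejected_chained(h) == rejected_membership(h) for h in samples)
```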

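Test case benefits: Most test cases that reach the second parse (URLs on github.com or raw.githubusercontent.com) improve by roughly 10-26%, with the largest gains (about 21-26%) in the loops over many GitHub URLs, where parsing dominates the measured time. URLs that are rejected earlier see little change: non-GitHub hosts return before the path is parsed (under 1% in the large-scale runs), and invalid URLs fail early in the is_url() check, where the differences of a few percent in either direction on timings of about 1μs are likely noise.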

Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 31 Passed |
| 🌀 Generated Regression Tests | 6593 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 1 Passed |
| 📊 Tests Coverage | 100.0% |

⚙️ Existing Unit Tests and Runtime
| Test File::Test Function | Original ⏱️ | Optimized ⏱️ | Speedup |
|---|---|---|---|
| _cli/test_file_path.py::test_is_github_src_with_valid_url | 34.8μs | 30.2μs | 15.1%✅ |

🌀 Generated Regression Tests and Runtime
import re
# function to test
import urllib.parse

# imports
import pytest  # used for our unit tests
from marimo._cli.file_path import is_github_src

# --- unit tests ---

# Basic Test Cases
def test_github_com_valid_py():
    # Standard github.com URL ending with .py
    codeflash_output = is_github_src("https://github.com/user/repo/file.py", ".py") # 18.8μs -> 16.0μs (17.7% faster)

def test_raw_githubusercontent_valid_ipynb():
    # Standard raw.githubusercontent.com URL ending with .ipynb
    codeflash_output = is_github_src("https://raw.githubusercontent.com/user/repo/main/notebook.ipynb", ".ipynb") # 20.0μs -> 17.5μs (14.9% faster)

def test_github_com_invalid_ext():
    # github.com URL but wrong extension
    codeflash_output = is_github_src("https://github.com/user/repo/file.txt", ".py") # 18.7μs -> 16.3μs (15.0% faster)

def test_raw_githubusercontent_invalid_ext():
    # raw.githubusercontent.com URL but wrong extension
    codeflash_output = is_github_src("https://raw.githubusercontent.com/user/repo/main/notebook.txt", ".ipynb") # 19.6μs -> 17.4μs (12.6% faster)

def test_other_host():
    # Host is not github.com or raw.githubusercontent.com
    codeflash_output = is_github_src("https://gitlab.com/user/repo/file.py", ".py") # 16.4μs -> 15.6μs (5.14% faster)

def test_invalid_url_format():
    # Not a valid URL
    codeflash_output = is_github_src("not_a_url", ".py") # 1.18μs -> 1.25μs (6.37% slower)

def test_missing_protocol():
    # Missing protocol (should fail URL validation)
    codeflash_output = is_github_src("github.com/user/repo/file.py", ".py") # 1.40μs -> 1.46μs (4.65% slower)

def test_ftp_protocol():
    # FTP protocol is valid for is_url, but host must be github.com/raw.githubusercontent.com
    codeflash_output = is_github_src("ftp://github.com/user/repo/file.py", ".py") # 21.1μs -> 17.9μs (18.2% faster)

def test_github_com_uppercase_ext():
    # Extension is case sensitive
    codeflash_output = is_github_src("https://github.com/user/repo/file.PY", ".py") # 19.6μs -> 17.0μs (15.3% faster)

def test_github_com_case_sensitive_path():
    # Path is case sensitive for extension
    codeflash_output = is_github_src("https://github.com/user/repo/FiLe.py", ".py") # 19.2μs -> 17.6μs (9.04% faster)

# Edge Test Cases
def test_github_com_with_port():
    # github.com with port number
    codeflash_output = is_github_src("https://github.com:443/user/repo/file.py", ".py") # 19.6μs -> 17.2μs (13.7% faster)

def test_github_com_with_query():
    # github.com with query string
    codeflash_output = is_github_src("https://github.com/user/repo/file.py?raw=true", ".py") # 19.7μs -> 16.9μs (16.7% faster)

def test_github_com_with_fragment():
    # github.com with fragment
    codeflash_output = is_github_src("https://github.com/user/repo/file.py#L10", ".py") # 19.6μs -> 17.1μs (14.8% faster)

def test_github_com_with_auth():
    # github.com with basic auth
    codeflash_output = is_github_src("https://username:password@github.com/user/repo/file.py", ".py") # 18.6μs -> 16.6μs (11.8% faster)

def test_github_com_with_unicode_path():
    # github.com with unicode in path
    codeflash_output = is_github_src("https://github.com/user/repo/файл.py", ".py") # 21.2μs -> 19.0μs (11.6% faster)

def test_github_com_trailing_slash():
    # github.com with trailing slash after file
    codeflash_output = is_github_src("https://github.com/user/repo/file.py/", ".py") # 19.1μs -> 16.2μs (18.0% faster)

def test_github_com_path_endswith_extension_but_extra():
    # Path ends with .py but has extra after
    codeflash_output = is_github_src("https://github.com/user/repo/file.py.backup", ".py") # 18.7μs -> 16.7μs (12.5% faster)

def test_github_com_path_only_extension():
    # Path is just the extension
    codeflash_output = is_github_src("https://github.com/.py", ".py") # 17.9μs -> 15.9μs (12.3% faster)

def test_github_com_empty_path():
    # Empty path
    codeflash_output = is_github_src("https://github.com/", ".py") # 18.5μs -> 16.2μs (14.0% faster)

def test_github_com_no_path():
    # No path at all
    codeflash_output = is_github_src("https://github.com", ".py") # 17.8μs -> 15.8μs (12.7% faster)

def test_github_com_multiple_dots_in_filename():
    # Multiple dots in filename
    codeflash_output = is_github_src("https://github.com/user/repo/my.file.name.py", ".py") # 18.8μs -> 16.6μs (13.7% faster)

def test_github_com_extension_in_middle_of_path():
    # Extension in middle of path, not at end
    codeflash_output = is_github_src("https://github.com/user/repo/file.py/other", ".py") # 18.3μs -> 15.7μs (16.7% faster)

def test_github_com_path_with_special_chars():
    # Path contains special characters
    codeflash_output = is_github_src("https://github.com/user/repo/file-_.py", ".py") # 18.7μs -> 16.0μs (16.4% faster)

def test_github_com_path_with_spaces_encoded():
    # Path with spaces encoded as %20
    codeflash_output = is_github_src("https://github.com/user/repo/my%20file.py", ".py") # 17.9μs -> 16.4μs (9.64% faster)

def test_github_com_path_with_spaces_unencoded():
    # Path with spaces unencoded (invalid URL)
    codeflash_output = is_github_src("https://github.com/user/repo/my file.py", ".py") # 13.0μs -> 12.8μs (1.67% faster)

def test_github_com_path_with_uppercase_extension():
    # Path with uppercase extension, should not match
    codeflash_output = is_github_src("https://github.com/user/repo/file.PY", ".py") # 15.7μs -> 13.2μs (18.9% faster)

def test_github_com_path_with_dot_in_folder():
    # Dot in folder name, not in file extension
    codeflash_output = is_github_src("https://github.com/user/repo.v1/file.py", ".py") # 18.8μs -> 16.2μs (16.2% faster)

def test_github_com_path_with_multiple_extensions():
    # Multiple extensions, only last matters
    codeflash_output = is_github_src("https://github.com/user/repo/file.txt.py", ".py") # 18.7μs -> 16.7μs (12.4% faster)

def test_github_com_path_with_hidden_file():
    # Hidden file (starts with dot)
    codeflash_output = is_github_src("https://github.com/user/repo/.file.py", ".py") # 18.4μs -> 16.4μs (12.5% faster)

def test_github_com_path_with_long_extension():
    # Long extension
    codeflash_output = is_github_src("https://github.com/user/repo/file.longextension", ".longextension") # 18.3μs -> 16.3μs (12.5% faster)

def test_raw_githubusercontent_path_with_branch_and_subfolder():
    # raw.githubusercontent.com with branch and subfolder
    codeflash_output = is_github_src("https://raw.githubusercontent.com/user/repo/branch/subfolder/file.py", ".py") # 21.4μs -> 19.1μs (12.0% faster)

def test_github_com_path_with_dash_in_filename():
    # Dash in filename
    codeflash_output = is_github_src("https://github.com/user/repo/file-name.py", ".py") # 18.8μs -> 16.2μs (15.6% faster)

def test_github_com_path_with_plus_in_filename():
    # Plus in filename
    codeflash_output = is_github_src("https://github.com/user/repo/file+name.py", ".py") # 18.3μs -> 16.3μs (12.3% faster)

def test_github_com_path_with_at_in_filename():
    # @ in filename
    codeflash_output = is_github_src("https://github.com/user/repo/file@name.py", ".py") # 18.8μs -> 16.4μs (14.6% faster)

def test_github_com_path_with_colon_in_filename():
    # Colon in filename (should be valid URL)
    codeflash_output = is_github_src("https://github.com/user/repo/file:name.py", ".py") # 18.6μs -> 16.5μs (13.0% faster)

def test_github_com_path_with_semicolon_in_filename():
    # Semicolon in filename
    codeflash_output = is_github_src("https://github.com/user/repo/file;name.py", ".py") # 20.3μs -> 17.1μs (18.6% faster)

def test_github_com_path_with_comma_in_filename():
    # Comma in filename
    codeflash_output = is_github_src("https://github.com/user/repo/file,name.py", ".py") # 18.5μs -> 15.8μs (16.8% faster)

def test_github_com_path_with_equal_in_filename():
    # Equal sign in filename
    codeflash_output = is_github_src("https://github.com/user/repo/file=name.py", ".py") # 17.8μs -> 16.5μs (7.94% faster)

def test_github_com_path_with_parentheses_in_filename():
    # Parentheses in filename
    codeflash_output = is_github_src("https://github.com/user/repo/file(name).py", ".py") # 18.2μs -> 16.1μs (12.6% faster)

def test_github_com_path_with_unicode_extension():
    # Unicode extension
    codeflash_output = is_github_src("https://github.com/user/repo/file.π", ".π") # 20.9μs -> 18.5μs (13.0% faster)

def test_github_com_path_with_extension_and_query():
    # Extension followed by query string
    codeflash_output = is_github_src("https://github.com/user/repo/file.py?version=1", ".py") # 20.0μs -> 17.7μs (13.3% faster)

def test_github_com_path_with_extension_and_fragment():
    # Extension followed by fragment
    codeflash_output = is_github_src("https://github.com/user/repo/file.py#section", ".py") # 19.5μs -> 16.9μs (14.9% faster)

# Large Scale Test Cases
def test_large_scale_many_valid_urls():
    # Test many valid URLs in a loop
    for i in range(100):
        url = f"https://github.com/user/repo/file{i}.py"
        codeflash_output = is_github_src(url, ".py") # 678μs -> 547μs (23.9% faster)

def test_large_scale_many_invalid_urls():
    # Test many invalid URLs (wrong host, wrong extension, malformed)
    for i in range(100):
        # Wrong host
        url1 = f"https://notgithub.com/user/repo/file{i}.py"
        codeflash_output = is_github_src(url1, ".py") # 577μs -> 573μs (0.825% faster)
        # Wrong extension
        url2 = f"https://github.com/user/repo/file{i}.txt"
        codeflash_output = is_github_src(url2, ".py") # 671μs -> 537μs (25.1% faster)
        # Malformed URL
        url3 = f"not_a_url_{i}"
        codeflash_output = is_github_src(url3, ".py")

def test_large_scale_mixed_valid_and_invalid():
    # Mix valid and invalid URLs
    valid_urls = [f"https://github.com/user/repo/file{i}.py" for i in range(50)]
    invalid_urls = [f"https://github.com/user/repo/file{i}.txt" for i in range(50)]
    for url in valid_urls:
        codeflash_output = is_github_src(url, ".py") # 353μs -> 285μs (23.9% faster)
    for url in invalid_urls:
        codeflash_output = is_github_src(url, ".py") # 332μs -> 264μs (25.7% faster)

def test_large_scale_long_path():
    # Very long path, but valid
    long_file = "a" * 900 + ".py"
    url = f"https://github.com/user/repo/{long_file}"
    codeflash_output = is_github_src(url, ".py") # 27.4μs -> 24.6μs (11.6% faster)

def test_large_scale_various_extensions():
    # Test many different extensions
    for ext in [".py", ".ipynb", ".txt", ".md", ".csv"]:
        url = f"https://github.com/user/repo/file{ext}"
        codeflash_output = is_github_src(url, ext) # 47.9μs -> 39.5μs (21.2% faster)
        # Should fail for a different extension
        codeflash_output = is_github_src(url, ".wrong")

def test_large_scale_raw_githubusercontent():
    # Many valid raw.githubusercontent.com URLs
    for i in range(100):
        url = f"https://raw.githubusercontent.com/user/repo/main/file{i}.ipynb"
        codeflash_output = is_github_src(url, ".ipynb") # 801μs -> 657μs (21.9% faster)

def test_large_scale_unicode_filenames():
    # Many URLs with unicode filenames
    for i in range(100):
        url = f"https://github.com/user/repo/файл{i}.py"
        codeflash_output = is_github_src(url, ".py") # 724μs -> 579μs (24.9% faster)

def test_large_scale_hidden_files():
    # Many hidden files
    for i in range(100):
        url = f"https://github.com/user/repo/.hidden{i}.py"
        codeflash_output = is_github_src(url, ".py") # 684μs -> 548μs (24.7% faster)

def test_large_scale_mixed_hosts():
    # Mix of valid and invalid hosts
    valid_urls = [f"https://github.com/user/repo/file{i}.py" for i in range(50)]
    invalid_urls = [f"https://gitlab.com/user/repo/file{i}.py" for i in range(50)]
    for url in valid_urls:
        codeflash_output = is_github_src(url, ".py") # 353μs -> 283μs (24.7% faster)
    for url in invalid_urls:
        codeflash_output = is_github_src(url, ".py") # 257μs -> 256μs (0.098% faster)

def test_large_scale_urls_with_query_and_fragment():
    # Many URLs with query and fragment
    for i in range(100):
        url = f"https://github.com/user/repo/file{i}.py?version={i}#frag{i}"
        codeflash_output = is_github_src(url, ".py") # 738μs -> 593μs (24.3% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
from __future__ import annotations

import re
import urllib.parse

# imports
import pytest  # used for our unit tests
from marimo._cli.file_path import is_github_src

# --- unit tests ---

# ----------- BASIC TEST CASES -----------
def test_basic_github_com_correct_extension():
    # Should return True for github.com with correct extension
    codeflash_output = is_github_src("https://github.com/user/repo/blob/main/file.py", ".py") # 18.3μs -> 17.5μs (4.58% faster)

def test_basic_raw_githubusercontent_correct_extension():
    # Should return True for raw.githubusercontent.com with correct extension
    codeflash_output = is_github_src("https://raw.githubusercontent.com/user/repo/main/file.py", ".py") # 19.8μs -> 17.7μs (11.9% faster)

def test_basic_github_com_wrong_extension():
    # Should return False for github.com with wrong extension
    codeflash_output = is_github_src("https://github.com/user/repo/blob/main/file.txt", ".py") # 18.1μs -> 16.3μs (11.1% faster)

def test_basic_raw_githubusercontent_wrong_extension():
    # Should return False for raw.githubusercontent.com with wrong extension
    codeflash_output = is_github_src("https://raw.githubusercontent.com/user/repo/main/file.txt", ".py") # 19.3μs -> 17.2μs (12.1% faster)

def test_basic_non_github_url():
    # Should return False for non-GitHub URLs
    codeflash_output = is_github_src("https://gitlab.com/user/repo/blob/main/file.py", ".py") # 16.3μs -> 15.6μs (4.67% faster)

def test_basic_invalid_url():
    # Should return False for invalid URLs
    codeflash_output = is_github_src("not_a_url", ".py") # 1.11μs -> 1.19μs (6.67% slower)

def test_basic_github_com_subdomain():
    # Should return False for subdomains of github.com
    codeflash_output = is_github_src("https://docs.github.com/user/repo/blob/main/file.py", ".py") # 18.4μs -> 17.5μs (4.92% faster)

def test_basic_github_com_with_query():
    # Should return True if path ends with ext, even with query string
    codeflash_output = is_github_src("https://github.com/user/repo/blob/main/file.py?raw=true", ".py") # 20.8μs -> 17.7μs (17.2% faster)

def test_basic_github_com_with_fragment():
    # Should return True if path ends with ext, even with fragment
    codeflash_output = is_github_src("https://github.com/user/repo/blob/main/file.py#L10", ".py") # 19.3μs -> 17.3μs (11.8% faster)

# ----------- EDGE TEST CASES -----------
def test_edge_empty_url():
    # Should return False for empty string
    codeflash_output = is_github_src("", ".py") # 1.16μs -> 1.20μs (3.75% slower)

def test_edge_empty_extension():
    # str.endswith("") is always True, so an empty extension matches any path on a GitHub host
    codeflash_output = is_github_src("https://github.com/user/repo/blob/main/file.py", "") # 15.3μs -> 13.6μs (12.4% faster)

def test_edge_extension_with_dot():
    # Should return True only if path ends with extension including dot
    codeflash_output = is_github_src("https://github.com/user/repo/blob/main/file.py", ".py") # 14.6μs -> 13.0μs (12.7% faster)
    codeflash_output = is_github_src("https://github.com/user/repo/blob/main/filepy", ".py") # 12.6μs -> 11.7μs (7.58% faster)

def test_edge_case_sensitive_extension():
    # Should be case sensitive for extension
    codeflash_output = is_github_src("https://github.com/user/repo/blob/main/file.PY", ".py") # 18.1μs -> 16.0μs (13.7% faster)
    codeflash_output = is_github_src("https://github.com/user/repo/blob/main/file.py", ".PY") # 7.62μs -> 6.87μs (11.0% faster)

def test_edge_path_with_multiple_dots():
    # Should handle filenames with multiple dots correctly
    codeflash_output = is_github_src("https://github.com/user/repo/blob/main/file.test.py", ".py") # 18.1μs -> 15.7μs (15.1% faster)
    codeflash_output = is_github_src("https://github.com/user/repo/blob/main/file.test.txt", ".py") # 10.2μs -> 9.08μs (12.6% faster)

def test_edge_github_com_with_port():
    # Should return True for github.com with port in URL
    codeflash_output = is_github_src("https://github.com:443/user/repo/blob/main/file.py", ".py") # 18.1μs -> 16.0μs (12.7% faster)

def test_edge_github_com_with_auth():
    # Should return True for github.com with user:pass authentication
    codeflash_output = is_github_src("https://user:pass@github.com/user/repo/blob/main/file.py", ".py") # 17.5μs -> 16.0μs (9.90% faster)

def test_edge_github_com_with_ipv6():
    # Should return False for IPv6 host, even if path ends with ext
    codeflash_output = is_github_src("https://[::1]/user/repo/blob/main/file.py", ".py") # 30.4μs -> 29.7μs (2.28% faster)

def test_edge_github_com_localhost():
    # Should return False for localhost
    codeflash_output = is_github_src("http://localhost/user/repo/blob/main/file.py", ".py") # 14.4μs -> 14.2μs (1.27% faster)

def test_edge_github_com_ip_address():
    # Should return False for IP addresses
    codeflash_output = is_github_src("https://192.30.255.112/user/repo/blob/main/file.py", ".py") # 16.5μs -> 16.0μs (3.22% faster)

def test_edge_github_com_path_endswith_extension_but_not_file():
    # Should return False if path ends with extension but not as a file (e.g. folder.py/)
    codeflash_output = is_github_src("https://github.com/user/repo/blob/main/folder.py/", ".py") # 18.4μs -> 16.6μs (11.0% faster)

def test_edge_github_com_path_endswith_extension_with_query_and_fragment():
    # Should return True if path ends with extension, even with query and fragment
    codeflash_output = is_github_src("https://github.com/user/repo/blob/main/file.py?version=1#section", ".py") # 19.6μs -> 17.7μs (10.4% faster)

def test_edge_github_com_path_is_only_extension():
    # Should return True if path is only the extension (unlikely, but possible)
    codeflash_output = is_github_src("https://github.com/.py", ".py") # 18.2μs -> 16.2μs (12.4% faster)

def test_edge_github_com_path_endswith_extension_and_extra_slash():
    # Should return False if path ends with extension and then a slash
    codeflash_output = is_github_src("https://github.com/user/repo/blob/main/file.py/", ".py") # 19.3μs -> 16.7μs (15.4% faster)

def test_edge_github_com_path_endswith_extension_and_extra_dot():
    # Should return False if path ends with extension and then a dot
    codeflash_output = is_github_src("https://github.com/user/repo/blob/main/file.py.", ".py") # 18.2μs -> 16.3μs (11.1% faster)

def test_edge_github_com_path_with_unicode():
    # Should handle unicode characters in path
    codeflash_output = is_github_src("https://github.com/user/repo/blob/main/файл.py", ".py") # 20.9μs -> 18.6μs (12.4% faster)

def test_edge_github_com_path_with_url_encoded_extension():
    # Should return False if extension is URL encoded
    codeflash_output = is_github_src("https://github.com/user/repo/blob/main/file%2Epy", ".py") # 18.8μs -> 16.4μs (14.5% faster)

def test_edge_github_com_path_with_url_encoded_characters():
    # Should return True if path ends with extension and contains URL encoded characters
    codeflash_output = is_github_src("https://github.com/user/repo/blob/main/file%20name.py", ".py") # 18.5μs -> 16.6μs (11.5% faster)

def test_edge_github_com_path_with_leading_and_trailing_spaces():
    # Should return False if URL has leading/trailing spaces
    codeflash_output = is_github_src(" https://github.com/user/repo/blob/main/file.py ", ".py") # 1.20μs -> 1.35μs (11.2% slower)

def test_edge_github_com_path_with_uppercase_hostname():
    # urlparse(...).hostname is returned in lowercase, so an uppercase host still compares equal to "github.com"
    codeflash_output = is_github_src("https://GITHUB.COM/user/repo/blob/main/file.py", ".py") # 20.0μs -> 17.9μs (11.7% faster)

def test_edge_github_com_path_with_mixed_case_hostname():
    # Same for a mixed-case host: .hostname is lowercased before the comparison
    codeflash_output = is_github_src("https://GitHub.com/user/repo/blob/main/file.py", ".py") # 18.9μs -> 16.8μs (12.2% faster)

def test_edge_github_com_path_with_subpath():
    # Should return True if path ends with ext, even with subfolders
    codeflash_output = is_github_src("https://github.com/user/repo/blob/main/subdir/file.py", ".py") # 18.3μs -> 16.8μs (8.47% faster)

# ----------- LARGE SCALE TEST CASES -----------

def test_large_scale_many_github_urls():
    # Test with 1000 valid github.com URLs
    base_url = "https://github.com/user/repo/blob/main/file{}.py"
    for i in range(1000):
        url = base_url.format(i)
        codeflash_output = is_github_src(url, ".py") # 6.70ms -> 5.30ms (26.3% faster)

def test_large_scale_many_raw_githubusercontent_urls():
    # Test with 1000 valid raw.githubusercontent.com URLs
    base_url = "https://raw.githubusercontent.com/user/repo/main/file{}.py"
    for i in range(1000):
        url = base_url.format(i)
        codeflash_output = is_github_src(url, ".py") # 7.67ms -> 6.25ms (22.7% faster)

def test_large_scale_many_non_github_urls():
    # Test with 1000 non-GitHub URLs
    base_url = "https://gitlab.com/user/repo/blob/main/file{}.py"
    for i in range(1000):
        url = base_url.format(i)
        codeflash_output = is_github_src(url, ".py") # 5.26ms -> 5.21ms (0.886% faster)

def test_large_scale_many_wrong_extension_urls():
    # Test with 1000 github.com URLs with wrong extension
    base_url = "https://github.com/user/repo/blob/main/file{}.txt"
    for i in range(1000):
        url = base_url.format(i)
        codeflash_output = is_github_src(url, ".py") # 6.69ms -> 5.30ms (26.2% faster)

def test_large_scale_mixed_valid_and_invalid_urls():
    # Mix of valid and invalid URLs, check correct results
    for i in range(500):
        # Valid github.com URL
        url1 = f"https://github.com/user/repo/blob/main/file{i}.py"
        codeflash_output = is_github_src(url1, ".py") # 3.34ms -> 2.65ms (26.0% faster)
        # Invalid github.com URL (wrong extension)
        url2 = f"https://github.com/user/repo/blob/main/file{i}.txt"
        codeflash_output = is_github_src(url2, ".py") # 3.35ms -> 2.65ms (26.2% faster)
        # Non-GitHub URL
        url3 = f"https://gitlab.com/user/repo/blob/main/file{i}.py"
        codeflash_output = is_github_src(url3, ".py")

def test_large_scale_long_path_url():
    # Test with a very long path (under 1000 chars)
    long_filename = "a" * 950 + ".py"
    url = f"https://github.com/user/repo/blob/main/{long_filename}"
    codeflash_output = is_github_src(url, ".py") # 32.2μs -> 29.3μs (9.77% faster)

def test_large_scale_long_extension():
    # Test with a long extension
    ext = "." + "a" * 900
    url = f"https://github.com/user/repo/blob/main/file{ext}"
    codeflash_output = is_github_src(url, ext) # 28.0μs -> 25.5μs (9.53% faster)
    # Should fail for shorter extension
    codeflash_output = is_github_src(url, ".a") # 16.9μs -> 15.5μs (8.73% faster)

def test_large_scale_url_with_many_query_params():
    # Test with many query parameters
    params = "&".join([f"param{i}=val{i}" for i in range(50)])
    url = f"https://github.com/user/repo/blob/main/file.py?{params}"
    codeflash_output = is_github_src(url, ".py") # 22.9μs -> 19.4μs (17.9% faster)

def test_large_scale_url_with_many_fragments():
    # Test with many fragments (should only be one fragment, but test robustness)
    url = "https://github.com/user/repo/blob/main/file.py" + "".join([f"#frag{i}" for i in range(50)])
    codeflash_output = is_github_src(url, ".py") # 20.6μs -> 18.6μs (10.5% faster)

def test_large_scale_url_with_many_subdirs():
    # Test with deep subdirectories
    subdirs = "/".join([f"subdir{i}" for i in range(50)])
    url = f"https://github.com/user/repo/blob/main/{subdirs}/file.py"
    codeflash_output = is_github_src(url, ".py") # 23.1μs -> 20.9μs (10.4% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
from marimo._cli.file_path import is_github_src

def test_is_github_src():
    is_github_src('', '')
🔎 Concolic Coverage Tests and Runtime
| Test File::Test Function | Original ⏱️ | Optimized ⏱️ | Speedup |
|---|---|---|---|
| codeflash_concolic_4al8aq2a/tmpgc4f19hq/test_concolic_coverage.py::test_is_github_src | 1.07μs | 1.11μs | -3.78%⚠️ |
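
Several of the generated edge cases above depend on how `urllib.parse.urlparse` splits a URL, not on `is_github_src` itself. The assertions below are a small sketch of the relevant standard-library behavior; the URLs are illustrative samples.

```python
from urllib.parse import urlparse

# Query strings and fragments are split off, so they do not affect path.endswith(ext).
parsed = urlparse("https://github.com/user/repo/file.py?raw=true#L10")
assert parsed.path == "/user/repo/file.py"

# .hostname drops the port and userinfo and is returned in lowercase.
assert urlparse("https://GitHub.com:443/user/repo/file.py").hostname == "github.com"
assert urlparse("https://user:pass@github.com/file.py").hostname == "github.com"

# Subdomains are different hostnames, so an exact comparison rejects them.
assert urlparse("https://docs.github.com/file.py").hostname == "docs.github.com"

# Percent-escapes in the path are left undecoded, so "%2Epy" does not end with ".py".
assert not urlparse("https://github.com/user/repo/file%2Epy").path.endswith(".py")

# A trailing slash is part of the path, so "file.py/" does not end with ".py".
assert not urlparse("https://github.com/user/repo/file.py/").path.endswith(".py")

# urlparse splits ";params" off the last path segment, so the path stops at the semicolon.
semi = urlparse("https://github.com/user/repo/file;name.py")
assert semi.path == "/user/repo/file"
assert semi.params == "name.py"
```

The last case means a filename containing a semicolon does not pass a `path.endswith(".py")` check, since the text after the semicolon is stored in `params` instead.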

To edit these changes, run git checkout codeflash/optimize-is_github_src-mh5ryng9 and push.

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 October 25, 2025 04:23
@codeflash-ai codeflash-ai bot added the labels "⚡️ codeflash" (Optimization PR opened by Codeflash AI) and "🎯 Quality: High" (Optimization Quality according to Codeflash) Oct 25, 2025