@codeflash-ai codeflash-ai bot commented Oct 25, 2025

📄 71% (0.71x) speedup for _is_code_tag in marimo/_convert/markdown/markdown.py

⏱️ Runtime: 15.3 milliseconds → 8.95 milliseconds (best of 129 runs)
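
The headline figure can be reproduced from the two runtimes. The sketch below assumes the percentage is computed as original / optimized - 1, which is consistent with the per-test annotations in this report; the helper name `speedup` is hypothetical and is not part of Codeflash or marimo.

```python
def speedup(original: float, optimized: float) -> float:
    """Relative speedup, assuming the definition original / optimized - 1."""
    return original / optimized - 1.0


# Overall runtime reported above, in milliseconds.
overall = speedup(15.3, 8.95)
print(f"{overall:.0%}")  # 71%

# One per-test annotation from the report (66.1μs -> 8.21μs), in microseconds.
per_test = speedup(66.1, 8.21)
print(f"{per_test:.0%}")  # 705%
```

Under this definition a value of 0.71 means the original takes 1.71 times as long as the optimized version.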

📝 Explanation and details

Explanation of Optimizations:

  1. Precompile the regular expressions in _is_code_tag:

    • The regex patterns used for code tag detection are now compiled once and stored at module level. This removes the per-call pattern compilation (or cache lookup in the re module), which matters because the function is called frequently.
  2. Short-circuit logic in _is_code_tag:

    • If the legacy format matches, the function returns True immediately. This skips the check for the supported format, which is only needed when the legacy match fails and the dependency is available. The early return both speeds up the function and makes the intent clearer.
  3. No change to has_required_version logic in dependencies.py:

    • No significant optimization opportunity was found there beyond possible method inlining, and its behavior (including the dispatch) must be preserved.

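Overall, the runtime improvements come from precompiling the regexes and from returning early inside the hot _is_code_tag function. In the per-test timings in this report, the largest gains appear for inputs that match the legacy format, where the early return applies; inputs that do not match change by only a few percent.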
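
The two optimizations can be sketched as follows. This is a minimal illustration, not the code in the PR: the pattern strings, the names `_LEGACY_TAG`, `_MARIMO_TAG` and `is_code_tag_sketch`, and the `has_new_superfences` callable (standing in for the dependency check) are all assumptions made for the example.

```python
import re
from typing import Callable

# Hypothetical patterns, compiled once at import time instead of on each call.
_LEGACY_TAG = re.compile(r"\{.*(?:python|sql).*\}")
_MARIMO_TAG = re.compile(r"\{.*marimo.*\}")


def is_code_tag_sketch(text: str, has_new_superfences: Callable[[], bool]) -> bool:
    # Only the first line, stripped of surrounding whitespace, is inspected.
    head = text.split("\n", 1)[0].strip()
    # Short-circuit: a legacy match returns before the dependency check runs.
    if _LEGACY_TAG.search(head):
        return True
    # The supported format counts only if the dependency check passes.
    return bool(_MARIMO_TAG.search(head)) and has_new_superfences()
```

Passing the dependency check as a callable keeps the sketch self-contained and makes the short-circuit observable: for a legacy tag the callable is never invoked.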


Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | ✅ 180 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | ✅ 1 Passed |
| 📊 Tests Coverage | 85.7% |

🌀 Generated Regression Tests and Runtime
from __future__ import annotations

# imports
import pytest  # used for our unit tests
from marimo._convert.markdown.markdown import _is_code_tag


# --- Minimal stubs to allow the function to run in isolation for testing ---

class DummyNewSuperfences:
    def __init__(self, has_required_version_return: bool):
        self._has_required_version_return = has_required_version_return

    def has_required_version(self, quiet: bool = False) -> bool:
        return self._has_required_version_return

# The real DependencyManager is patched per test, so _is_code_tag sees the stub
from marimo._dependencies.dependencies import DependencyManager


def set_dependency_manager(has_required_version_return: bool, monkeypatch):
    """
    Helper to monkeypatch DependencyManager.new_superfences for a test.
    """
    monkeypatch.setattr(
        DependencyManager,
        "new_superfences",
        DummyNewSuperfences(has_required_version_return),
    )

# --- Unit tests ---

# ========== 1. Basic Test Cases ==========

@pytest.mark.parametrize(
    "text,has_required_version,expected",
    [
        # Simple python legacy code tag
        ("```{python}", False, True),
        ("```{python}", True, True),
        # Simple sql legacy code tag
        ("```{sql}", False, True),
        ("```{sql}", True, True),
        # marimo code tag only works if has_required_version==True
        ("```{marimo}", False, False),
        ("```{marimo}", True, True),
        # python with extra spaces
        ("```{  python  }", False, True),
        ("```{  python  }", True, True),
        # marimo with extra spaces
        ("```{  marimo  }", False, False),
        ("```{  marimo  }", True, True),
        # python with language info
        ("```{python .input}", False, True),
        ("```{python .input}", True, True),
        # marimo with language info
        ("```{marimo .input}", False, False),
        ("```{marimo .input}", True, True),
        # python in middle of braces
        ("```{foo python bar}", False, True),
        ("```{foo python bar}", True, True),
        # marimo in middle of braces
        ("```{foo marimo bar}", False, False),
        ("```{foo marimo bar}", True, True),
        # Not a code tag
        ("hello world", False, False),
        ("hello world", True, False),
    ]
)
def test_basic_code_tag_cases(text, has_required_version, expected, monkeypatch):
    """
    Test basic code tag detection for legacy and supported formats.
    """
    set_dependency_manager(has_required_version, monkeypatch)
    codeflash_output = _is_code_tag(text) # 1.80ms -> 972μs (84.8% faster)

# ========== 2. Edge Test Cases ==========

@pytest.mark.parametrize(
    "text,has_required_version,expected",
    [
        # Empty string
        ("", False, False),
        ("", True, False),
        # Only whitespace
        ("   \n   ", False, False),
        ("   \n   ", True, False),
        # Only code fence, no braces
        ("```", False, False),
        ("```", True, False),
        # Braces but no language
        ("```{}", False, False),
        ("```{}", True, False),
        # Only braces, no code fence
        ("{python}", False, True),
        ("{python}", True, True),
        ("{marimo}", False, False),
        ("{marimo}", True, True),
        # Mixed case (should be case sensitive, so not detected)
        ("```{Python}", False, False),
        ("```{Marimo}", True, False),
        # Language as a substring (should match)
        ("```{superpython}", False, True),
        ("```{supermarimo}", True, True),
        # Language as part of another word (should match)
        ("```{notpython}", False, True),
        ("```{notmarimo}", True, True),
        # Newline after code fence
        ("```{python}\nprint(1)", False, True),
        ("```{marimo}\nprint(1)", True, True),
        # Code tag not at start
        ("Some text\n```{python}", False, False),
        ("Some text\n```{marimo}", True, False),
        # Multiple lines, code tag on first line
        ("```{python}\nprint('hello')\n```", False, True),
        ("```{marimo}\nprint('hello')\n```", True, True),
        # Multiple lines, code tag not on first line
        ("notatag\n```{python}", False, False),
        ("notatag\n```{marimo}", True, False),
        # Braces in the middle of the line
        ("foo {python} bar", False, True),
        ("foo {marimo} bar", True, True),
        # Braces with special characters
        ("```{py!@#thon}", False, True),
        ("```{mar!@#imo}", True, True),
        # Braces with numbers
        ("```{python3}", False, True),
        ("```{marimo123}", True, True),
        # Braces with empty string inside
        ("```{}", False, False),
        ("```{}", True, False),
        # Braces with only whitespace inside
        ("```{   }", False, False),
        ("```{   }", True, False),
        # Braces with only symbols
        ("```{!!!}", False, False),
        ("```{!!!}", True, False),
    ]
)
def test_edge_code_tag_cases(text, has_required_version, expected, monkeypatch):
    """
    Test edge cases for code tag detection.
    """
    set_dependency_manager(has_required_version, monkeypatch)
    codeflash_output = _is_code_tag(text) # 3.43ms -> 2.76ms (24.4% faster)

# ========== 3. Large Scale Test Cases ==========

def test_large_scale_many_code_tags(monkeypatch):
    """
    Test performance and correctness with a large number of code tags.
    """
    set_dependency_manager(True, monkeypatch)
    # 500 valid marimo code tags, each on a new line
    lines = ["```{marimo}"] * 500
    text = "\n".join(lines)
    # Only the first line is checked
    codeflash_output = _is_code_tag(text) # 108μs -> 105μs (3.56% faster)

    set_dependency_manager(False, monkeypatch)
    # 500 valid python code tags, each on a new line
    lines = ["```{python}"] * 500
    text = "\n".join(lines)
    codeflash_output = _is_code_tag(text) # 66.1μs -> 8.21μs (705% faster)

    # 500 invalid tags
    lines = ["```{notacode}"] * 500
    text = "\n".join(lines)
    codeflash_output = _is_code_tag(text) # 61.5μs -> 65.9μs (6.73% slower)

def test_large_scale_long_line(monkeypatch):
    """
    Test with a single very long line containing a valid code tag.
    """
    set_dependency_manager(True, monkeypatch)
    # 900 'a's, then a valid marimo code tag, then 90 'b's
    text = "a" * 900 + "```{marimo}" + "b" * 90
    # Since the code tag is not at the start of the line, it should not match
    codeflash_output = _is_code_tag(text) # 86.2μs -> 86.6μs (0.384% slower)

    # Now, put the code tag at the start
    text = "```{marimo}" + "a" * 990
    codeflash_output = _is_code_tag(text) # 60.4μs -> 59.9μs (0.853% faster)

def test_large_scale_whitespace(monkeypatch):
    """
    Test with a large amount of whitespace before and after the code tag.
    """
    set_dependency_manager(True, monkeypatch)
    text = " " * 500 + "```{marimo}" + " " * 400
    # After stripping, the code tag is at the start
    codeflash_output = _is_code_tag(text) # 83.3μs -> 80.8μs (3.09% faster)

def test_large_scale_multiline(monkeypatch):
    """
    Test with a large multiline string, code tag only on the first line.
    """
    set_dependency_manager(True, monkeypatch)
    first_line = "```{marimo}"
    rest = "\n".join(["print(1)"] * 999)
    text = f"{first_line}\n{rest}"
    codeflash_output = _is_code_tag(text) # 112μs -> 111μs (0.529% faster)

    # Now, code tag on second line only
    text = f"notatag\n```{{marimo}}\n{rest}"
    codeflash_output = _is_code_tag(text) # 69.5μs -> 66.6μs (4.31% faster)

def test_large_scale_mixed_tags(monkeypatch):
    """
    Test with a mixture of valid and invalid tags in a large text.
    """
    set_dependency_manager(True, monkeypatch)
    lines = ["notatag"] * 500 + ["```{marimo}"] + ["notatag"] * 499
    text = "\n".join(lines)
    # Only the first line is checked, so should be False
    codeflash_output = _is_code_tag(text) # 97.0μs -> 95.8μs (1.26% faster)

    # Now, put the valid tag at the first line
    lines = ["```{marimo}"] + ["notatag"] * 999
    text = "\n".join(lines)
    codeflash_output = _is_code_tag(text) # 69.2μs -> 67.2μs (3.03% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
# imports
import pytest  # used for our unit tests
from marimo._convert.markdown.markdown import _is_code_tag


# Mocks for DependencyManager and its attributes
class MockNewSuperfences:
    def __init__(self, has_version: bool):
        self._has_version = has_version

    def has_required_version(self, quiet: bool = False) -> bool:
        return self._has_version

class MockDependencyManager:
    def __init__(self, has_version: bool):
        self.new_superfences = MockNewSuperfences(has_version)

# Patch context manager for DependencyManager
class PatchDependencyManager:
    def __init__(self, has_version: bool):
        self.has_version = has_version
        self.original = None

    def __enter__(self):
        from marimo._dependencies.dependencies import DependencyManager

        # Swap the real attribute for the mock and remember the original
        self.original = DependencyManager.new_superfences
        DependencyManager.new_superfences = MockNewSuperfences(self.has_version)

    def __exit__(self, exc_type, exc_val, exc_tb):
        from marimo._dependencies.dependencies import DependencyManager

        # Restore the original attribute
        DependencyManager.new_superfences = self.original

# ---------------------- UNIT TESTS ----------------------

# ----------- BASIC TEST CASES ------------

@pytest.mark.parametrize("has_version", [True, False])
def test_basic_python_tag(has_version):
    # Standard python tag
    with PatchDependencyManager(has_version):
        codeflash_output = _is_code_tag("{python}") # 174μs -> 3.34μs (5129% faster)
        codeflash_output = _is_code_tag("   {python}   ")
        codeflash_output = _is_code_tag("{python}\nprint('hello')") # 110μs -> 1.69μs (6446% faster)

@pytest.mark.parametrize("has_version", [True, False])
def test_basic_sql_tag(has_version):
    # Standard sql tag
    with PatchDependencyManager(has_version):
        codeflash_output = _is_code_tag("{sql}") # 166μs -> 3.26μs (5005% faster)
        codeflash_output = _is_code_tag("  {sql}  ")
        codeflash_output = _is_code_tag("{sql}\nSELECT * FROM table") # 109μs -> 1.74μs (6156% faster)

@pytest.mark.parametrize("has_version", [True, False])
def test_basic_non_code_tag(has_version):
    # Not a code tag
    with PatchDependencyManager(has_version):
        codeflash_output = _is_code_tag("plain text") # 165μs -> 190μs (13.2% slower)
        codeflash_output = _is_code_tag("python")
        codeflash_output = _is_code_tag("sql") # 107μs -> 110μs (2.58% slower)
        codeflash_output = _is_code_tag("")

@pytest.mark.parametrize("has_version", [True, False])
def test_basic_marimo_tag(has_version):
    # marimo tag only works if has_version is True
    with PatchDependencyManager(has_version):
        if has_version:
            codeflash_output = _is_code_tag("{marimo}")
            codeflash_output = _is_code_tag("   {marimo}   ")
            codeflash_output = _is_code_tag("{marimo}\nprint('hi')")
        else:
            codeflash_output = _is_code_tag("{marimo}")
            codeflash_output = _is_code_tag("   {marimo}   ")
            codeflash_output = _is_code_tag("{marimo}\nprint('hi')")

# ----------- EDGE TEST CASES ------------

@pytest.mark.parametrize("has_version", [True, False])
def test_edge_case_empty_string(has_version):
    # Empty string should not match
    with PatchDependencyManager(has_version):
        codeflash_output = _is_code_tag("") # 165μs -> 161μs (2.13% faster)

@pytest.mark.parametrize("has_version", [True, False])
def test_edge_case_whitespace_only(has_version):
    # Only whitespace should not match
    with PatchDependencyManager(has_version):
        codeflash_output = _is_code_tag("   \n  ") # 176μs -> 168μs (5.14% faster)

@pytest.mark.parametrize("has_version", [True, False])
def test_edge_case_malformed_tags(has_version):
    # Malformed tags should not match
    with PatchDependencyManager(has_version):
        codeflash_output = _is_code_tag("{pythn}") # 169μs -> 168μs (0.882% faster)
        codeflash_output = _is_code_tag("{sqll}")
        codeflash_output = _is_code_tag("{pythonsql}") # 108μs -> 2.45μs (4338% faster)
        codeflash_output = _is_code_tag("{python sql}")
        codeflash_output = _is_code_tag("{python}") # 104μs -> 1.44μs (7157% faster)
        codeflash_output = _is_code_tag("{sql}")
        if has_version:
            codeflash_output = _is_code_tag("{marimo}") # 100μs -> 972ns (10276% faster)
            codeflash_output = _is_code_tag("{marimo python}") # 100μs -> 972ns (10276% faster)
        else:
            codeflash_output = _is_code_tag("{marimo}")
            codeflash_output = _is_code_tag("{marimo python}")  # python wins

@pytest.mark.parametrize("has_version", [True, False])
def test_edge_case_multiline_first_line_tag(has_version):
    # Tag only on first line
    with PatchDependencyManager(has_version):
        codeflash_output = _is_code_tag("{python}\nsecond line") # 166μs -> 3.28μs (4957% faster)
        codeflash_output = _is_code_tag("{sql}\nsecond line")
        if has_version:
            codeflash_output = _is_code_tag("{marimo}\nsecond line") # 107μs -> 1.85μs (5735% faster)
        else:
            codeflash_output = _is_code_tag("{marimo}\nsecond line")

@pytest.mark.parametrize("has_version", [True, False])
def test_edge_case_multiline_tag_not_first_line(has_version):
    # Tag not on first line
    with PatchDependencyManager(has_version):
        codeflash_output = _is_code_tag("first line\n{python}") # 167μs -> 165μs (0.998% faster)
        codeflash_output = _is_code_tag("first line\n{sql}")
        codeflash_output = _is_code_tag("first line\n{marimo}") # 108μs -> 106μs (1.90% faster)

@pytest.mark.parametrize("has_version", [True, False])
def test_edge_case_tag_with_extra_characters(has_version):
    # Tag with extra characters
    with PatchDependencyManager(has_version):
        codeflash_output = _is_code_tag("code {python}") # 169μs -> 3.37μs (4928% faster)
        codeflash_output = _is_code_tag("code {sql}")
        if has_version:
            codeflash_output = _is_code_tag("code {marimo}") # 110μs -> 1.88μs (5796% faster)
        else:
            codeflash_output = _is_code_tag("code {marimo}")

@pytest.mark.parametrize("has_version", [True, False])
def test_edge_case_tag_with_case_sensitivity(has_version):
    # Tags are case sensitive
    with PatchDependencyManager(has_version):
        codeflash_output = _is_code_tag("{PYTHON}") # 167μs -> 166μs (0.284% faster)
        codeflash_output = _is_code_tag("{Sql}")
        codeflash_output = _is_code_tag("{Marimo}") # 106μs -> 107μs (0.098% slower)

@pytest.mark.parametrize("has_version", [True, False])
def test_edge_case_tag_with_brackets(has_version):
    # Brackets but no tag
    with PatchDependencyManager(has_version):
        codeflash_output = _is_code_tag("{}") # 164μs -> 162μs (1.61% faster)
        codeflash_output = _is_code_tag("{ }")
        codeflash_output = _is_code_tag("{pythonsql}") # 108μs -> 105μs (2.62% faster)

@pytest.mark.parametrize("has_version", [True, False])
def test_edge_case_tag_with_multiple_tags(has_version):
    # Multiple tags in one line
    with PatchDependencyManager(has_version):
        codeflash_output = _is_code_tag("{python}{sql}") # 167μs -> 3.43μs (4797% faster)
        codeflash_output = _is_code_tag("{sql}{python}")
        if has_version:
            codeflash_output = _is_code_tag("{marimo}{python}") # 110μs -> 1.24μs (8858% faster)
            codeflash_output = _is_code_tag("{marimo}{sql}") # 110μs -> 1.24μs (8858% faster)
            codeflash_output = _is_code_tag("{marimo}{marimo}") # 110μs -> 1.24μs (8858% faster)
        else:
            codeflash_output = _is_code_tag("{marimo}{python}")
            codeflash_output = _is_code_tag("{marimo}{sql}")
            codeflash_output = _is_code_tag("{marimo}{marimo}")

# ----------- LARGE SCALE TEST CASES ------------

@pytest.mark.parametrize("has_version", [True, False])
def test_large_scale_many_lines(has_version):
    # Large multiline input, tag on first line
    with PatchDependencyManager(has_version):
        text = "{python}\n" + "\n".join([f"line {i}" for i in range(999)])
        codeflash_output = _is_code_tag(text) # 200μs -> 27.6μs (627% faster)

        text = "{sql}\n" + "\n".join([f"line {i}" for i in range(999)])
        codeflash_output = _is_code_tag(text)

        if has_version:
            text = "{marimo}\n" + "\n".join([f"line {i}" for i in range(999)])
            codeflash_output = _is_code_tag(text) # 134μs -> 24.3μs (455% faster)
        else:
            text = "{marimo}\n" + "\n".join([f"line {i}" for i in range(999)])
            codeflash_output = _is_code_tag(text)

@pytest.mark.parametrize("has_version", [True, False])
def test_large_scale_tag_with_large_head(has_version):
    # Large head line, tag at the end
    with PatchDependencyManager(has_version):
        head = "x" * 990 + "{python}"
        codeflash_output = _is_code_tag(head) # 1.13ms -> 4.26μs (26334% faster)
        head = "x" * 990 + "{sql}"
        codeflash_output = _is_code_tag(head)
        if has_version:
            head = "x" * 990 + "{marimo}"
            codeflash_output = _is_code_tag(head) # 1.06ms -> 3.52μs (30048% faster)
        else:
            head = "x" * 990 + "{marimo}"
            codeflash_output = _is_code_tag(head)

@pytest.mark.parametrize("has_version", [True, False])
def test_large_scale_many_non_tags(has_version):
    # Many lines, none are tags
    with PatchDependencyManager(has_version):
        text = "\n".join([f"notatag{i}" for i in range(999)])
        codeflash_output = _is_code_tag(text) # 202μs -> 204μs (1.19% slower)

@pytest.mark.parametrize("has_version", [True, False])
def test_large_scale_first_line_non_tag_rest_tags(has_version):
    # Tag not on first line, but elsewhere
    with PatchDependencyManager(has_version):
        text = "notatag\n" + "\n".join(["{python}"] * 999)
        codeflash_output = _is_code_tag(text) # 204μs -> 197μs (3.56% faster)
        text = "notatag\n" + "\n".join(["{sql}"] * 999)
        codeflash_output = _is_code_tag(text)
        text = "notatag\n" + "\n".join(["{marimo}"] * 999)
        codeflash_output = _is_code_tag(text) # 134μs -> 131μs (1.81% faster)

@pytest.mark.parametrize("has_version", [True, False])
def test_large_scale_first_line_tag_rest_non_tags(has_version):
    # Tag on first line, rest are not tags
    with PatchDependencyManager(has_version):
        text = "{python}\n" + "\n".join(["notatag"] * 999)
        codeflash_output = _is_code_tag(text) # 198μs -> 27.4μs (624% faster)
        text = "{sql}\n" + "\n".join(["notatag"] * 999)
        codeflash_output = _is_code_tag(text)
        if has_version:
            text = "{marimo}\n" + "\n".join(["notatag"] * 999)
            codeflash_output = _is_code_tag(text) # 133μs -> 24.0μs (457% faster)
        else:
            text = "{marimo}\n" + "\n".join(["notatag"] * 999)
            codeflash_output = _is_code_tag(text)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
from marimo._convert.markdown.markdown import _is_code_tag

def test__is_code_tag():
    _is_code_tag('')
🔎 Concolic Coverage Tests and Runtime
| Test File::Test Function | Original ⏱️ | Optimized ⏱️ | Speedup |
|---|---|---|---|
| `codeflash_concolic_qdbuqf4z/tmpzaiyoc6b/test_concolic_coverage.py::test__is_code_tag` | 83.1μs | 81.7μs | 1.80%✅ |

To edit these changes, run `git checkout codeflash/optimize-_is_code_tag-mh5l3ubr`, make your changes, and push.

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 October 25, 2025 01:11
@codeflash-ai codeflash-ai bot added the ⚡️ codeflash Optimization PR opened by Codeflash AI label Oct 25, 2025