@codeflash-ai codeflash-ai bot commented Oct 27, 2025

📄 26% (0.26x) speedup for EndpointUtils.gather_params in pinecone/openapi_support/endpoint_utils.py

⏱️ Runtime : 388 microseconds → 307 microseconds (best of 80 runs)
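The percentages in this report appear to follow (original - optimized) / optimized. That formula is inferred from the numbers rather than stated by the tool, and the helper name below is made up for illustration. A minimal check against three figures quoted in this comment:

```python
def relative_speedup(original_us: float, optimized_us: float) -> float:
    """Return (original - optimized) / optimized as a percentage.

    Positive means faster, negative means slower.
    """
    return (original_us - optimized_us) / optimized_us * 100.0


headline = relative_speedup(388, 307)        # overall runtime
large_query = relative_speedup(98.4, 77.4)   # 500 query params test
small_body = relative_speedup(2.20, 3.08)    # single body param test

print(f"{headline:.1f}% {large_query:.1f}% {small_body:.1f}%")
```

Under this reading, a test reported as "28.6% slower" took 3.08μs against an original 2.20μs.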

📝 Explanation and details

The optimization achieves a 26% speedup by reducing dictionary lookup overhead and method call costs in the critical loop path. Here are the key optimizations:

1. Method Binding Optimization

  • Pre-binds frequently called methods like location_map.get() and attribute_map.__getitem__() to local variables, eliminating repeated attribute lookups during the loop iteration.
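As a rough illustration of method pre-binding (not the actual diff; the function names here are hypothetical), the two loops below do the same work, but the second resolves `.get`, `__getitem__`, and `.append` once before iterating:

```python
from typing import Any, Dict, List, Tuple

Pair = Tuple[str, Any]


def collect_plain(location_map: Dict[str, str],
                  attribute_map: Dict[str, str],
                  kwargs: Dict[str, Any]) -> List[Pair]:
    """Baseline: attribute lookups happen on every iteration."""
    out: List[Pair] = []
    for name, value in kwargs.items():
        if location_map.get(name) is None:
            continue  # unknown parameters are skipped
        out.append((attribute_map[name], value))
    return out


def collect_prebound(location_map: Dict[str, str],
                     attribute_map: Dict[str, str],
                     kwargs: Dict[str, Any]) -> List[Pair]:
    """Same logic with bound methods hoisted out of the loop."""
    get_location = location_map.get
    get_base_name = attribute_map.__getitem__
    out: List[Pair] = []
    append = out.append
    for name, value in kwargs.items():
        if get_location(name) is None:
            continue
        append((get_base_name(name), value))
    return out


location_map = {"q": "query", "h": "header"}
attribute_map = {"q": "q", "h": "X-Header"}
kwargs = {"q": "search", "h": "value", "extra": "ignored"}
assert collect_plain(location_map, attribute_map, kwargs) == \
    collect_prebound(location_map, attribute_map, kwargs)
```

The bindings cost a few assignments up front, which matters only when the loop body runs a handful of times.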

2. String Constant Caching

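  • Stores location strings ("body", "form", "query", etc.) in local variables so the comparisons inside the loop read from locals. In CPython these literals are already compile-time constants, so the gain from this step on its own is likely small.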
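How much this step helps depends on the interpreter. A small sketch, assuming CPython and using made-up function names, shows that a string literal inside a function is stored once in the code object's constants rather than rebuilt on each comparison:

```python
def is_body_literal(location: str) -> bool:
    # The literal is loaded from the code object's constants.
    return location == "body"


def is_body_local(location: str) -> bool:
    body = "body"  # one extra store; later reads use the local slot
    return location == body


# On CPython the literal lives in co_consts, so it is not re-created per call.
print("body" in is_body_literal.__code__.co_consts)
```

Both variants return the same result; any difference is in how the value is fetched, not in how the string is built.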

3. Dictionary Reference Pre-binding

  • Pre-extracts references to nested dictionary values (e.g., params["form"], params["query"]) before the loop, reducing repeated key lookups in the hot path.
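A sketch of the pre-binding idea for the nested containers, assuming a simplified `params` layout and hypothetical function names. The second version fetches `params["form"]`, `params["query"]`, and `params["header"]` once instead of on every iteration:

```python
from typing import Any, Dict, Iterable, Tuple

Item = Tuple[str, str, Any]  # (location, name, value)


def new_params() -> Dict[str, Any]:
    return {"form": [], "query": [], "header": {}}


def bucket_plain(items: Iterable[Item]) -> Dict[str, Any]:
    """Baseline: index into params inside the loop."""
    params = new_params()
    for location, name, value in items:
        if location == "form":
            params["form"].append((name, value))
        elif location == "query":
            params["query"].append((name, value))
        elif location == "header":
            params["header"][name] = value
    return params


def bucket_prebound(items: Iterable[Item]) -> Dict[str, Any]:
    """Same logic with the nested containers bound before the loop."""
    params = new_params()
    form_append = params["form"].append
    query_append = params["query"].append
    header = params["header"]
    for location, name, value in items:
        if location == "form":
            form_append((name, value))
        elif location == "query":
            query_append((name, value))
        elif location == "header":
            header[name] = value
    return params


items = [("query", "q", "search"), ("form", "f", "fval"), ("header", "X-H", "v")]
assert bucket_plain(items) == bucket_prebound(items)
```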

4. Identity vs Equality Optimization

  • Uses is instead of == when comparing against file_type_tuple. An identity check skips the element-wise comparison, but it matches only when openapi_types holds that same tuple object rather than an equal copy.
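The identity check deserves care during review. The sketch below uses a hypothetical `FileType` class as a stand-in for the library's `file_type`, and shows that an equal tuple built elsewhere passes `==` but fails `is`:

```python
class FileType:
    """Hypothetical stand-in for the library's file_type."""


FILE_TYPE_TUPLE = (FileType,)  # the shared object that `is` compares against

shared = FILE_TYPE_TUPLE       # same object
rebuilt = (FileType,)          # equal contents, but a different tuple object

print(shared is FILE_TYPE_TUPLE)    # True
print(rebuilt == FILE_TYPE_TUPLE)   # True
print(rebuilt is FILE_TYPE_TUPLE)   # False
```

If any caller constructs its own `(file_type,)` tuple in `openapi_types`, the `is` version would route that parameter to the plain form branch instead of the file branch.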

5. Restructured Control Flow

  • Consolidates the form parameter handling logic to reduce repeated openapi_types lookups and streamlines the conditional branches.
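One way the consolidation could look, sketched with hypothetical names (`FileType`, `route_form_*`) rather than the code from the diff: the original shape may look up `openapi_types[name]` twice, while the consolidated shape looks it up once and branches on the result.

```python
from typing import Any, Dict


class FileType:
    """Hypothetical stand-in for the library's file_type."""


SINGLE_FILE = (FileType,)
FILE_LIST = ([FileType],)


def route_form_original(openapi_types, name, base_name, value, params) -> None:
    """Original shape: openapi_types[name] may be looked up twice."""
    if openapi_types[name] == SINGLE_FILE:
        params["file"][name] = [value]
    elif openapi_types[name] == FILE_LIST:
        params["file"][name] = value
    else:
        params["form"].append((base_name, value))


def route_form_consolidated(openapi_types, name, base_name, value, params) -> None:
    """Consolidated shape: one lookup, then branch on the result."""
    declared = openapi_types[name]
    if declared == SINGLE_FILE:
        params["file"][name] = [value]
    elif declared == FILE_LIST:
        params["file"][name] = value
    else:
        params["form"].append((base_name, value))


def run(route) -> Dict[str, Any]:
    params: Dict[str, Any] = {"file": {}, "form": []}
    types = {"f1": SINGLE_FILE, "f2": (str,), "f3": FILE_LIST}
    route(types, "f1", "file1", "filedata", params)
    route(types, "f2", "form2", "formval", params)
    route(types, "f3", "file3", ["fd1", "fd2"], params)
    return params


assert run(route_form_original) == run(route_form_consolidated)
```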

Performance Impact by Test Case:

  • Large-scale scenarios benefit most: 19-51% faster on tests with 100+ parameters, where the loop overhead dominates
  • Small cases mostly regress: 3-37% slower on simple cases due to the initialization overhead of local bindings (one three-parameter form/file test was 6% faster)
  • Sweet spot: Medium to large parameter sets where the loop iterations amortize the setup cost
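A back-of-the-envelope estimate of where the setup cost is paid back, using two timings from the per-test comments in this report. The linear cost model (fixed setup plus a constant per-parameter cost) is an assumption, and single-run microsecond timings are noisy, so treat the result as an order of magnitude only:

```python
# Timings in microseconds, copied from the per-test comments in this report.
empty_old, empty_new = 1.79, 2.85   # test_gather_params_empty_kwargs
large_old, large_new = 98.4, 77.4   # test_gather_params_large_number_of_query_params
n = 500                             # parameters in the large query test

# Assumed model: runtime = fixed_cost + n * per_param_cost
setup_cost = empty_new - empty_old                # extra fixed cost after the change
per_param_old = (large_old - empty_old) / n
per_param_new = (large_new - empty_new) / n
saving_per_param = per_param_old - per_param_new

break_even = setup_cost / saving_per_param
print(f"setup: {setup_cost:.2f} us, saving: {saving_per_param:.3f} us/param, "
      f"break-even: {break_even:.0f} params")
```

Under these assumptions the break-even point for query parameters lands at roughly two dozen, which is in line with the small tests regressing and the 100+ parameter tests improving.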

This optimization is particularly effective for API endpoints that process many parameters simultaneously, which is common in OpenAPI/REST frameworks.

Correctness verification report:

| Test | Status |
| --- | --- |
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | ✅ 16 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 93.1% |

🌀 Generated Regression Tests and Runtime
from typing import Any, Dict, Tuple

# imports
import pytest
from pinecone.openapi_support.endpoint_utils import EndpointUtils


# --- Mocked dependencies for isolated testing ---
class PineconeApiTypeError(Exception):
    pass

def file_type():
    """Dummy file_type for type comparison."""
    pass


AttributeMapDictType = Dict[str, str]
LocationMapDictType = Dict[str, str]
OpenapiTypesDictType = Dict[str, Tuple]

# Output type for gather_params
CombinedParamsMapDict = Dict[str, Any]


# --- Unit tests ---
# Helper for the dummy file_type comparison
def _file_type_tuple():
    return (file_type,)

def _file_type_list_tuple():
    return ([file_type],)

# 1. Basic Test Cases

def test_gather_params_basic_body():
    """Test single body parameter."""
    codeflash_output = EndpointUtils.gather_params(
        attribute_map={'foo': 'foo'},
        location_map={'foo': 'body'},
        openapi_types={'foo': (str,)},
        collection_format_map={},
        kwargs={'foo': 'bar'}
    ); result = codeflash_output # 2.20μs -> 3.08μs (28.6% slower)

def test_gather_params_basic_query_and_header():
    """Test query and header parameters."""
    codeflash_output = EndpointUtils.gather_params(
        attribute_map={'q': 'q', 'h': 'X-Header'},
        location_map={'q': 'query', 'h': 'header'},
        openapi_types={'q': (str,), 'h': (str,)},
        collection_format_map={},
        kwargs={'q': 'search', 'h': 'value'}
    ); result = codeflash_output # 2.66μs -> 3.51μs (24.2% slower)

def test_gather_params_basic_form_and_path():
    """Test form and path parameters."""
    codeflash_output = EndpointUtils.gather_params(
        attribute_map={'f': 'form_field', 'p': 'path_param'},
        location_map={'f': 'form', 'p': 'path'},
        openapi_types={'f': (str,), 'p': (int,)},
        collection_format_map={},
        kwargs={'f': 'fval', 'p': 123}
    ); result = codeflash_output # 3.31μs -> 3.51μs (5.75% slower)

def test_gather_params_basic_collection_format():
    """Test collection_format is set when present."""
    codeflash_output = EndpointUtils.gather_params(
        attribute_map={'q': 'q'},
        location_map={'q': 'query'},
        openapi_types={'q': (list,)},
        collection_format_map={'q': 'csv'},
        kwargs={'q': [1, 2, 3]}
    ); result = codeflash_output # 2.23μs -> 2.93μs (23.9% slower)

def test_gather_params_basic_file_form():
    """Test file upload via form with single file_type."""
    codeflash_output = EndpointUtils.gather_params(
        attribute_map={'f': 'file_field'},
        location_map={'f': 'form'},
        openapi_types={'f': _file_type_tuple()},
        collection_format_map={},
        kwargs={'f': 'filedata'}
    ); result = codeflash_output # 2.54μs -> 2.94μs (13.5% slower)

def test_gather_params_basic_file_form_list():
    """Test file upload via form with a list of file_type."""
    codeflash_output = EndpointUtils.gather_params(
        attribute_map={'f': 'file_field'},
        location_map={'f': 'form'},
        openapi_types={'f': _file_type_list_tuple()},
        collection_format_map={},
        kwargs={'f': ['filedata1', 'filedata2']}
    ); result = codeflash_output # 2.72μs -> 2.81μs (3.17% slower)

# 2. Edge Test Cases

def test_gather_params_missing_location_map_entry():
    """Parameter in kwargs not present in location_map should be ignored."""
    codeflash_output = EndpointUtils.gather_params(
        attribute_map={'a': 'a'},
        location_map={'a': 'query'},
        openapi_types={'a': (str,)},
        collection_format_map={},
        kwargs={'a': 'aval', 'b': 'bval'}
    ); result = codeflash_output # 2.29μs -> 3.06μs (25.0% slower)


def test_gather_params_empty_kwargs():
    """Empty kwargs should result in all outputs empty/default."""
    codeflash_output = EndpointUtils.gather_params(
        attribute_map={},
        location_map={},
        openapi_types={},
        collection_format_map={},
        kwargs={}
    ); result = codeflash_output # 1.79μs -> 2.85μs (37.2% slower)

def test_gather_params_collection_format_for_nonexistent_param():
    """Collection format for param not in kwargs should not appear in result."""
    codeflash_output = EndpointUtils.gather_params(
        attribute_map={'a': 'a'},
        location_map={'a': 'query'},
        openapi_types={'a': (str,)},
        collection_format_map={'b': 'csv'},  # 'b' not in kwargs
        kwargs={'a': 'aval'}
    ); result = codeflash_output # 2.28μs -> 3.06μs (25.2% slower)

def test_gather_params_form_and_file_mix():
    """Mix of form fields and file uploads."""
    codeflash_output = EndpointUtils.gather_params(
        attribute_map={'f1': 'file1', 'f2': 'form2', 'f3': 'file3'},
        location_map={'f1': 'form', 'f2': 'form', 'f3': 'form'},
        openapi_types={'f1': _file_type_tuple(), 'f2': (str,), 'f3': _file_type_list_tuple()},
        collection_format_map={},
        kwargs={'f1': 'filedata', 'f2': 'formval', 'f3': ['fd1', 'fd2']}
    ); result = codeflash_output # 4.00μs -> 3.77μs (6.15% faster)

def test_gather_params_multiple_same_location():
    """Multiple parameters in the same location (query, form, header, path)."""
    codeflash_output = EndpointUtils.gather_params(
        attribute_map={'a': 'A', 'b': 'B', 'c': 'C', 'd': 'D'},
        location_map={'a': 'query', 'b': 'form', 'c': 'header', 'd': 'path'},
        openapi_types={'a': (str,), 'b': (str,), 'c': (str,), 'd': (int,)},
        collection_format_map={},
        kwargs={'a': 'qval', 'b': 'fval', 'c': 'hval', 'd': 42}
    ); result = codeflash_output # 3.88μs -> 4.00μs (3.15% slower)

def test_gather_params_collection_format_multiple():
    """Multiple collection_format entries for multiple params."""
    codeflash_output = EndpointUtils.gather_params(
        attribute_map={'a': 'A', 'b': 'B'},
        location_map={'a': 'query', 'b': 'query'},
        openapi_types={'a': (list,), 'b': (list,)},
        collection_format_map={'a': 'csv', 'b': 'multi'},
        kwargs={'a': [1, 2], 'b': [3, 4]}
    ); result = codeflash_output # 2.68μs -> 3.36μs (20.1% slower)

# 3. Large Scale Test Cases

def test_gather_params_large_number_of_query_params():
    """Test with a large number of query params."""
    n = 500
    attribute_map = {f'param{i}': f'param{i}' for i in range(n)}
    location_map = {f'param{i}': 'query' for i in range(n)}
    openapi_types = {f'param{i}': (str,) for i in range(n)}
    kwargs = {f'param{i}': f'value{i}' for i in range(n)}
    codeflash_output = EndpointUtils.gather_params(
        attribute_map=attribute_map,
        location_map=location_map,
        openapi_types=openapi_types,
        collection_format_map={},
        kwargs=kwargs
    ); result = codeflash_output # 98.4μs -> 77.4μs (27.1% faster)
    for i in range(n):
        pass

def test_gather_params_large_mixed_params():
    """Test with a large mix of param locations."""
    n = 200
    attribute_map = {f'q{i}': f'q{i}' for i in range(n)}
    attribute_map.update({f'f{i}': f'f{i}' for i in range(n)})
    attribute_map.update({f'h{i}': f'h{i}' for i in range(n)})
    location_map = {f'q{i}': 'query' for i in range(n)}
    location_map.update({f'f{i}': 'form' for i in range(n)})
    location_map.update({f'h{i}': 'header' for i in range(n)})
    openapi_types = {f'q{i}': (str,) for i in range(n)}
    openapi_types.update({f'f{i}': (str,) for i in range(n)})
    openapi_types.update({f'h{i}': (str,) for i in range(n)})
    kwargs = {f'q{i}': f'qval{i}' for i in range(n)}
    kwargs.update({f'f{i}': f'fval{i}' for i in range(n)})
    kwargs.update({f'h{i}': f'hval{i}' for i in range(n)})
    codeflash_output = EndpointUtils.gather_params(
        attribute_map=attribute_map,
        location_map=location_map,
        openapi_types=openapi_types,
        collection_format_map={},
        kwargs=kwargs
    ); result = codeflash_output # 145μs -> 102μs (41.5% faster)
    # Spot check a few values
    for i in [0, n//2, n-1]:
        pass

def test_gather_params_large_file_uploads():
    """Test with many file uploads in form."""
    n = 100
    attribute_map = {f'f{i}': f'f{i}' for i in range(n)}
    location_map = {f'f{i}': 'form' for i in range(n)}
    openapi_types = {f'f{i}': _file_type_tuple() for i in range(n)}
    kwargs = {f'f{i}': f'filedata{i}' for i in range(n)}
    codeflash_output = EndpointUtils.gather_params(
        attribute_map=attribute_map,
        location_map=location_map,
        openapi_types=openapi_types,
        collection_format_map={},
        kwargs=kwargs
    ); result = codeflash_output # 32.1μs -> 21.2μs (51.3% faster)
    for i in range(n):
        pass

def test_gather_params_large_collection_format():
    """Test with many collection_format entries."""
    n = 300
    attribute_map = {f'q{i}': f'q{i}' for i in range(n)}
    location_map = {f'q{i}': 'query' for i in range(n)}
    openapi_types = {f'q{i}': (list,) for i in range(n)}
    collection_format_map = {f'q{i}': 'csv' for i in range(n)}
    kwargs = {f'q{i}': [i, i+1] for i in range(n)}
    codeflash_output = EndpointUtils.gather_params(
        attribute_map=attribute_map,
        location_map=location_map,
        openapi_types=openapi_types,
        collection_format_map=collection_format_map,
        kwargs=kwargs
    ); result = codeflash_output # 79.6μs -> 67.1μs (18.6% faster)
    for i in range(n):
        pass
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
from typing import Any, Dict, Tuple

# imports
import pytest
from pinecone.openapi_support.endpoint_utils import EndpointUtils


# --- Minimal stubs for dependencies (since we can't import Pinecone) ---
class PineconeApiTypeError(Exception):
    pass

def file_type():
    pass  # Just a placeholder type


AttributeMapDictType = Dict[str, str]
LocationMapDictType = Dict[str, str]
OpenapiTypesDictType = Dict[str, Tuple]
CombinedParamsMapDict = Dict[str, Any]

def gather_params(
    attribute_map: AttributeMapDictType,
    location_map: LocationMapDictType,
    openapi_types: OpenapiTypesDictType,
    collection_format_map: Dict[str, str],
    kwargs: Dict[str, Any],
) -> CombinedParamsMapDict:
    params: CombinedParamsMapDict = {
        "body": None,
        "collection_format": {},
        "file": {},
        "form": [],
        "header": {},
        "path": {},
        "query": [],
    }

    for param_name, param_value in kwargs.items():
        param_location = location_map.get(param_name)
        if param_location is None:
            continue
        if param_location:
            if param_location == "body":
                params["body"] = param_value
                continue
            base_name = attribute_map[param_name]
            if param_location == "form" and openapi_types[param_name] == (file_type,):
                params["file"][param_name] = [param_value]
            elif param_location == "form" and openapi_types[param_name] == ([file_type],):
                params["file"][param_name] = param_value
            elif param_location == "form":
                param_value_full = (base_name, param_value)
                params["form"].append(param_value_full)
            elif param_location == "query":
                param_value_full = (base_name, param_value)
                params["query"].append(param_value_full)
            elif param_location == "header":
                params["header"][base_name] = param_value
            elif param_location == "path":
                params["path"][base_name] = param_value
            else:
                raise PineconeApiTypeError(f"Got an unexpected location '{param_location}' for parameter `{param_name}`")

            collection_format = collection_format_map.get(param_name)
            if collection_format:
                params["collection_format"][base_name] = collection_format

    return params

# --- Unit tests ---

# Basic Test Cases

#------------------------------------------------
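from pinecone.openapi_support.endpoint_utils import EndpointUtils
# Assumed import path: the test below references PineconeApiTypeError, which
# this snippet did not import. Adjust if the exception lives elsewhere.
from pinecone.exceptions import PineconeApiTypeError
import pytest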

def test_EndpointUtils_gather_params():
    EndpointUtils.gather_params({'': 'form'}, {'': 'form'}, {'': ()}, {}, {'': 'form'})

def test_EndpointUtils_gather_params_2():
    EndpointUtils.gather_params({'': ''}, {'': 'path'}, {}, {}, {'': 'path'})

def test_EndpointUtils_gather_params_3():
    EndpointUtils.gather_params({'': ''}, {'': 'query'}, {}, {}, {'': ''})

def test_EndpointUtils_gather_params_4():
    EndpointUtils.gather_params({'': ''}, {'': 'body'}, {}, {}, {'': 0})

def test_EndpointUtils_gather_params_5():
    with pytest.raises(PineconeApiTypeError, match="Got\\ an\\ unexpected\\ location\\ '%s'\\ for\\ parameter\\ `%s`"):
        EndpointUtils.gather_params({'': 2}, {'': 2}, {}, {'': ''}, {'': 2})

def test_EndpointUtils_gather_params_6():
    EndpointUtils.gather_params({}, {}, {}, {}, {'': 0})

To edit these changes, run git checkout codeflash/optimize-EndpointUtils.gather_params-mh9g6ou8, commit your edits, and push.

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 October 27, 2025 18:04
@codeflash-ai codeflash-ai bot added the ⚡️ codeflash (Optimization PR opened by Codeflash AI) and 🎯 Quality: High (Optimization Quality according to Codeflash) labels Oct 27, 2025