@codeflash-ai codeflash-ai bot commented Oct 27, 2025

📄 37% (0.37x) speedup for IndexHostStore.get_host in pinecone/db_control/index_host_store.py

⏱️ Runtime : 2.21 milliseconds → 1.61 milliseconds (best of 151 runs)
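
The headline percentage appears to be computed as original runtime divided by optimized runtime, minus one; the per-test annotations further down are consistent with the same formula. A quick check under that assumption (the variable names are illustrative, not part of Codeflash's output):

```python
# Runtimes quoted in the summary above, in milliseconds.
original_ms = 2.21
optimized_ms = 1.61

# Assumed definition: how much more work fits in the same time after the change.
speedup = original_ms / optimized_ms - 1

# A different but related figure: the fraction of wall time saved.
time_saved = (original_ms - optimized_ms) / original_ms

print(f"speedup:    {speedup:.0%}")     # 37%
print(f"time saved: {time_saved:.0%}")  # 27%
```

The two numbers answer different questions, so a "37% speedup" here corresponds to roughly a 27% reduction in runtime.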

📝 Explanation and details

The optimized code achieves a 37% speedup through a few micro-optimizations that reduce method calls and attribute lookups:

Primary Optimizations:

  1. Inlined key generation: Replaced the self._key(config, index_name) calls in set_host and get_host with the f-string f"{config.api_key}:{index_name}". The key is built the same way, without the overhead of a method call.

  2. Local caching of self._indexHosts: get_host binds store = self._indexHosts once, so later dictionary accesses go through the local variable (store[key]) instead of repeating the attribute lookup (self._indexHosts[key]).

  3. Streamlined host setting logic: In the get_host exception handler, host = description.host is read once, and normalize_host() is called and its result stored only if the host exists, which avoids unnecessary function calls.
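
The changes listed above can be sketched as a before/after pair. This is a minimal illustration reconstructed from the description and the generated tests, not the Pinecone source: the class names `HostStoreBefore` and `HostStoreAfter`, the stand-in `normalize_host`, and the `SimpleNamespace` fakes are all assumptions.

```python
from types import SimpleNamespace


def normalize_host(host):
    """Hypothetical stand-in: add https:// unless a scheme is already present."""
    if host.startswith(("http://", "https://")):
        return host
    return "https://" + host


class HostStoreBefore:
    """Assumed shape of the code before the change."""

    def __init__(self):
        self._indexHosts = {}

    def _key(self, config, index_name):
        return f"{config.api_key}:{index_name}"

    def set_host(self, config, index_name, host):
        if host:
            self._indexHosts[self._key(config, index_name)] = normalize_host(host)

    def get_host(self, api, config, index_name):
        try:
            return self._indexHosts[self._key(config, index_name)]
        except KeyError:
            description = api.describe_index(index_name)
            self.set_host(config, index_name, description.host)
            return self._indexHosts[self._key(config, index_name)]


class HostStoreAfter(HostStoreBefore):
    """Same behaviour, with the key inlined and the dict bound to a local."""

    def get_host(self, api, config, index_name):
        key = f"{config.api_key}:{index_name}"  # no self._key() call
        store = self._indexHosts                # single attribute lookup
        try:
            return store[key]
        except KeyError:
            host = api.describe_index(index_name).host
            if host:                            # normalize and store only when present
                store[key] = normalize_host(host)
            return store[key]


# Fake config and API objects, for illustration only.
config = SimpleNamespace(api_key="k")
api = SimpleNamespace(describe_index=lambda name: SimpleNamespace(host="example-host.com"))

before, after = HostStoreBefore(), HostStoreAfter()
result_before = before.get_host(api, config, "idx")
result_after = after.get_host(api, config, "idx")
```

Both variants return the same normalized host and leave the same entry in the cache; only the number of method calls and attribute lookups on the way there differs.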

Performance Impact by Test Type:

  • Cache hits (already stored hosts): ~25-30% faster due to local variable optimization
  • Cache misses (new hosts): ~45-60% faster due to eliminated method calls during the fetch-and-store process
  • Large scale operations: ~40-50% faster; these loops consist mostly of cache misses, so the per-call saving adds up across many operations

The optimizations are most effective for workloads with frequent cache misses or high-volume operations, where the eliminated method call overhead provides the most benefit.
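
To get a feel for the cache-hit figures on a given machine, a micro-benchmark along the following lines can help. It is a sketch with a hypothetical `Store` class that mimics only the lookup path; absolute timings and the size of the gap will vary with interpreter version and hardware, so no expected numbers are shown.

```python
import timeit


class Store:
    """Hypothetical cache that exposes both lookup styles side by side."""

    def __init__(self):
        self._indexHosts = {"k:idx": "https://example-host.com"}

    def _key(self, api_key, index_name):
        return f"{api_key}:{index_name}"

    def via_method(self, api_key, index_name):
        # Key built through a helper method, dict reached through self.
        return self._indexHosts[self._key(api_key, index_name)]

    def inlined(self, api_key, index_name):
        # Key built inline, dict bound to a local name first.
        store = self._indexHosts
        return store[f"{api_key}:{index_name}"]


store = Store()
runs = 200_000
t_method = timeit.timeit(lambda: store.via_method("k", "idx"), number=runs)
t_inline = timeit.timeit(lambda: store.inlined("k", "idx"), number=runs)

print(f"via _key(): {t_method / runs * 1e9:.0f} ns per call")
print(f"inlined:    {t_inline / runs * 1e9:.0f} ns per call")
```

Repeating the measurement several times and taking the minimum, as `timeit.repeat` allows, gives more stable numbers than a single run.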

Correctness verification report:

| Test | Status |
|------|--------|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 2156 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |

🌀 Generated Regression Tests and Runtime
from typing import Dict

# imports
import pytest
from pinecone.db_control.index_host_store import IndexHostStore

# --- Function and dependencies to test ---

# Dummy PineconeException for testing
class PineconeException(Exception):
    pass

# Dummy Config class
class Config:
    def __init__(self, api_key):
        self.api_key = api_key

# Dummy description object returned by describe_index
class IndexDescription:
    def __init__(self, host):
        self.host = host

# Dummy IndexOperationsApi class
class IndexOperationsApi:
    def __init__(self, index_to_host):
        # index_to_host: dict mapping index_name -> host
        self.index_to_host = index_to_host

    def describe_index(self, index_name):
        # Simulates Pinecone's describe_index
        if index_name not in self.index_to_host:
            raise PineconeException(f"Index '{index_name}' not found")
        return IndexDescription(self.index_to_host[index_name])

# --- Unit Tests ---

@pytest.fixture
def fresh_store():
    # Ensure a fresh store for each test
    IndexHostStore._instances.clear()
    return IndexHostStore()

# ------------------- Basic Test Cases -------------------

def test_get_host_returns_cached_host(fresh_store):
    # Test that get_host returns a cached host if present
    config = Config(api_key="key123")
    index_name = "myindex"
    host = "https://cached-host.com"
    key = config.api_key + ":" + index_name
    fresh_store._indexHosts[key] = host
    api = IndexOperationsApi(index_to_host={})
    codeflash_output = fresh_store.get_host(api, config, index_name); result = codeflash_output # 1.01μs -> 806ns (25.8% faster)

def test_get_host_fetches_and_caches_host(fresh_store):
    # Test that get_host fetches host if not cached, then caches it
    config = Config(api_key="key456")
    index_name = "newindex"
    host = "host-from-api.com"
    api = IndexOperationsApi(index_to_host={index_name: host})
    codeflash_output = fresh_store.get_host(api, config, index_name); result = codeflash_output # 4.28μs -> 2.80μs (53.0% faster)
    key = config.api_key + ":" + index_name

def test_get_host_normalizes_host_https(fresh_store):
    # Host already starts with https://, should not change
    config = Config(api_key="key789")
    index_name = "secureindex"
    host = "https://secure-host.com"
    api = IndexOperationsApi(index_to_host={index_name: host})
    codeflash_output = fresh_store.get_host(api, config, index_name); result = codeflash_output # 3.79μs -> 2.51μs (50.7% faster)

def test_get_host_normalizes_host_http(fresh_store):
    # Host starts with http://, should not add https://
    config = Config(api_key="key101")
    index_name = "plainindex"
    host = "http://plain-host.com"
    api = IndexOperationsApi(index_to_host={index_name: host})
    codeflash_output = fresh_store.get_host(api, config, index_name); result = codeflash_output # 3.80μs -> 2.53μs (49.9% faster)

def test_get_host_normalizes_host_no_protocol(fresh_store):
    # Host without protocol should get https:// prefix
    config = Config(api_key="key102")
    index_name = "noprotocolindex"
    host = "no-protocol-host.com"
    api = IndexOperationsApi(index_to_host={index_name: host})
    codeflash_output = fresh_store.get_host(api, config, index_name); result = codeflash_output # 3.84μs -> 2.65μs (44.9% faster)

# ------------------- Edge Test Cases -------------------

def test_get_host_raises_if_index_not_found(fresh_store):
    # If describe_index raises, get_host should propagate PineconeException
    config = Config(api_key="key404")
    index_name = "missingindex"
    api = IndexOperationsApi(index_to_host={})  # No such index
    with pytest.raises(PineconeException) as excinfo:
        fresh_store.get_host(api, config, index_name) # 2.22μs -> 2.05μs (8.10% faster)


def test_set_host_does_not_store_empty_host(fresh_store):
    # set_host with empty string should not store anything
    config = Config(api_key="keyempty")
    index_name = "emptyindex"
    fresh_store.set_host(config, index_name, "")
    key = config.api_key + ":" + index_name
    # The key must be absent from the cache
    assert key not in fresh_store._indexHosts


def test_get_host_multiple_keys(fresh_store):
    # Different api_keys and index_names should not interfere
    config1 = Config(api_key="keyA")
    config2 = Config(api_key="keyB")
    index1 = "indexA"
    index2 = "indexB"
    host1 = "hostA.com"
    host2 = "hostB.com"
    api = IndexOperationsApi(index_to_host={index1: host1, index2: host2})
    codeflash_output = fresh_store.get_host(api, config1, index1); result1 = codeflash_output # 4.73μs -> 3.25μs (45.8% faster)
    codeflash_output = fresh_store.get_host(api, config2, index2); result2 = codeflash_output # 1.83μs -> 1.19μs (53.6% faster)

def test_get_host_overwrites_existing_host(fresh_store):
    # set_host should overwrite previous host for same key
    config = Config(api_key="keyX")
    index_name = "indexX"
    host1 = "host1.com"
    host2 = "host2.com"
    fresh_store.set_host(config, index_name, host1)
    fresh_store.set_host(config, index_name, host2)

# ------------------- Large Scale Test Cases -------------------

def test_get_host_many_indexes_and_keys(fresh_store):
    # Test with many indexes and api_keys
    num_indexes = 100
    num_keys = 10
    hosts = {}
    api_keys = [f"key{i}" for i in range(num_keys)]
    index_names = [f"index{i}" for i in range(num_indexes)]
    # Build mapping for api
    index_to_host = {name: f"{name}.host.com" for name in index_names}
    api = IndexOperationsApi(index_to_host=index_to_host)
    # Query all combinations
    for api_key in api_keys:
        config = Config(api_key=api_key)
        for index_name in index_names:
            codeflash_output = fresh_store.get_host(api, config, index_name); result = codeflash_output
            # Check cache
            key = config.api_key + ":" + index_name

def test_get_host_performance_under_load(fresh_store):
    # Test that get_host is efficient for repeated queries
    config = Config(api_key="keyperf")
    index_names = [f"perfindex{i}" for i in range(200)]
    index_to_host = {name: f"{name}.host.com" for name in index_names}
    api = IndexOperationsApi(index_to_host=index_to_host)
    # First pass: all should be cache misses and fill cache
    for name in index_names:
        codeflash_output = fresh_store.get_host(api, config, name); result = codeflash_output # 241μs -> 173μs (39.4% faster)
    # Second pass: all should be cache hits (no describe_index calls)
    for name in index_names:
        codeflash_output = fresh_store.get_host(api, config, name); result = codeflash_output # 61.2μs -> 48.2μs (26.9% faster)

def test_get_host_handles_long_index_names_and_keys(fresh_store):
    # Test with long api_key and index_name
    long_api_key = "a" * 500
    long_index_name = "index_" + ("b" * 500)
    host = "longhost.com"
    api = IndexOperationsApi(index_to_host={long_index_name: host})
    config = Config(api_key=long_api_key)
    codeflash_output = fresh_store.get_host(api, config, long_index_name); result = codeflash_output # 6.11μs -> 3.39μs (80.1% faster)
    key = long_api_key + ":" + long_index_name
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
from typing import Dict, Optional

# imports
import pytest
from pinecone.db_control.index_host_store import IndexHostStore


# pinecone/config.py
class Config:
    def __init__(self, api_key: str):
        self.api_key = api_key

# pinecone/openapi_support/exceptions.py
class PineconeException(Exception):
    pass

# pinecone/core/openapi/db_control/api/manage_indexes_api.py
class IndexOperationsApi:
    def __init__(self, describe_index_map=None):
        # describe_index_map: dict mapping index_name -> object with .host attribute
        self._describe_index_map = describe_index_map or {}

    def describe_index(self, index_name: str):
        if index_name not in self._describe_index_map:
            raise PineconeException(f"Index '{index_name}' does not exist.")
        return self._describe_index_map[index_name]

# --- Helper classes for testing ---

class IndexDescription:
    def __init__(self, host):
        self.host = host

# --- Unit Tests ---

@pytest.fixture
def store():
    # Ensure a fresh store for each test by clearing the singleton instance
    IndexHostStore._instances.pop(IndexHostStore, None)
    return IndexHostStore()

@pytest.fixture
def config():
    return Config(api_key="test_key")

@pytest.fixture
def api():
    # Default empty api
    return IndexOperationsApi()

# 1. Basic Test Cases

def test_get_host_returns_existing_host(store, config):
    # Test: host is already set
    index_name = "myindex"
    host = "https://existing-host.com"
    store.set_host(config, index_name, host)
    api = IndexOperationsApi()
    codeflash_output = store.get_host(api, config, index_name); result = codeflash_output # 852ns -> 828ns (2.90% faster)

def test_get_host_sets_and_returns_host_from_api(store, config):
    # Test: host is not set, gets from api.describe_index
    index_name = "myindex"
    host = "host-from-api.com"
    api = IndexOperationsApi({index_name: IndexDescription(host)})
    codeflash_output = store.get_host(api, config, index_name); result = codeflash_output # 3.57μs -> 2.28μs (56.2% faster)

def test_get_host_normalizes_https(store, config):
    # Test: host from api is already normalized (https)
    index_name = "myindex"
    host = "https://secure-host.com"
    api = IndexOperationsApi({index_name: IndexDescription(host)})
    codeflash_output = store.get_host(api, config, index_name); result = codeflash_output # 3.15μs -> 2.07μs (52.3% faster)

def test_get_host_normalizes_http(store, config):
    # Test: host from api is already normalized (http)
    index_name = "myindex"
    host = "http://plain-host.com"
    api = IndexOperationsApi({index_name: IndexDescription(host)})
    codeflash_output = store.get_host(api, config, index_name); result = codeflash_output # 3.36μs -> 2.08μs (61.4% faster)

def test_set_host_with_none_does_not_set(store, config):
    # Test: set_host with None does not set anything
    index_name = "myindex"
    store.set_host(config, index_name, None)
    key = store._key(config, index_name)
    # The key must be absent from the cache
    assert key not in store._indexHosts

def test_set_host_with_empty_string_does_not_set(store, config):
    # Test: set_host with empty string does not set anything
    index_name = "myindex"
    store.set_host(config, index_name, "")
    key = store._key(config, index_name)
    # The key must be absent from the cache
    assert key not in store._indexHosts

# 2. Edge Test Cases

def test_get_host_raises_if_describe_index_missing(store, config):
    # Test: describe_index raises exception if index not found
    index_name = "missingindex"
    api = IndexOperationsApi({})
    with pytest.raises(PineconeException) as excinfo:
        store.get_host(api, config, index_name) # 2.35μs -> 2.15μs (9.70% faster)



def test_multiple_configs_and_indexes(store):
    # Test: different configs and index names
    config1 = Config(api_key="key1")
    config2 = Config(api_key="key2")
    index1 = "indexA"
    index2 = "indexB"
    host1 = "host1.com"
    host2 = "host2.com"
    api = IndexOperationsApi({index1: IndexDescription(host1), index2: IndexDescription(host2)})
    codeflash_output = store.get_host(api, config1, index1); result1 = codeflash_output # 4.08μs -> 2.60μs (56.7% faster)
    codeflash_output = store.get_host(api, config2, index2); result2 = codeflash_output # 1.79μs -> 1.05μs (69.6% faster)

def test_get_host_with_non_ascii_index_name(store, config):
    # Test: index name with non-ASCII characters
    index_name = "索引名"
    host = "nonascii-host.com"
    api = IndexOperationsApi({index_name: IndexDescription(host)})
    codeflash_output = store.get_host(api, config, index_name); result = codeflash_output # 3.68μs -> 2.49μs (47.8% faster)

def test_get_host_with_long_index_name(store, config):
    # Test: very long index name
    index_name = "x" * 255
    host = "long-host.com"
    api = IndexOperationsApi({index_name: IndexDescription(host)})
    codeflash_output = store.get_host(api, config, index_name); result = codeflash_output # 3.65μs -> 2.27μs (60.9% faster)

def test_get_host_with_special_characters_in_host(store, config):
    # Test: host with special characters
    index_name = "specialhost"
    host = "host-with-dash_underscore.com:8080"
    api = IndexOperationsApi({index_name: IndexDescription(host)})
    codeflash_output = store.get_host(api, config, index_name); result = codeflash_output # 3.30μs -> 2.21μs (48.9% faster)

def test_get_host_with_host_already_normalized(store, config):
    # Test: host already starts with https://
    index_name = "normalized"
    host = "https://already-normalized.com"
    api = IndexOperationsApi({index_name: IndexDescription(host)})
    codeflash_output = store.get_host(api, config, index_name); result = codeflash_output # 3.18μs -> 2.05μs (54.8% faster)

def test_get_host_with_host_already_http(store, config):
    # Test: host starts with http://
    index_name = "httpnormalized"
    host = "http://already-normalized.com"
    api = IndexOperationsApi({index_name: IndexDescription(host)})
    codeflash_output = store.get_host(api, config, index_name); result = codeflash_output # 3.27μs -> 2.06μs (58.8% faster)

# 3. Large Scale Test Cases

def test_get_host_many_indexes(store):
    # Test: many indexes, ensure scalability and correctness
    num_indexes = 500  # Large but not excessive
    api_key = "bulk_key"
    config = Config(api_key=api_key)
    api_map = {}
    for i in range(num_indexes):
        index_name = f"index_{i}"
        host = f"host_{i}.com"
        api_map[index_name] = IndexDescription(host)
    api = IndexOperationsApi(api_map)
    # Query all hosts and check correctness
    for i in range(num_indexes):
        index_name = f"index_{i}"
        expected_host = f"https://host_{i}.com"
        codeflash_output = store.get_host(api, config, index_name); result = codeflash_output # 535μs -> 369μs (44.7% faster)
        # Should be cached now
        key = config.api_key + ":" + index_name

def test_get_host_many_configs(store):
    # Test: many configs (API keys), ensure separation
    num_configs = 100
    index_name = "sharedindex"
    api_map = {index_name: IndexDescription("sharedhost.com")}
    api = IndexOperationsApi(api_map)
    for i in range(num_configs):
        config = Config(api_key=f"key_{i}")
        codeflash_output = store.get_host(api, config, index_name); result = codeflash_output # 109μs -> 73.9μs (47.7% faster)
        key = config.api_key + ":" + index_name

def test_get_host_performance_under_load(store):
    # Test: repeated get_host calls for same index, should be fast after first
    index_name = "perfindex"
    host = "perfhost.com"
    api = IndexOperationsApi({index_name: IndexDescription(host)})
    config = Config(api_key="perfkey")
    # First call populates cache
    codeflash_output = store.get_host(api, config, index_name); result1 = codeflash_output # 3.06μs -> 2.06μs (48.9% faster)
    # Subsequent calls should use cache, not call describe_index again
    for _ in range(100):
        codeflash_output = store.get_host(api, config, index_name); result = codeflash_output # 30.5μs -> 23.6μs (29.2% faster)

def test_get_host_with_varied_host_formats(store, config):
    # Test: hosts with various formats
    index_names = ["plain", "http", "https", "port", "subdomain"]
    hosts = [
        "plainhost.com",
        "http://hostwithhttp.com",
        "https://hostwithhttps.com",
        "hostwithport.com:1234",
        "sub.domain.com"
    ]
    api_map = {name: IndexDescription(host) for name, host in zip(index_names, hosts)}
    api = IndexOperationsApi(api_map)
    expected = [
        "https://plainhost.com",
        "http://hostwithhttp.com",
        "https://hostwithhttps.com",
        "https://hostwithport.com:1234",
        "https://sub.domain.com"
    ]
    for index_name, exp in zip(index_names, expected):
        codeflash_output = store.get_host(api, config, index_name); result = codeflash_output # 8.76μs -> 5.68μs (54.4% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
from pinecone.config.config import Config
from pinecone.core.openapi.db_control.api.manage_indexes_api import ManageIndexesApi
from pinecone.db_control.index_host_store import IndexHostStore
import pytest

def test_IndexHostStore_get_host():
    with pytest.raises(AttributeError, match="'SymbolicInt'\\ object\\ has\\ no\\ attribute\\ 'configuration'"):
        IndexHostStore.get_host(IndexHostStore(), ManageIndexesApi(api_client=0), Config(api_key='', host='', proxy_url='', proxy_headers={}, ssl_ca_certs='', ssl_verify=None, additional_headers={}, source_tag=''), '')

To edit these changes, git checkout codeflash/optimize-IndexHostStore.get_host-mh9iw233, commit your edits, and push.

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 October 27, 2025 19:20
@codeflash-ai codeflash-ai bot added the labels "⚡️ codeflash" (Optimization PR opened by Codeflash AI) and "🎯 Quality: High" (Optimization Quality according to Codeflash) Oct 27, 2025