🐛 Bug Report: OpenAI not correctly instrumenting Responses API #3349

@itsaphel

Which component is this bug for?

OpenAI Instrumentation

📜 Description

When using the Responses API and the OpenAI instrumentor, messages are missing from the trace.

Other collected attributes, such as the token count, do increase as expected. I also sniffed the requests directly and confirmed that the additional input messages were sent in the request to OpenAI's API, but they did not appear in the trace.

Additional context: I'm using store=False and manually appending each response's output to subsequent API calls because of ZDR (Zero Data Retention). In other words, I'm using the Responses API in a stateless manner.

👟 Reproduction steps

Here's a self-contained reproducible example:

import json
import pytest
from openai import AsyncOpenAI


def add_two_numbers(a: int, b: int) -> int:
    """Adds two numbers together."""
    return a + b


@pytest.mark.asyncio
async def test_openai_responses_tmp():
    client = AsyncOpenAI()

    tools = [
        {
            "type": "function",
            "name": "add_two_numbers",
            "description": "Adds two numbers together.",
            "parameters": {
                "type": "object",
                "properties": {
                    "a": {
                        "description": "The first number to add.",
                        "type": "integer",
                    },
                    "b": {
                        "description": "The second number to add.",
                        "type": "integer",
                    },
                },
                "additionalProperties": False,
                "required": ["a", "b"],
            },
            "strict": True,
        },
    ]

    context = [{"role": "user", "content": "Please add 1 + 2 and 3 + 4 for me."}]

    initial_params = {
        "model": "gpt-5",
        "instructions": "You are a helpful assistant. Use the add_two_numbers function when asked to add numbers. Make parallel tool calls where possible.",
        "input": context,
        "tools": tools,
        "include": ["reasoning.encrypted_content"],
        "parallel_tool_calls": True,
        "store": False,
        "text": {"verbosity": "low"},
        "reasoning": {"effort": "medium", "summary": "auto"},
    }

    response = await client.responses.create(**initial_params)
    context += response.output

    # Step 1: Extract tool calls from the first response
    tool_calls = []
    for item in response.output:
        if hasattr(item, "type") and item.type == "function_call":
            tool_calls.append(item)
    assert tool_calls, "No tool calls found"

    # Step 2: Execute tool calls
    for tool_call in tool_calls:
        # Parse arguments and execute the function
        args = json.loads(tool_call.arguments)
        assert tool_call.name == "add_two_numbers"

        result = add_two_numbers(args["a"], args["b"])
        context.append(
            {
                "type": "function_call_output",
                "call_id": tool_call.call_id,
                "output": str(result),
            }
        )

    continue_params = {
        "model": "gpt-5",
        "instructions": "You are a helpful assistant. Use the add_two_numbers function when asked to add numbers.",
        "input": context,
        "tools": tools,
        "include": ["reasoning.encrypted_content"],
        "parallel_tool_calls": True,
        "store": False,
        "text": {"verbosity": "low"},
        "reasoning": {"effort": "medium", "summary": "auto"},
    }

    final_response = await client.responses.create(**continue_params)
    print(f"Final response: {final_response}")

When tracing the second request, we see the input field has value:

  "input": [
    {
      "role": "user",
      "content": "Please add 1 + 2 and 3 + 4 for me.",
      "type": "message"
    },
    {
      "id": "rs_ABC",
      "summary": [],
      "type": "reasoning",
      "encrypted_content": "content"
    },
    {
      "arguments": "{\"a\":1,\"b\":2}",
      "call_id": "call_1",
      "name": "add_two_numbers",
      "type": "function_call",
      "id": "fc_1",
      "status": "completed"
    },
    {
      "arguments": "{\"a\":3,\"b\":4}",
      "call_id": "call_2",
      "name": "add_two_numbers",
      "type": "function_call",
      "id": "fc_2",
      "status": "completed"
    },
    {
      "type": "function_call_output",
      "call_id": "call_1",
      "output": "3"
    },
    {
      "type": "function_call_output",
      "call_id": "call_2",
      "output": "7"
    }
  ],
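
To make the gap easier to see, here is a minimal sketch that tallies the input items above by their `type` field. The helper name `count_input_items_by_type` is hypothetical and not part of any library. It assumes the items are plain dicts, as in the sniffed request payload; in the repro itself, some items are SDK objects taken from `response.output`.

```python
from collections import Counter

# Copy of the `input` list sent with the second request (values as shown above).
SECOND_REQUEST_INPUT = [
    {"role": "user", "content": "Please add 1 + 2 and 3 + 4 for me.", "type": "message"},
    {"id": "rs_ABC", "summary": [], "type": "reasoning", "encrypted_content": "content"},
    {"arguments": '{"a":1,"b":2}', "call_id": "call_1", "name": "add_two_numbers",
     "type": "function_call", "id": "fc_1", "status": "completed"},
    {"arguments": '{"a":3,"b":4}', "call_id": "call_2", "name": "add_two_numbers",
     "type": "function_call", "id": "fc_2", "status": "completed"},
    {"type": "function_call_output", "call_id": "call_1", "output": "3"},
    {"type": "function_call_output", "call_id": "call_2", "output": "7"},
]


def count_input_items_by_type(items):
    """Hypothetical helper: tally input items by `type`.

    Items without a `type` key are counted as plain messages (an assumption).
    """
    return Counter(item.get("type", "message") for item in items)


counts = count_input_items_by_type(SECOND_REQUEST_INPUT)
print(dict(counts))
```

Of the six items sent, only one is a plain message. The other five (one reasoning item, two function calls, two function call outputs) are the ones I would expect to see reflected in the span.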

👍 Expected behavior

The logged span should contain the tool calls, tool responses, and reasoning summaries (if available) from previous steps.

When tracing the second LLM request, we should see gen_ai fields logged for the tool call and response provided in the input. In practice, these are not included.

👎 Actual Behavior with Screenshots

Relevant attributes from the span for the second LLM request (note: no tool calls are traced):

gen_ai.completion.0.content: |-
  1 + 2 = 3
  3 + 4 = 7
gen_ai.completion.0.role: assistant
gen_ai.prompt.0.content: You are a helpful assistant. Use the add_two_numbers function when asked to add numbers.
gen_ai.prompt.0.role: system
gen_ai.prompt.1.content: Please add 1 + 2 and 3 + 4 for me.
gen_ai.prompt.1.role: user
gen_ai.request.model: gpt-5
gen_ai.response.id: resp_xxx
gen_ai.response.model: gpt-5-2025-08-07
gen_ai.system: openai
gen_ai.usage.cache_read_input_tokens: 0
gen_ai.usage.input_tokens: 351
gen_ai.usage.output_tokens: 19
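
As a rough way to check this programmatically, the sketch below compares the number of `gen_ai.prompt.N` entries on a span with the number of input items sent. The helper names are hypothetical. It assumes that prompts are recorded in input order and that the `instructions` value takes index 0 as the system prompt, which matches the attributes above but is not something I have confirmed in the instrumentation code.

```python
import re

# Matches keys such as "gen_ai.prompt.1.role" and captures the prompt index.
PROMPT_ROLE_KEY = re.compile(r"^gen_ai\.prompt\.(\d+)\.role$")


def traced_prompt_count(span_attributes):
    """Count the distinct gen_ai.prompt.N entries recorded on a span."""
    return sum(1 for key in span_attributes if PROMPT_ROLE_KEY.match(key))


def untraced_input_items(input_items, span_attributes, has_instructions=True):
    """Return the input items that have no matching gen_ai.prompt.N entry.

    Assumes (hypothetically) that prompts are traced in input order and that
    `instructions` is recorded as gen_ai.prompt.0 when present.
    """
    traced = traced_prompt_count(span_attributes)
    if has_instructions:
        traced = max(traced - 1, 0)  # discount the system prompt
    return input_items[traced:]


# Attributes taken from the span above (content values shortened).
span_attributes = {
    "gen_ai.prompt.0.content": "You are a helpful assistant. ...",
    "gen_ai.prompt.0.role": "system",
    "gen_ai.prompt.1.content": "Please add 1 + 2 and 3 + 4 for me.",
    "gen_ai.prompt.1.role": "user",
}

# Only the fields needed for the comparison are kept here.
input_items = [
    {"type": "message", "role": "user"},
    {"type": "reasoning"},
    {"type": "function_call", "call_id": "call_1"},
    {"type": "function_call", "call_id": "call_2"},
    {"type": "function_call_output", "call_id": "call_1"},
    {"type": "function_call_output", "call_id": "call_2"},
]

missing = untraced_input_items(input_items, span_attributes)
print([item["type"] for item in missing])
```

With the span shown above, this reports five of the six input items as untraced, which is consistent with the input token count rising while the prompt attributes stay the same.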

🤖 Python Version

No response

📃 Provide any additional context for the Bug.

  • When I used OpenAI's Chat Completions API, the instrumentation worked correctly, so I believe this comes down to a difference in how traces are collected for Responses versus Chat Completions.

👀 Have you spent some time to check if this bug has been raised before?

  • I checked and didn't find a similar issue

Are you willing to submit PR?

None
