🐛 Bug Report: OpenAI not correctly instrumenting Responses API

### Which component is this bug for?

OpenAI Instrumentation

### 📜 Description

When using the Responses API with the OpenAI instrumentor, some input items (tool calls, tool outputs, and reasoning items) are missing from the trace.

Other collected attributes, such as the token count, increase as expected. I also inspected the raw requests and can confirm that the additional input items are sent to OpenAI's API, but they do not appear in the trace.

Additional context: I'm using `store=False` and appending responses manually to the API calls due to ZDR (Zero Data Retention). In other words, I'm using the Responses API in a stateless manner.

### 👟 Reproduction steps

Here's a self-contained reproducible example:

```python
import json
import pytest
from openai import AsyncOpenAI


def add_two_numbers(a: int, b: int) -> int:
    """Adds two numbers together."""
    return a + b


@pytest.mark.asyncio
async def test_openai_responses_tmp():
    client = AsyncOpenAI()

    tools = [
        {
            "type": "function",
            "name": "add_two_numbers",
            "description": "Adds two numbers together.",
            "parameters": {
                "type": "object",
                "properties": {
                    "a": {
                        "description": "The first number to add.",
                        "type": "integer",
                    },
                    "b": {
                        "description": "The second number to add.",
                        "type": "integer",
                    },
                },
                "additionalProperties": False,
                "required": ["a", "b"],
            },
            "strict": True,
        },
    ]

    context = [{"role": "user", "content": "Please add 1 + 2 and 3 + 4 for me."}]

    initial_params = {
        "model": "gpt-5",
        "instructions": "You are a helpful assistant. Use the add_two_numbers function when asked to add numbers. Make parallel tool calls where possible.",
        "input": context,
        "tools": tools,
        "include": ["reasoning.encrypted_content"],
        "parallel_tool_calls": True,
        "store": False,
        "text": {"verbosity": "low"},
        "reasoning": {"effort": "medium", "summary": "auto"},
    }

    # Step 1: Send the initial request and keep its output items in the context
    response = await client.responses.create(**initial_params)
    context += response.output

    # Extract tool calls from the output items
    tool_calls = [
        item
        for item in response.output
        if getattr(item, "type", None) == "function_call"
    ]
    assert tool_calls, "No tool calls found"

    # Step 2: Execute tool calls and append their outputs to the context
    for tool_call in tool_calls:
        # Parse arguments and execute the function
        args = json.loads(tool_call.arguments)
        assert tool_call.name == "add_two_numbers"

        result = add_two_numbers(args["a"], args["b"])
        context.append(
            {
                "type": "function_call_output",
                "call_id": tool_call.call_id,
                "output": str(result),
            }
        )

    continue_params = {
        "model": "gpt-5",
        "instructions": "You are a helpful assistant. Use the add_two_numbers function when asked to add numbers.",
        "input": context,
        "tools": tools,
        "include": ["reasoning.encrypted_content"],
        "parallel_tool_calls": True,
        "store": False,
        "text": {"verbosity": "low"},
        "reasoning": {"effort": "medium", "summary": "auto"},
    }

    final_response = await client.responses.create(**continue_params)
    print(f"Final response: {final_response}")
```

Inspecting the second request shows that the `input` field has the following value:

```json
  "input": [
    {
      "role": "user",
      "content": "Please add 1 + 2 and 3 + 4 for me.",
      "type": "message"
    },
    {
      "id": "rs_ABC",
      "summary": [],
      "type": "reasoning",
      "encrypted_content": "content"
    },
    {
      "arguments": "{\"a\":1,\"b\":2}",
      "call_id": "call_1",
      "name": "add_two_numbers",
      "type": "function_call",
      "id": "fc_1",
      "status": "completed"
    },
    {
      "arguments": "{\"a\":3,\"b\":4}",
      "call_id": "call_2",
      "name": "add_two_numbers",
      "type": "function_call",
      "id": "fc_2",
      "status": "completed"
    },
    {
      "type": "function_call_output",
      "call_id": "call_1",
      "output": "3"
    },
    {
      "type": "function_call_output",
      "call_id": "call_2",
      "output": "7"
    }
  ],
```

### 👍 Expected behavior

The logged span should contain the tool calls, tool responses, and reasoning summaries (if available) from previous steps.

When tracing the second LLM request, we should see `gen_ai.*` attributes logged for the tool calls and tool outputs provided in the input. In practice, they are not included.
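To make the expectation concrete, here is a minimal sketch of how the input items from the second request could be flattened into `gen_ai.prompt.*` attributes. This is not the instrumentor's actual code: the helper name `flatten_responses_input` is hypothetical, the attribute names for tool calls (`tool_calls.0.*`, `tool_call_id`, `reasoning_summary`) are assumptions modeled on the `gen_ai.prompt.N.role` / `gen_ai.prompt.N.content` keys shown in this report, and the items are assumed to be plain dicts rather than SDK objects.

```python
import json
from typing import Any, Dict, List, Optional


def flatten_responses_input(
    instructions: Optional[str], items: List[Dict[str, Any]]
) -> Dict[str, str]:
    """Hypothetical helper: map Responses API input items to span attributes."""
    attrs: Dict[str, str] = {}
    index = 0

    # The instructions are logged as a system prompt, as in the trace above.
    if instructions:
        attrs[f"gen_ai.prompt.{index}.role"] = "system"
        attrs[f"gen_ai.prompt.{index}.content"] = instructions
        index += 1

    for item in items:
        prefix = f"gen_ai.prompt.{index}"
        item_type = item.get("type", "message")

        if item_type == "message":
            content = item["content"]
            attrs[f"{prefix}.role"] = item["role"]
            attrs[f"{prefix}.content"] = (
                content if isinstance(content, str) else json.dumps(content)
            )
        elif item_type == "function_call":
            # Assumed attribute layout for a tool call made by the model.
            attrs[f"{prefix}.role"] = "assistant"
            attrs[f"{prefix}.tool_calls.0.id"] = item["call_id"]
            attrs[f"{prefix}.tool_calls.0.name"] = item["name"]
            attrs[f"{prefix}.tool_calls.0.arguments"] = item["arguments"]
        elif item_type == "function_call_output":
            # Assumed attribute layout for a tool result sent back to the model.
            attrs[f"{prefix}.role"] = "tool"
            attrs[f"{prefix}.tool_call_id"] = item["call_id"]
            attrs[f"{prefix}.content"] = item["output"]
        elif item_type == "reasoning":
            summary = item.get("summary") or []
            if not summary:
                continue  # nothing readable to log; encrypted content is skipped
            attrs[f"{prefix}.role"] = "assistant"
            attrs[f"{prefix}.reasoning_summary"] = json.dumps(summary)
        else:
            continue  # unknown item types are ignored in this sketch

        index += 1

    return attrs
```

Applied to the `input` shown in the reproduction steps, this would produce six prompt entries (system, user, two tool calls, two tool outputs) instead of the two that are currently logged.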

### 👎 Actual Behavior with Screenshots

Relevant span attributes for the second LLM request (note: no tool calls are traced):
```
gen_ai.completion.0.content: |-
  1 + 2 = 3
  3 + 4 = 7
gen_ai.completion.0.role: assistant
gen_ai.prompt.0.content: You are a helpful assistant. Use the add_two_numbers function when asked to add numbers.
gen_ai.prompt.0.role: system
gen_ai.prompt.1.content: Please add 1 + 2 and 3 + 4 for me.
gen_ai.prompt.1.role: user
gen_ai.request.model: gpt-5
gen_ai.response.id: resp_xxx
gen_ai.response.model: gpt-5-2025-08-07
gen_ai.system: openai
gen_ai.usage.cache_read_input_tokens: 0
gen_ai.usage.input_tokens: 351
gen_ai.usage.output_tokens: 19
```
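The gap can also be checked programmatically. The sketch below counts the distinct `gen_ai.prompt.N` indices in a span's attributes; the helper name `prompt_indices` is hypothetical, and the attribute dict is copied by hand from the output above rather than read from a real exporter.

```python
import re
from typing import Dict, Set

# Matches keys such as "gen_ai.prompt.3.role" and captures the index.
_PROMPT_KEY = re.compile(r"^gen_ai\.prompt\.(\d+)\.")


def prompt_indices(attributes: Dict[str, object]) -> Set[int]:
    """Hypothetical helper: return the prompt indices present on a span."""
    indices = set()
    for key in attributes:
        match = _PROMPT_KEY.match(key)
        if match:
            indices.add(int(match.group(1)))
    return indices


# Attributes copied from the second LLM span shown above.
span_attributes = {
    "gen_ai.prompt.0.content": "You are a helpful assistant. Use the add_two_numbers function when asked to add numbers.",
    "gen_ai.prompt.0.role": "system",
    "gen_ai.prompt.1.content": "Please add 1 + 2 and 3 + 4 for me.",
    "gen_ai.prompt.1.role": "user",
    "gen_ai.request.model": "gpt-5",
}

logged = prompt_indices(span_attributes)
print(sorted(logged))  # only the system and user prompts are present
```

Only two prompt entries are logged, while the request carried the instructions plus six input items.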

### 🤖 Python Version

_No response_

### 📃 Provide any additional context for the Bug.

- With OpenAI's Chat Completions API, the instrumentation works correctly, so I believe the issue comes from a difference in how trace attributes are collected for Responses versus Chat Completions.

### 👀 Have you spent some time to check if this bug has been raised before?

- [x] I checked and didn't find a similar issue

### Are you willing to submit PR?

None
