
Conversation

kebe7jun (Contributor) commented Aug 27, 2025

Purpose

Add streaming support to the Responses API for non-harmony models.

Related issue #23225

Test Plan

Unit tests plus manual self-tests (see the stream captures below).
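
For context, a minimal sketch of the kind of self-test that produces the streams below: a streaming Responses API request against a locally served model via the OpenAI Python client. The base_url, api_key, and prompt are illustrative placeholders, not the exact test script.

from openai import OpenAI

# Assumed local OpenAI-compatible endpoint; base_url/api_key are placeholders.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Stream a Responses API request; the served model is registered as "model"
# in the captures below, and max_output_tokens matches those runs.
stream = client.responses.create(
    model="model",
    input="Say 'double bubble bath' ten times fast.",
    max_output_tokens=1000,
    stream=True,
)

for event in stream:
    # Each event is a typed object (ResponseCreatedEvent,
    # ResponseReasoningTextDeltaEvent, ResponseTextDeltaEvent, ...).
    print(event)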

Test Result

GPT-OSS Stream output
ResponseCreatedEvent(response=Response(id='resp_3bc9f13acb90485daa3d1694ac9ea14c', created_at=1756274867.0, error=None, incomplete_details=None, instructions=None, metadata=None, model='model', object='response', output=[], parallel_tool_calls=True, temperature=1.0, tool_choice='auto', tools=[], top_p=1.0, background=False, max_output_tokens=1000, max_tool_calls=None, previous_response_id=None, prompt=None, prompt_cache_key=None, reasoning=None, safety_identifier=None, service_tier='auto', status='in_progress', text=None, top_logprobs=None, truncation='disabled', usage=None, user=None), sequence_number=0, type='response.created')
ResponseInProgressEvent(response=Response(id='resp_3bc9f13acb90485daa3d1694ac9ea14c', created_at=1756274867.0, error=None, incomplete_details=None, instructions=None, metadata=None, model='model', object='response', output=[], parallel_tool_calls=True, temperature=1.0, tool_choice='auto', tools=[], top_p=1.0, background=False, max_output_tokens=1000, max_tool_calls=None, previous_response_id=None, prompt=None, prompt_cache_key=None, reasoning=None, safety_identifier=None, service_tier='auto', status='in_progress', text=None, top_logprobs=None, truncation='disabled', usage=None, user=None), sequence_number=1, type='response.in_progress')
ResponseOutputItemAddedEvent(item=ResponseReasoningItem(id='', summary=[], type='reasoning', content=None, encrypted_content=None, status='in_progress'), output_index=0, sequence_number=2, type='response.output_item.added')
ResponseContentPartAddedEvent(content_index=0, item_id='', output_index=0, part=ResponseOutputText(annotations=[], text='', type='output_text', logprobs=[]), sequence_number=3, type='response.content_part.added')
ResponseReasoningTextDeltaEvent(content_index=0, delta='User', item_id='', output_index=0, sequence_number=4, type='response.reasoning_text.delta')
ResponseReasoningTextDeltaEvent(content_index=0, delta=' wants', item_id='', output_index=0, sequence_number=5, type='response.reasoning_text.delta')
ResponseReasoningTextDeltaEvent(content_index=0, delta=' us', item_id='', output_index=0, sequence_number=6, type='response.reasoning_text.delta')
...
ResponseReasoningTextDeltaEvent(content_index=0, delta=' but', item_id='', output_index=0, sequence_number=110, type='response.reasoning_text.delta')
ResponseReasoningTextDeltaEvent(content_index=0, delta=' okay', item_id='', output_index=0, sequence_number=111, type='response.reasoning_text.delta')
ResponseReasoningTextDeltaEvent(content_index=0, delta='.', item_id='', output_index=0, sequence_number=112, type='response.reasoning_text.delta')
ResponseReasoningTextDoneEvent(content_index=0, item_id='', output_index=1, sequence_number=113, text='User wants us to say \'double bubble bath\' ten times fast. We need to comply? It\'s a nonsensical request but presumably no policy violation. It\'s a benign language request. We can comply by repeating phrase 10 times quickly. Should we maybe output a line like "double bubble bath" repeated 10 times quickly. That\'s fine.\n\nNo policy conflicts. The phrase is not disallowed. So we comply.\n\nWe should produce "double bubble bath double bubble bath ... " repeated 10 times. be mindful it\'s too much but okay.', type='response.reasoning_text.done')
ResponseOutputItemDoneEvent(item=ResponseReasoningItem(id='', summary=[], type='reasoning', content=[Content(text='User wants us to say \'double bubble bath\' ten times fast. We need to comply? It\'s a nonsensical request but presumably no policy violation. It\'s a benign language request. We can comply by repeating phrase 10 times quickly. Should we maybe output a line like "double bubble bath" repeated 10 times quickly. That\'s fine.\n\nNo policy conflicts. The phrase is not disallowed. So we comply.\n\nWe should produce "double bubble bath double bubble bath ... " repeated 10 times. be mindful it\'s too much but okay.', type='reasoning_text')], encrypted_content=None, status='completed'), output_index=1, sequence_number=114, type='response.output_item.done')
ResponseOutputItemAddedEvent(item=ResponseOutputMessage(id='', content=[], role='assistant', status='in_progress', type='message'), output_index=1, sequence_number=115, type='response.output_item.added')
ResponseContentPartAddedEvent(content_index=0, item_id='', output_index=1, part=ResponseOutputText(annotations=[], text='', type='output_text', logprobs=[]), sequence_number=116, type='response.content_part.added')
ResponseTextDeltaEvent(content_index=0, delta='double', item_id='', logprobs=[], output_index=1, sequence_number=117, type='response.output_text.delta')
ResponseTextDeltaEvent(content_index=0, delta=' bubble', item_id='', logprobs=[], output_index=1, sequence_number=118, type='response.output_text.delta')
...
ResponseTextDeltaEvent(content_index=0, delta=' bubble', item_id='', logprobs=[], output_index=1, sequence_number=145, type='response.output_text.delta')
ResponseTextDeltaEvent(content_index=0, delta=' bath', item_id='', logprobs=[], output_index=1, sequence_number=146, type='response.output_text.delta')
ResponseTextDoneEvent(content_index=0, item_id='', logprobs=[], output_index=2, sequence_number=147, text='double bubble bath double bubble bath double bubble bath double bubble bath double bubble bath double bubble bath double bubble bath double bubble bath double bubble bath double bubble bath', type='response.output_text.done')
ResponseContentPartDoneEvent(content_index=0, item_id='', output_index=2, part=ResponseOutputText(annotations=[], text='double bubble bath double bubble bath double bubble bath double bubble bath double bubble bath double bubble bath double bubble bath double bubble bath double bubble bath double bubble bath', type='output_text', logprobs=None), sequence_number=148, type='response.content_part.done')
ResponseOutputItemDoneEvent(item=ResponseOutputMessage(id='', content=[ResponseOutputText(annotations=[], text='double bubble bath double bubble bath double bubble bath double bubble bath double bubble bath double bubble bath double bubble bath double bubble bath double bubble bath double bubble bath', type='output_text', logprobs=None)], role='assistant', status='completed', type='message'), output_index=2, sequence_number=149, type='response.output_item.done')
ResponseCompletedEvent(response=Response(id='resp_3bc9f13acb90485daa3d1694ac9ea14c', created_at=1756274867.0, error=None, incomplete_details=None, instructions=None, metadata=None, model='model', object='response', output=[], parallel_tool_calls=True, temperature=1.0, tool_choice='auto', tools=[], top_p=1.0, background=False, max_output_tokens=1000, max_tool_calls=None, previous_response_id=None, prompt=None, prompt_cache_key=None, reasoning=None, safety_identifier=None, service_tier='auto', status='completed', text=None, top_logprobs=None, truncation='disabled', usage=ResponseUsage(input_tokens=81, input_tokens_details=InputTokensDetails(cached_tokens=0), output_tokens=149, output_tokens_details=OutputTokensDetails(reasoning_tokens=0), total_tokens=230), user=None), sequence_number=150, type='response.completed')
Qwen3 30B A3B Stream output
ResponseCreatedEvent(response=Response(id='resp_42d533df110a4e28b84f051f7839ca58', created_at=1756295396.0, error=None, incomplete_details=None, instructions=None, metadata=None, model='model', object='response', output=[], parallel_tool_calls=True, temperature=0.6, tool_choice='auto', tools=[], top_p=0.95, background=False, max_output_tokens=1000, max_tool_calls=None, previous_response_id=None, prompt=None, prompt_cache_key=None, reasoning=None, safety_identifier=None, service_tier='auto', status='in_progress', text=None, top_logprobs=2, truncation='disabled', usage=None, user=None), sequence_number=0, type='response.created')
ResponseInProgressEvent(response=Response(id='resp_42d533df110a4e28b84f051f7839ca58', created_at=1756295396.0, error=None, incomplete_details=None, instructions=None, metadata=None, model='model', object='response', output=[], parallel_tool_calls=True, temperature=0.6, tool_choice='auto', tools=[], top_p=0.95, background=False, max_output_tokens=1000, max_tool_calls=None, previous_response_id=None, prompt=None, prompt_cache_key=None, reasoning=None, safety_identifier=None, service_tier='auto', status='in_progress', text=None, top_logprobs=2, truncation='disabled', usage=None, user=None), sequence_number=1, type='response.in_progress')
ResponseOutputItemAddedEvent(item=ResponseReasoningItem(id='', summary=[], type='reasoning', content=None, encrypted_content=None, status='in_progress'), output_index=0, sequence_number=2, type='response.output_item.added')
ResponseContentPartAddedEvent(content_index=0, item_id='', output_index=0, part=ResponseOutputText(annotations=[], text='', type='output_text', logprobs=[]), sequence_number=3, type='response.content_part.added')
ResponseReasoningTextDeltaEvent(content_index=0, delta='\n', item_id='', output_index=0, sequence_number=4, type='response.reasoning_text.delta')
ResponseReasoningTextDeltaEvent(content_index=0, delta='Okay', item_id='', output_index=0, sequence_number=5, type='response.reasoning_text.delta')
...
ResponseReasoningTextDeltaEvent(content_index=0, delta='.\n', item_id='', output_index=0, sequence_number=301, type='response.reasoning_text.delta')
ResponseReasoningTextDoneEvent(content_index=0, item_id='', output_index=0, sequence_number=302, text='\nOkay, the user wants me to say "double bubble bath" ten times fast. Let me start by repeating it as instructed. First, I\'ll check if I can do it quickly without making mistakes. Let me count: 1, 2, 3... Hmm, maybe I should practice a few times to get the rhythm right. Wait, the user just wants me to say it ten times, not necessarily with any specific speed, but "fast" might mean as quickly as possible. I need to make sure each repetition is clear but done in quick succession. Let me try again. Double bubble bath, double bubble bath... Okay, that\'s two. Three, four... I need to keep track. Maybe I can say them in a row without pausing. Let me do that. Double bubble bath, double bubble bath, double bubble bath, double bubble bath, double bubble bath, double bubble bath, double bubble bath, double bubble bath, double bubble bath, double bubble bath. Wait, that\'s ten. Did I get all ten? Let me count again. 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. Yes, that\'s ten. I think that works. I should present it as a single string of ten repetitions. Let me check for any errors. Each time I said "double bubble bath" correctly. Alright, that should do it.\n', type='response.reasoning_text.done')
ResponseOutputItemDoneEvent(item=ResponseReasoningItem(id='', summary=[], type='reasoning', content=[Content(text='\nOkay, the user wants me to say "double bubble bath" ten times fast. Let me start by repeating it as instructed. First, I\'ll check if I can do it quickly without making mistakes. Let me count: 1, 2, 3... Hmm, maybe I should practice a few times to get the rhythm right. Wait, the user just wants me to say it ten times, not necessarily with any specific speed, but "fast" might mean as quickly as possible. I need to make sure each repetition is clear but done in quick succession. Let me try again. Double bubble bath, double bubble bath... Okay, that\'s two. Three, four... I need to keep track. Maybe I can say them in a row without pausing. Let me do that. Double bubble bath, double bubble bath, double bubble bath, double bubble bath, double bubble bath, double bubble bath, double bubble bath, double bubble bath, double bubble bath, double bubble bath. Wait, that\'s ten. Did I get all ten? Let me count again. 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. Yes, that\'s ten. I think that works. I should present it as a single string of ten repetitions. Let me check for any errors. Each time I said "double bubble bath" correctly. Alright, that should do it.\n', type='reasoning_text')], encrypted_content=None, status='completed'), output_index=0, sequence_number=303, type='response.output_item.done')
ResponseOutputItemAddedEvent(item=ResponseOutputMessage(id='', content=[], role='assistant', status='in_progress', type='message'), output_index=0, sequence_number=304, type='response.output_item.added')
ResponseContentPartAddedEvent(content_index=0, item_id='', output_index=1, part=ResponseOutputText(annotations=[], text='', type='output_text', logprobs=[]), sequence_number=305, type='response.content_part.added')
ResponseTextDeltaEvent(content_index=0, delta='\n\n', item_id='', logprobs=[Logprob(token='\n\n', logprob=0.0, top_logprobs=[LogprobTopLogprob(token='\n\n', logprob=0.0), LogprobTopLogprob(token='\n\n\n', logprob=-21.375)])], output_index=1, sequence_number=306, type='response.output_text.delta')
ResponseTextDeltaEvent(content_index=0, delta='Double', item_id='', logprobs=[Logprob(token='Double', logprob=-0.00012635385792236775, top_logprobs=[LogprobTopLogprob(token='Double', logprob=-0.00012635385792236775), LogprobTopLogprob(token='double', logprob=-9.000125885009766)])], output_index=1, sequence_number=307, type='response.output_text.delta')
ResponseTextDeltaEvent(content_index=0, delta=' bubble', item_id='', logprobs=[Logprob(token=' bubble', logprob=-0.0003361137059982866, top_logprobs=[LogprobTopLogprob(token=' bubble', logprob=-0.0003361137059982866), LogprobTopLogprob(token='bubble', logprob=-8.000335693359375)])], output_index=1, sequence_number=308, type='response.output_text.delta')
ResponseTextDeltaEvent(content_index=0, delta=' bath', item_id='', logprobs=[Logprob(token=' bath', logprob=-1.1920928244535389e-07, top_logprobs=[LogprobTopLogprob(token=' bath', logprob=-1.1920928244535389e-07), LogprobTopLogprob(token=' bat', logprob=-16.75)])], output_index=1, sequence_number=309, type='response.output_text.delta')

...
ResponseTextDoneEvent(content_index=0, item_id='', logprobs=[], output_index=1, sequence_number=339, text='\n\nDouble bubble bath double bubble bath double bubble bath double bubble bath double bubble bath double bubble bath double bubble bath double bubble bath double bubble bath double bubble bath.', type='response.output_text.done')
ResponseContentPartDoneEvent(content_index=0, item_id='', output_index=1, part=ResponseOutputText(annotations=[], text='\n\nDouble bubble bath double bubble bath double bubble bath double bubble bath double bubble bath double bubble bath double bubble bath double bubble bath double bubble bath double bubble bath.', type='output_text', logprobs=None), sequence_number=340, type='response.content_part.done')
ResponseOutputItemDoneEvent(item=ResponseOutputMessage(id='', content=[ResponseOutputText(annotations=[], text='\n\nDouble bubble bath double bubble bath double bubble bath double bubble bath double bubble bath double bubble bath double bubble bath double bubble bath double bubble bath double bubble bath.', type='output_text', logprobs=None)], role='assistant', status='completed', type='message', summary=[]), output_index=1, sequence_number=341, type='response.output_item.done')
ResponseCompletedEvent(response=Response(id='resp_42d533df110a4e28b84f051f7839ca58', created_at=1756295396.0, error=None, incomplete_details=None, instructions=None, metadata=None, model='model', object='response', output=[], parallel_tool_calls=True, temperature=0.6, tool_choice='auto', tools=[], top_p=0.95, background=False, max_output_tokens=1000, max_tool_calls=None, previous_response_id=None, prompt=None, prompt_cache_key=None, reasoning=None, safety_identifier=None, service_tier='auto', status='completed', text=None, top_logprobs=2, truncation='disabled', usage=ResponseUsage(input_tokens=18, input_tokens_details=InputTokensDetails(cached_tokens=16), output_tokens=333, output_tokens_details=OutputTokensDetails(reasoning_tokens=0), total_tokens=351), user=None), sequence_number=342, type='response.completed')
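
As a consumption sketch only (it assumes the stream iterator from the Test Plan sketch above; the event type strings are taken from the captures), a client can separate reasoning deltas from the final answer text and read usage from the completion event:

reasoning_chunks: list[str] = []
answer_chunks: list[str] = []
usage = None

for event in stream:
    if event.type == "response.reasoning_text.delta":
        reasoning_chunks.append(event.delta)   # model "thinking" text
    elif event.type == "response.output_text.delta":
        answer_chunks.append(event.delta)      # user-visible answer text
    elif event.type == "response.completed":
        usage = event.response.usage           # token accounting, as in the logs

print("reasoning:", "".join(reasoning_chunks))
print("answer:", "".join(answer_chunks))
print("usage:", usage)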

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

kebe7jun force-pushed the feature/responses-api-streaming branch from 8dc2da4 to b65638e on August 27, 2025 11:32
mergify bot added the frontend label on Aug 27, 2025
kebe7jun force-pushed the feature/responses-api-streaming branch from b65638e to 3bb6902 on August 27, 2025 11:46
mergify bot added the v1 label on Aug 27, 2025
kebe7jun marked this pull request as ready for review on August 27, 2025 11:55
kebe7jun requested a review from aarnphm as a code owner on August 27, 2025 11:55
kebe7jun force-pushed the feature/responses-api-streaming branch from 3bb6902 to 6d9fe9c on August 28, 2025 00:56
kebe7jun force-pushed the feature/responses-api-streaming branch from 6d9fe9c to af25d9a on August 28, 2025 01:37
kebe7jun (Contributor, Author) commented

@heheda12345 PTAL
