
Conversation

kebe7jun (Contributor) commented Aug 27, 2025

Purpose

Add streaming support to the Responses API for non-harmony models.

Related issue #23225

Test Plan

Unit tests plus manual self-tests (see the stream captures below).
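
For context, a minimal sketch of the kind of self-test that produces the streams below: a streaming Responses API request against a locally served model via the OpenAI Python client. The base_url, api_key, and prompt are illustrative placeholders, not the exact test script.

from openai import OpenAI

# Assumed local OpenAI-compatible endpoint; base_url/api_key are placeholders.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Stream a Responses API request; the served model is registered as "model"
# in the captures below, and max_output_tokens matches those runs.
stream = client.responses.create(
    model="model",
    input="Say 'double bubble bath' ten times fast.",
    max_output_tokens=1000,
    stream=True,
)

for event in stream:
    # Each event is a typed object (ResponseCreatedEvent,
    # ResponseReasoningTextDeltaEvent, ResponseTextDeltaEvent, ...).
    print(event)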

Test Result

GPT-OSS Stream output
ResponseCreatedEvent(response=Response(id='resp_3bc9f13acb90485daa3d1694ac9ea14c', created_at=1756274867.0, error=None, incomplete_details=None, instructions=None, metadata=None, model='model', object='response', output=[], parallel_tool_calls=True, temperature=1.0, tool_choice='auto', tools=[], top_p=1.0, background=False, max_output_tokens=1000, max_tool_calls=None, previous_response_id=None, prompt=None, prompt_cache_key=None, reasoning=None, safety_identifier=None, service_tier='auto', status='in_progress', text=None, top_logprobs=None, truncation='disabled', usage=None, user=None), sequence_number=0, type='response.created')
ResponseInProgressEvent(response=Response(id='resp_3bc9f13acb90485daa3d1694ac9ea14c', created_at=1756274867.0, error=None, incomplete_details=None, instructions=None, metadata=None, model='model', object='response', output=[], parallel_tool_calls=True, temperature=1.0, tool_choice='auto', tools=[], top_p=1.0, background=False, max_output_tokens=1000, max_tool_calls=None, previous_response_id=None, prompt=None, prompt_cache_key=None, reasoning=None, safety_identifier=None, service_tier='auto', status='in_progress', text=None, top_logprobs=None, truncation='disabled', usage=None, user=None), sequence_number=1, type='response.in_progress')
ResponseOutputItemAddedEvent(item=ResponseReasoningItem(id='', summary=[], type='reasoning', content=None, encrypted_content=None, status='in_progress'), output_index=0, sequence_number=2, type='response.output_item.added')
ResponseContentPartAddedEvent(content_index=0, item_id='', output_index=0, part=ResponseOutputText(annotations=[], text='', type='output_text', logprobs=[]), sequence_number=3, type='response.content_part.added')
ResponseReasoningTextDeltaEvent(content_index=0, delta='User', item_id='', output_index=0, sequence_number=4, type='response.reasoning_text.delta')
ResponseReasoningTextDeltaEvent(content_index=0, delta=' wants', item_id='', output_index=0, sequence_number=5, type='response.reasoning_text.delta')
ResponseReasoningTextDeltaEvent(content_index=0, delta=' us', item_id='', output_index=0, sequence_number=6, type='response.reasoning_text.delta')
...
ResponseReasoningTextDeltaEvent(content_index=0, delta=' but', item_id='', output_index=0, sequence_number=110, type='response.reasoning_text.delta')
ResponseReasoningTextDeltaEvent(content_index=0, delta=' okay', item_id='', output_index=0, sequence_number=111, type='response.reasoning_text.delta')
ResponseReasoningTextDeltaEvent(content_index=0, delta='.', item_id='', output_index=0, sequence_number=112, type='response.reasoning_text.delta')
ResponseReasoningTextDoneEvent(content_index=0, item_id='', output_index=1, sequence_number=113, text='User wants us to say \'double bubble bath\' ten times fast. We need to comply? It\'s a nonsensical request but presumably no policy violation. It\'s a benign language request. We can comply by repeating phrase 10 times quickly. Should we maybe output a line like "double bubble bath" repeated 10 times quickly. That\'s fine.\n\nNo policy conflicts. The phrase is not disallowed. So we comply.\n\nWe should produce "double bubble bath double bubble bath ... " repeated 10 times. be mindful it\'s too much but okay.', type='response.reasoning_text.done')
ResponseOutputItemDoneEvent(item=ResponseReasoningItem(id='', summary=[], type='reasoning', content=[Content(text='User wants us to say \'double bubble bath\' ten times fast. We need to comply? It\'s a nonsensical request but presumably no policy violation. It\'s a benign language request. We can comply by repeating phrase 10 times quickly. Should we maybe output a line like "double bubble bath" repeated 10 times quickly. That\'s fine.\n\nNo policy conflicts. The phrase is not disallowed. So we comply.\n\nWe should produce "double bubble bath double bubble bath ... " repeated 10 times. be mindful it\'s too much but okay.', type='reasoning_text')], encrypted_content=None, status='completed'), output_index=1, sequence_number=114, type='response.output_item.done')
ResponseOutputItemAddedEvent(item=ResponseOutputMessage(id='', content=[], role='assistant', status='in_progress', type='message'), output_index=1, sequence_number=115, type='response.output_item.added')
ResponseContentPartAddedEvent(content_index=0, item_id='', output_index=1, part=ResponseOutputText(annotations=[], text='', type='output_text', logprobs=[]), sequence_number=116, type='response.content_part.added')
ResponseTextDeltaEvent(content_index=0, delta='double', item_id='', logprobs=[], output_index=1, sequence_number=117, type='response.output_text.delta')
ResponseTextDeltaEvent(content_index=0, delta=' bubble', item_id='', logprobs=[], output_index=1, sequence_number=118, type='response.output_text.delta')
...
ResponseTextDeltaEvent(content_index=0, delta=' bubble', item_id='', logprobs=[], output_index=1, sequence_number=145, type='response.output_text.delta')
ResponseTextDeltaEvent(content_index=0, delta=' bath', item_id='', logprobs=[], output_index=1, sequence_number=146, type='response.output_text.delta')
ResponseTextDoneEvent(content_index=0, item_id='', logprobs=[], output_index=2, sequence_number=147, text='double bubble bath double bubble bath double bubble bath double bubble bath double bubble bath double bubble bath double bubble bath double bubble bath double bubble bath double bubble bath', type='response.output_text.done')
ResponseContentPartDoneEvent(content_index=0, item_id='', output_index=2, part=ResponseOutputText(annotations=[], text='double bubble bath double bubble bath double bubble bath double bubble bath double bubble bath double bubble bath double bubble bath double bubble bath double bubble bath double bubble bath', type='output_text', logprobs=None), sequence_number=148, type='response.content_part.done')
ResponseOutputItemDoneEvent(item=ResponseOutputMessage(id='', content=[ResponseOutputText(annotations=[], text='double bubble bath double bubble bath double bubble bath double bubble bath double bubble bath double bubble bath double bubble bath double bubble bath double bubble bath double bubble bath', type='output_text', logprobs=None)], role='assistant', status='completed', type='message'), output_index=2, sequence_number=149, type='response.output_item.done')
ResponseCompletedEvent(response=Response(id='resp_3bc9f13acb90485daa3d1694ac9ea14c', created_at=1756274867.0, error=None, incomplete_details=None, instructions=None, metadata=None, model='model', object='response', output=[], parallel_tool_calls=True, temperature=1.0, tool_choice='auto', tools=[], top_p=1.0, background=False, max_output_tokens=1000, max_tool_calls=None, previous_response_id=None, prompt=None, prompt_cache_key=None, reasoning=None, safety_identifier=None, service_tier='auto', status='completed', text=None, top_logprobs=None, truncation='disabled', usage=ResponseUsage(input_tokens=81, input_tokens_details=InputTokensDetails(cached_tokens=0), output_tokens=149, output_tokens_details=OutputTokensDetails(reasoning_tokens=0), total_tokens=230), user=None), sequence_number=150, type='response.completed')
Qwen3 30B A3B Stream output
ResponseCreatedEvent(response=Response(id='resp_42d533df110a4e28b84f051f7839ca58', created_at=1756295396.0, error=None, incomplete_details=None, instructions=None, metadata=None, model='model', object='response', output=[], parallel_tool_calls=True, temperature=0.6, tool_choice='auto', tools=[], top_p=0.95, background=False, max_output_tokens=1000, max_tool_calls=None, previous_response_id=None, prompt=None, prompt_cache_key=None, reasoning=None, safety_identifier=None, service_tier='auto', status='in_progress', text=None, top_logprobs=2, truncation='disabled', usage=None, user=None), sequence_number=0, type='response.created')
ResponseInProgressEvent(response=Response(id='resp_42d533df110a4e28b84f051f7839ca58', created_at=1756295396.0, error=None, incomplete_details=None, instructions=None, metadata=None, model='model', object='response', output=[], parallel_tool_calls=True, temperature=0.6, tool_choice='auto', tools=[], top_p=0.95, background=False, max_output_tokens=1000, max_tool_calls=None, previous_response_id=None, prompt=None, prompt_cache_key=None, reasoning=None, safety_identifier=None, service_tier='auto', status='in_progress', text=None, top_logprobs=2, truncation='disabled', usage=None, user=None), sequence_number=1, type='response.in_progress')
ResponseOutputItemAddedEvent(item=ResponseReasoningItem(id='', summary=[], type='reasoning', content=None, encrypted_content=None, status='in_progress'), output_index=0, sequence_number=2, type='response.output_item.added')
ResponseContentPartAddedEvent(content_index=0, item_id='', output_index=0, part=ResponseOutputText(annotations=[], text='', type='output_text', logprobs=[]), sequence_number=3, type='response.content_part.added')
ResponseReasoningTextDeltaEvent(content_index=0, delta='\n', item_id='', output_index=0, sequence_number=4, type='response.reasoning_text.delta')
ResponseReasoningTextDeltaEvent(content_index=0, delta='Okay', item_id='', output_index=0, sequence_number=5, type='response.reasoning_text.delta')
...
ResponseReasoningTextDeltaEvent(content_index=0, delta='.\n', item_id='', output_index=0, sequence_number=301, type='response.reasoning_text.delta')
ResponseReasoningTextDoneEvent(content_index=0, item_id='', output_index=0, sequence_number=302, text='\nOkay, the user wants me to say "double bubble bath" ten times fast. Let me start by repeating it as instructed. First, I\'ll check if I can do it quickly without making mistakes. Let me count: 1, 2, 3... Hmm, maybe I should practice a few times to get the rhythm right. Wait, the user just wants me to say it ten times, not necessarily with any specific speed, but "fast" might mean as quickly as possible. I need to make sure each repetition is clear but done in quick succession. Let me try again. Double bubble bath, double bubble bath... Okay, that\'s two. Three, four... I need to keep track. Maybe I can say them in a row without pausing. Let me do that. Double bubble bath, double bubble bath, double bubble bath, double bubble bath, double bubble bath, double bubble bath, double bubble bath, double bubble bath, double bubble bath, double bubble bath. Wait, that\'s ten. Did I get all ten? Let me count again. 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. Yes, that\'s ten. I think that works. I should present it as a single string of ten repetitions. Let me check for any errors. Each time I said "double bubble bath" correctly. Alright, that should do it.\n', type='response.reasoning_text.done')
ResponseOutputItemDoneEvent(item=ResponseReasoningItem(id='', summary=[], type='reasoning', content=[Content(text='\nOkay, the user wants me to say "double bubble bath" ten times fast. Let me start by repeating it as instructed. First, I\'ll check if I can do it quickly without making mistakes. Let me count: 1, 2, 3... Hmm, maybe I should practice a few times to get the rhythm right. Wait, the user just wants me to say it ten times, not necessarily with any specific speed, but "fast" might mean as quickly as possible. I need to make sure each repetition is clear but done in quick succession. Let me try again. Double bubble bath, double bubble bath... Okay, that\'s two. Three, four... I need to keep track. Maybe I can say them in a row without pausing. Let me do that. Double bubble bath, double bubble bath, double bubble bath, double bubble bath, double bubble bath, double bubble bath, double bubble bath, double bubble bath, double bubble bath, double bubble bath. Wait, that\'s ten. Did I get all ten? Let me count again. 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. Yes, that\'s ten. I think that works. I should present it as a single string of ten repetitions. Let me check for any errors. Each time I said "double bubble bath" correctly. Alright, that should do it.\n', type='reasoning_text')], encrypted_content=None, status='completed'), output_index=0, sequence_number=303, type='response.output_item.done')
ResponseOutputItemAddedEvent(item=ResponseOutputMessage(id='', content=[], role='assistant', status='in_progress', type='message'), output_index=0, sequence_number=304, type='response.output_item.added')
ResponseContentPartAddedEvent(content_index=0, item_id='', output_index=1, part=ResponseOutputText(annotations=[], text='', type='output_text', logprobs=[]), sequence_number=305, type='response.content_part.added')
ResponseTextDeltaEvent(content_index=0, delta='\n\n', item_id='', logprobs=[Logprob(token='\n\n', logprob=0.0, top_logprobs=[LogprobTopLogprob(token='\n\n', logprob=0.0), LogprobTopLogprob(token='\n\n\n', logprob=-21.375)])], output_index=1, sequence_number=306, type='response.output_text.delta')
ResponseTextDeltaEvent(content_index=0, delta='Double', item_id='', logprobs=[Logprob(token='Double', logprob=-0.00012635385792236775, top_logprobs=[LogprobTopLogprob(token='Double', logprob=-0.00012635385792236775), LogprobTopLogprob(token='double', logprob=-9.000125885009766)])], output_index=1, sequence_number=307, type='response.output_text.delta')
ResponseTextDeltaEvent(content_index=0, delta=' bubble', item_id='', logprobs=[Logprob(token=' bubble', logprob=-0.0003361137059982866, top_logprobs=[LogprobTopLogprob(token=' bubble', logprob=-0.0003361137059982866), LogprobTopLogprob(token='bubble', logprob=-8.000335693359375)])], output_index=1, sequence_number=308, type='response.output_text.delta')
ResponseTextDeltaEvent(content_index=0, delta=' bath', item_id='', logprobs=[Logprob(token=' bath', logprob=-1.1920928244535389e-07, top_logprobs=[LogprobTopLogprob(token=' bath', logprob=-1.1920928244535389e-07), LogprobTopLogprob(token=' bat', logprob=-16.75)])], output_index=1, sequence_number=309, type='response.output_text.delta')

...
ResponseTextDoneEvent(content_index=0, item_id='', logprobs=[], output_index=1, sequence_number=339, text='\n\nDouble bubble bath double bubble bath double bubble bath double bubble bath double bubble bath double bubble bath double bubble bath double bubble bath double bubble bath double bubble bath.', type='response.output_text.done')
ResponseContentPartDoneEvent(content_index=0, item_id='', output_index=1, part=ResponseOutputText(annotations=[], text='\n\nDouble bubble bath double bubble bath double bubble bath double bubble bath double bubble bath double bubble bath double bubble bath double bubble bath double bubble bath double bubble bath.', type='output_text', logprobs=None), sequence_number=340, type='response.content_part.done')
ResponseOutputItemDoneEvent(item=ResponseOutputMessage(id='', content=[ResponseOutputText(annotations=[], text='\n\nDouble bubble bath double bubble bath double bubble bath double bubble bath double bubble bath double bubble bath double bubble bath double bubble bath double bubble bath double bubble bath.', type='output_text', logprobs=None)], role='assistant', status='completed', type='message', summary=[]), output_index=1, sequence_number=341, type='response.output_item.done')
ResponseCompletedEvent(response=Response(id='resp_42d533df110a4e28b84f051f7839ca58', created_at=1756295396.0, error=None, incomplete_details=None, instructions=None, metadata=None, model='model', object='response', output=[], parallel_tool_calls=True, temperature=0.6, tool_choice='auto', tools=[], top_p=0.95, background=False, max_output_tokens=1000, max_tool_calls=None, previous_response_id=None, prompt=None, prompt_cache_key=None, reasoning=None, safety_identifier=None, service_tier='auto', status='completed', text=None, top_logprobs=2, truncation='disabled', usage=ResponseUsage(input_tokens=18, input_tokens_details=InputTokensDetails(cached_tokens=16), output_tokens=333, output_tokens_details=OutputTokensDetails(reasoning_tokens=0), total_tokens=351), user=None), sequence_number=342, type='response.completed')
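
As a consumption sketch only (it assumes the stream iterator from the Test Plan sketch above; the event type strings are taken from the captures), a client can separate reasoning deltas from the final answer text and read usage from the completion event:

reasoning_chunks: list[str] = []
answer_chunks: list[str] = []
usage = None

for event in stream:
    if event.type == "response.reasoning_text.delta":
        reasoning_chunks.append(event.delta)   # model "thinking" text
    elif event.type == "response.output_text.delta":
        answer_chunks.append(event.delta)      # user-visible answer text
    elif event.type == "response.completed":
        usage = event.response.usage           # token accounting, as in the logs

print("reasoning:", "".join(reasoning_chunks))
print("answer:", "".join(answer_chunks))
print("usage:", usage)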

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

kebe7jun force-pushed the feature/responses-api-streaming branch from 8dc2da4 to b65638e on August 27, 2025 11:32
mergify bot added the frontend label on Aug 27, 2025
kebe7jun force-pushed the feature/responses-api-streaming branch from b65638e to 3bb6902 on August 27, 2025 11:46
mergify bot added the v1 label on Aug 27, 2025
kebe7jun marked this pull request as ready for review on August 27, 2025 11:55
kebe7jun requested a review from aarnphm as a code owner on August 27, 2025 11:55
kebe7jun force-pushed the feature/responses-api-streaming branch from 3bb6902 to 6d9fe9c on August 28, 2025 00:56
kebe7jun force-pushed the feature/responses-api-streaming branch from 6d9fe9c to af25d9a on August 28, 2025 01:37
kebe7jun (Contributor, Author) commented

@heheda12345 PTAL
