## Motivation and Context (Why the change? What's the scenario?)
Add an option to stream `Ask` result tokens without waiting for the full
answer to be ready.
## High level description (Approach, Design)
- New `stream` boolean option for the `Ask` API, `false` by default. When
`true`, answer tokens are streamed as soon as the LLM generates them.
- New `MemoryAnswer.StreamState` enum property: `Error`, `Reset`,
`Append`, `Last`.
- If moderation is enabled, the content is validated at the end of the
stream. If moderation fails, the service returns an answer with
`StreamState` = `Reset` and the replacement content to show to the end user.
- Streaming uses the SSE (Server-Sent Events) message format.
- By default, SSE streams end with a `[DONE]` token. This can be
disabled via KM settings.
- SSE payload is optimized, returning `RelevantSources` only in the
first SSE message.
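
To illustrate how a client might consume the stream described above, here is a minimal Python sketch. It is not taken from this change: the JSON field names (`streamState`, `text`, `relevantSources`) and the helper names are assumptions made for illustration, and the actual payload shape may differ. It parses SSE `data:` messages, stops at the `[DONE]` token, appends text for `Append` and `Last`, replaces the accumulated text on `Reset`, and raises on `Error`.

```python
import json


def parse_sse_data(raw: str):
    """Yield the data payload of each SSE message contained in `raw`."""
    for block in raw.replace("\r\n", "\n").split("\n\n"):
        data_lines = []
        for line in block.split("\n"):
            if line.startswith("data:"):
                value = line[5:]
                # SSE strips a single leading space after the colon.
                if value.startswith(" "):
                    value = value[1:]
                data_lines.append(value)
        if data_lines:
            yield "\n".join(data_lines)


def rebuild_answer(raw: str):
    """Rebuild the answer text and sources from a streamed response.

    Field names are hypothetical; adjust them to the real payload.
    """
    text = ""
    sources = []
    for payload in parse_sse_data(raw):
        if payload == "[DONE]":  # default end-of-stream token
            break
        message = json.loads(payload)
        state = message.get("streamState")
        chunk = message.get("text", "")
        if state == "Error":
            raise RuntimeError(chunk or "stream error")
        if state == "Reset":
            # E.g. moderation failed: discard what was shown so far.
            text = chunk
        else:
            # "Append" and "Last" both add to the current answer.
            text += chunk
        # Sources are expected only in the first message; keep them.
        if message.get("relevantSources"):
            sources = message["relevantSources"]
    return text, sources
```

A caller would typically update the UI inside the loop rather than wait for the return value; the function returns the final state only to keep the sketch short.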
---------
Co-authored-by: Carlo <[email protected]>
Co-authored-by: Devis Lucato <[email protected]>
Co-authored-by: Devis Lucato <[email protected]>