Enhance calculation of tokens number in response based on request's max_tokens parameter #136

@mayabar

Description

Current behavior:
If max_tokens is provided in the request, the number of tokens in the response is always equal to max_tokens.

Expected behavior:
Generate responses shorter than max_tokens for some of the requests.
Find relevant statistics to determine what fraction of requests should get a response shorter than max_tokens, and how to calculate the response length in those cases.
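
One possible shape for this, as a minimal sketch rather than a proposed implementation: with some probability the response uses the full max_tokens budget, otherwise a shorter length is drawn at random. The function name `sample_response_length`, the parameter `p_truncated`, its default of 0.3, and the uniform distribution are all hypothetical placeholders; the real values should come from the statistics this issue asks for. The finish reasons follow the OpenAI-style convention ("length" when the max_tokens limit is hit, "stop" otherwise), which is assumed to apply here.

```python
import random


def sample_response_length(max_tokens, p_truncated=0.3, rng=None):
    """Pick a response length for a request that sets max_tokens.

    Returns a (num_tokens, finish_reason) tuple.
    p_truncated is a placeholder probability, not a measured statistic.
    """
    if max_tokens < 1:
        raise ValueError("max_tokens must be >= 1")
    if rng is None:
        rng = random.Random()

    # With probability p_truncated the response fills the whole budget.
    # A budget of one token leaves no room for a shorter response.
    if max_tokens == 1 or rng.random() < p_truncated:
        return max_tokens, "length"

    # Otherwise draw a strictly shorter length. Uniform is only a stand-in
    # for whatever distribution the collected statistics suggest.
    return rng.randint(1, max_tokens - 1), "stop"


if __name__ == "__main__":
    rng = random.Random(42)  # seeded so repeated runs give the same lengths
    for _ in range(5):
        print(sample_response_length(100, rng=rng))
```

Passing a seeded `random.Random` keeps the lengths reproducible in tests, while leaving `rng` unset gives different lengths on each call.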
