Change time-to-first-token parameter to be based on number of request tokens #137

@mayabar

Current state
TTFT (time to first token) is defined as a single fixed value; it does not depend on the number of tokens in the request.

Required state
Add a new parameter which defines prefill time per token instead of TTFT

Apply the same change to kv-cache-transfer-latency: make it depend on the number of request tokens.
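
To make the request concrete, here is a minimal sketch of how the delay before the first token could scale with the request size. It is only an illustration under assumptions: the names `prefill_time_per_token_ms` and `kv_cache_transfer_time_per_token_ms`, the `remote_prefill` flag, and the fallback to the fixed values when no per-token value is set are all hypothetical and not part of the current configuration.

```python
from dataclasses import dataclass


@dataclass
class LatencyConfig:
    # Existing fixed values (field names here are illustrative).
    time_to_first_token_ms: float = 0.0
    kv_cache_transfer_latency_ms: float = 0.0
    # Proposed per-token values (hypothetical names, not from the issue).
    prefill_time_per_token_ms: float = 0.0
    kv_cache_transfer_time_per_token_ms: float = 0.0


def first_token_delay_ms(cfg: LatencyConfig, prompt_tokens: int,
                         remote_prefill: bool = False) -> float:
    """Return the simulated delay before the first token, in milliseconds.

    Assumption: when a per-token value is set (> 0) it replaces the fixed
    value; otherwise the fixed value is used unchanged.
    """
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")

    if remote_prefill:
        # Prefill ran elsewhere, so the delay models the KV-cache transfer.
        per_token = cfg.kv_cache_transfer_time_per_token_ms
        fixed = cfg.kv_cache_transfer_latency_ms
    else:
        per_token = cfg.prefill_time_per_token_ms
        fixed = cfg.time_to_first_token_ms

    if per_token > 0:
        return per_token * prompt_tokens
    return fixed


if __name__ == "__main__":
    cfg = LatencyConfig(time_to_first_token_ms=200.0,
                        prefill_time_per_token_ms=0.5)
    print(first_token_delay_ms(cfg, prompt_tokens=100))  # 50.0
```

With these example values, a 100-token request waits 50 ms and a 1000-token request waits 500 ms, instead of both waiting the fixed 200 ms. Whether the fixed value should remain as a fallback or be removed entirely is an open design choice.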
