@davidef commented on Aug 18, 2025

With three or more concurrent requests sending back-to-back tasks, one request (or more, as concurrency increases) gets stuck in the deferred queue until new requests almost stop arriving.

When a previous task completes and `pop_deferred_task` is called, a newly received task is already waiting in `queue_tasks`. The task popped from `queue_tasks_deferred` is therefore placed second in `queue_tasks`, so it gets deferred again, at the back of `queue_tasks_deferred`.
This PR places the popped task at the front of `queue_tasks` instead, so it is the next one executed.
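The starvation mechanism described above can be reproduced with a small toy model. This is a hypothetical Python simulation, not llama.cpp code: tasks are reduced to the id of the client that sent them, and `TaskQueue`, `running`, `process_tasks`, `simulate` and the `front_on_pop` switch are illustrative names. The configuration of three clients sharing two slots is also an assumption. Only `queue_tasks`, `queue_tasks_deferred` and `pop_deferred_task` come from the description. As described, when a task finishes its client immediately sends the next one, and only then is a deferred task released.

```python
from collections import deque


class TaskQueue:
    """Toy model of a server task queue (hypothetical, not llama.cpp code).

    Each task is represented only by the id of the client that sent it.
    """

    def __init__(self, n_slots: int, front_on_pop: bool) -> None:
        self.n_slots = n_slots
        self.front_on_pop = front_on_pop  # True models the behaviour after this PR
        self.queue_tasks = deque()           # incoming tasks
        self.queue_tasks_deferred = deque()  # tasks that found no free slot
        self.running = deque()               # occupied slots, oldest task first

    def pop_deferred_task(self) -> None:
        # Move the oldest deferred task back into queue_tasks.
        if not self.queue_tasks_deferred:
            return
        task = self.queue_tasks_deferred.popleft()
        if self.front_on_pop:
            self.queue_tasks.appendleft(task)  # after the PR: it runs next
        else:
            self.queue_tasks.append(task)      # before: queued behind new arrivals

    def process_tasks(self) -> None:
        # Start queued tasks while slots are free; defer the rest.
        while self.queue_tasks:
            task = self.queue_tasks.popleft()
            if len(self.running) < self.n_slots:
                self.running.append(task)
            else:
                self.queue_tasks_deferred.append(task)


def simulate(front_on_pop: bool, n_clients: int = 3, n_slots: int = 2,
             n_steps: int = 6) -> list:
    """Return how many tasks each client completed."""
    q = TaskQueue(n_slots, front_on_pop)
    completed = [0] * n_clients
    q.queue_tasks.extend(range(n_clients))  # every client sends its first task
    q.process_tasks()
    for _ in range(n_steps):
        client = q.running.popleft()   # the oldest running task finishes
        completed[client] += 1
        q.queue_tasks.append(client)   # its client sends the next task right away
        q.pop_deferred_task()          # only then is a deferred task released
        q.process_tasks()
    return completed


if __name__ == "__main__":
    print("append to back: ", simulate(front_on_pop=False))
    print("append to front:", simulate(front_on_pop=True))
```

In this model, appending the popped task to the back leaves client 2 with no completed tasks (`[3, 3, 0]` after six steps), because the fresh task from the client that just finished always takes the free slot first. Putting the popped task at the front rotates the slots fairly and gives `[2, 2, 2]`.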

@davidef requested a review from ngxson as a code owner on August 18, 2025 at 14:34.
@ggerganov merged commit d1d8241 into ggml-org:master on Aug 18, 2025.
46 of 47 checks passed