
Conversation

@NickLucche (Contributor) commented on Aug 27, 2025

This PR enables Gemma3n for use with the audio-specific endpoints (transcriptions/translations).

I've also added a "soft" interface change: a to_language parameter on the API, as I found it helps somewhat with translation.
The rationale is to keep these changes lightweight for now, since we're only slightly steering away from the original OpenAI Whisper-only spec, and to see where the broader audio community wants this to go.
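
Roughly, a client would exercise it like this; the model name, audio file, and target language code are placeholders, and I'm passing to_language through extra_body since the upstream OpenAI client doesn't know about it:

```python
# Sketch only: assumes a vLLM OpenAI-compatible server is already running
# locally with a Gemma3n checkpoint; names below are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

with open("sample.wav", "rb") as audio_file:
    result = client.audio.translations.create(
        model="google/gemma-3n-E2B-it",  # placeholder model name
        file=audio_file,
        # to_language is the extension added here; it's not part of the
        # original OpenAI Whisper spec, hence extra_body.
        extra_body={"to_language": "it"},
    )
print(result.text)
```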

No chunking for now, as I believe a long-audio capability assessment is in order for this model.

A list of additional minor changes:

  • conftest.py for audio entrypoints tests
  • seed params for translations
  • whisper+gemma3n audio tests with a module-level server fixture (sketched below)
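
A rough sketch of that last fixture, just to show the shape; the model list and server args are illustrative rather than exactly what the PR ships:

```python
# Illustrative sketch of a module-scoped, model-parameterized server fixture
# for the audio entrypoint tests (as it would sit under tests/entrypoints/openai/).
# Model names and CLI args are assumptions, not necessarily what the PR uses.
import pytest

from ...utils import RemoteOpenAIServer

MODELS = ["openai/whisper-large-v3-turbo", "google/gemma-3n-E2B-it"]


@pytest.fixture(scope="module", params=MODELS)
def server(request):
    # One server per model per test module, shared across that module's tests.
    with RemoteOpenAIServer(request.param, ["--enforce-eager"]) as remote_server:
        yield remote_server
```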

I also plan to follow up with revamped benchmark+evaluation scripts to better cover these models.

# pre 
python -m pytest tests/entrypoints/openai/test_translation_validation.py  146.99s user 22.77s system 154% cpu 1:50.18 total

# post
python -m pytest tests/entrypoints/openai/test_translation_validation.py  243.81s user 31.39s system 146% cpu 3:07.72 total

Signed-off-by: NickLucche <[email protected]>
@NickLucche (Contributor, Author):

cc @DarkLight1337

@gemini-code-assist (bot) left a comment

Code Review

This pull request enables Gemma3n for audio transcription and translation endpoints, which is a great addition. The changes include a soft API modification to add a to_language parameter, which will be useful for future enhancements. The tests have been updated to cover Gemma3n, including parameterization over different models, which is good practice. I've found one issue regarding input validation for the new model implementation that should be addressed.

```python
if task_type == "transcribe" and full_lang_name:
    prompt += f" into {full_lang_name}"
elif task_type == "translate":
    if full_lang_name:
```
Member:

We should validate that both languages are valid when doing translation

@NickLucche (Contributor, Author):

I am assuming languages are validated beforehand, here: https://github.com/vllm-project/vllm/blob/main/vllm/entrypoints/openai/speech_to_text.py#L91.
Do you have some extra checks in mind?

Member:

I see, in that case perhaps we should pass the full_lang_name directly into the method?

Member:

I also think that we should have a separate function for each task to reduce branching
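
e.g. something along these lines; function names and prompt wording are purely illustrative, the point is one builder per task with the already-validated language names passed straight in:

```python
# Hypothetical sketch, not the actual Gemma3n implementation: one prompt
# builder per task instead of branching on task_type in a single method.
def build_transcription_prompt(prompt: str, full_lang_name: str | None) -> str:
    if full_lang_name:
        prompt += f" into {full_lang_name}"
    return prompt


def build_translation_prompt(prompt: str,
                             full_lang_name: str | None,
                             to_lang_name: str | None) -> str:
    if full_lang_name:
        prompt += f" from {full_lang_name}"
    if to_lang_name:
        prompt += f" into {to_lang_name}"
    return prompt
```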

Signed-off-by: NickLucche <[email protected]>