[Frontend] Gemma3n audio transcriptions/translations endpoint #23735
Conversation
Signed-off-by: NickLucche <[email protected]>
Code Review
This pull request enables Gemma3n for the audio transcription and translation endpoints, which is a great addition. The changes include a soft API modification to add a `to_language` parameter, which will be useful for future enhancements. The tests have been updated to cover Gemma3n, including parameterization over different models, which is good practice. I've found one issue regarding input validation for the new model implementation that should be addressed.
```python
if task_type == "transcribe" and full_lang_name:
    prompt += f" into {full_lang_name}"
elif task_type == "translate":
    if full_lang_name:
```
We should validate that both languages are valid when doing translation
I'm assuming languages are validated beforehand, here: https://github.com/vllm-project/vllm/blob/main/vllm/entrypoints/openai/speech_to_text.py#L91.
Do you have some extra checks in mind?
I see, in that case perhaps we should pass the `full_lang_name` directly into the method?
I also think that we should have a separate function for each task to reduce branching
This PR enables Gemma3n for use with the audio-specific endpoints (transcriptions/translations).
I've also added a "soft" interface change: a `to_language` parameter in the API, as I found it helps somewhat with translation. The rationale is that I'd like to keep these changes lightweight for now, since we're only slightly steering away from the original OpenAI Whisper-only spec, and instead see where the broader audio community wants it to be.
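To make the shape of the change concrete, a request to the translations endpoint carrying the new field might look like the payload below. This is a hypothetical sketch: the field name `to_language` follows this PR, while the model name and the `"it"` target value are placeholders, and a real request would also attach the audio file as multipart form data.

```python
# Hypothetical request payload for the /v1/audio/translations endpoint.
# `language`/`model` follow the OpenAI-style audio API; `to_language`
# is the extra, non-OpenAI field introduced by this PR.
payload = {
    "model": "google/gemma-3n-E2B-it",  # placeholder model name
    "language": "en",        # source language of the audio
    "to_language": "it",     # target language (new in this PR)
}
```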
No chunking for now, as I believe a long-audio capability assessment is in order for this model.
A list of additional minor changes:
I also plan to follow up with revamped benchmark+evaluation scripts to better cover these models.