Gemma 3 Fine tuning max token length #288

mukhayy · 2025-05-02T02:17:01Z


I'm looking to fine-tune google/gemma-3-12b-it on my dataset of around 10k examples. The outputs in my dataset are quite lengthy (some reach 125k tokens, and the average is around 60k), so I thought I could take advantage of this model's max_position_embeddings = 131072. However, I haven't seen any fine-tuning example that sets max_seq_length of trl.SFTTrainer to 131072.
Is this doable? Or does 131072 apply only to inference? How do people approach fine-tuning on datasets with such lengthy outputs?
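
For context, here is a rough sketch of how the length figures above could be checked against the 131072 limit before picking a max_seq_length. It is only an illustration: `length_report` and `whitespace_token_count` are hypothetical helper names, and the whitespace counter is a stand-in assumption. Real counts would have to come from the model's own tokenizer, which will generally give different numbers.

```python
from statistics import mean
from typing import Callable, Dict, Iterable

# Context window cited in the post (max_position_embeddings of the model).
MAX_POSITION_EMBEDDINGS = 131072


def whitespace_token_count(text: str) -> int:
    """Stand-in tokenizer (assumption): counts whitespace-separated words.

    Swap in the model's real tokenizer to get meaningful token counts.
    """
    return len(text.split())


def length_report(
    texts: Iterable[str],
    count_tokens: Callable[[str], int] = whitespace_token_count,
    limit: int = MAX_POSITION_EMBEDDINGS,
) -> Dict[str, float]:
    """Summarize example lengths and count how many exceed `limit`."""
    lengths = [count_tokens(t) for t in texts]
    if not lengths:
        raise ValueError("texts must contain at least one example")
    return {
        "n": len(lengths),
        "mean": mean(lengths),
        "max": max(lengths),
        "over_limit": sum(1 for n in lengths if n > limit),
    }


if __name__ == "__main__":
    # Tiny toy dataset with a small limit so the effect is visible.
    sample = ["a b c", "d e", "f g h i j k"]
    print(length_report(sample, limit=5))
```

Any example counted under `over_limit` would have to be truncated or split, whatever the trainer setting ends up being.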

Replies: 0 comments

Select a reply

Uh oh!

Gemma 3 Fine tuning max token length #288

Uh oh!

Uh oh!

mukhayy May 2, 2025

Replies: 0 comments

mukhayy
May 2, 2025