What is the standard formatter of train_dataset in PPOTrainer? #3578
caoyang-sufe asked this question in Q&A
I noticed that in the TRL PPO example scripts, the formatter function for `train_dataset` is written roughly as below.
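Paraphrasing from memory, it pre-tokenizes the prompts into `input_ids` with `dataset.map`, something like this (a sketch only; the exact field names and the tokenizer checkpoint here are illustrative, not copied from the script):

```python
from datasets import Dataset
from transformers import AutoTokenizer

# Any causal-LM tokenizer works for illustration; the real script picks its own checkpoint.
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-160m")

def prepare_dataset(dataset: Dataset, tokenizer) -> Dataset:
    """Pre-tokenize every prompt into input_ids up front; a collator only pads later."""

    def tokenize(element):
        outputs = tokenizer(element["prompt"], padding=False)
        return {"input_ids": outputs["input_ids"]}

    return dataset.map(
        tokenize,
        batched=True,
        remove_columns=dataset.column_names,  # drop the raw text columns
    )

raw = Dataset.from_dict({"prompt": ["What is RLHF?", "Explain PPO briefly."]})
train_dataset = prepare_dataset(raw, tokenizer)
print(train_dataset[0])  # {'input_ids': [...]}
```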
Here it directly transforms each sample into tokenized input IDs. This strikes me as unusual, because in the official GRPO demo (https://huggingface.co/docs/trl/grpo_trainer) the dataset is still formatted with "prompt" and "completion" columns.
I mean, PPO is somewhat similar to GRPO ... and there is no official demo for PPO in the Hugging Face TRL docs.
Is this special to PPO, or can I simply follow the "prompt" and "completion" format? See the sketch below for the kind of dataset I have in mind.
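For comparison, a minimal dataset in the style the GRPO docs describe would look something like this (again a sketch; the exact column requirements may differ per trainer, so the docs for each trainer should be checked):

```python
from datasets import Dataset

# A minimal prompt-style dataset as shown in the GRPO docs: a plain text
# "prompt" column, with completions generated during training. This is only
# an illustration of the format, not a PPOTrainer requirement.
grpo_style = Dataset.from_dict(
    {
        "prompt": [
            "What is the capital of France?",
            "Summarize PPO in one sentence.",
        ]
    }
)
print(grpo_style.column_names)  # ['prompt']
```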