What is the standard formatter of train_dataset in PPOTrainer? #3578
caoyang-sufe asked this question in Q&A
I noticed that in the TRL PPO example scripts, the formatter function for `train_dataset` is written roughly as below.
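Paraphrasing from memory, it pre-tokenizes the prompts into `input_ids` with `dataset.map`, something like this (a sketch only; the exact field names and the tokenizer checkpoint here are illustrative, not copied from the script):

```python
from datasets import Dataset
from transformers import AutoTokenizer

# Any causal-LM tokenizer works for illustration; the real script picks its own checkpoint.
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-160m")

def prepare_dataset(dataset: Dataset, tokenizer) -> Dataset:
    """Pre-tokenize every prompt into input_ids up front; a collator only pads later."""

    def tokenize(element):
        outputs = tokenizer(element["prompt"], padding=False)
        return {"input_ids": outputs["input_ids"]}

    return dataset.map(
        tokenize,
        batched=True,
        remove_columns=dataset.column_names,  # drop the raw text columns
    )

raw = Dataset.from_dict({"prompt": ["What is RLHF?", "Explain PPO briefly."]})
train_dataset = prepare_dataset(raw, tokenizer)
print(train_dataset[0])  # {'input_ids': [...]}
```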
Here it directly transforms each sample into tokenized input IDs. This strikes me as unusual, because in the official GRPO demo (https://huggingface.co/docs/trl/grpo_trainer) the dataset is still formatted with "prompt" and "completion" columns.
I mean, PPO is somewhat similar to GRPO ... and there is no official demo for PPO in the Hugging Face TRL docs.
Is this special to PPO, or can I simply follow the "prompt" and "completion" format? See the sketch below for the kind of dataset I have in mind.
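For comparison, a minimal dataset in the style the GRPO docs describe would look something like this (again a sketch; the exact column requirements may differ per trainer, so the docs for each trainer should be checked):

```python
from datasets import Dataset

# A minimal prompt-style dataset as shown in the GRPO docs: a plain text
# "prompt" column, with completions generated during training. This is only
# an illustration of the format, not a PPOTrainer requirement.
grpo_style = Dataset.from_dict(
    {
        "prompt": [
            "What is the capital of France?",
            "Summarize PPO in one sentence.",
        ]
    }
)
print(grpo_style.column_names)  # ['prompt']
```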