This repo contains the code for our EMNLP 2025 paper *Unleashing the Reasoning Potential of Pre-trained LLMs by Critique Fine-Tuning on One Problem*.
One-Shot Critique Fine-Tuning (CFT) is a simple, robust, and compute-efficient training paradigm for unleashing the reasoning capabilities of pretrained LLMs in both mathematical and logical domains. By leveraging critiques on just one problem, One-Shot CFT enables models like Qwen and LLaMA to match or even outperform reinforcement learning, while using 20× less compute.
- Unleashes Reasoning with One Example: One-Shot CFT uses critiques of diverse model-generated solutions to a single problem to significantly boost performance across math and logic tasks. For example, with just 5 GPU hours of training on Qwen2.5-Math-7B, One-Shot CFT achieves an average improvement of +15% on six math benchmarks and +16% on three logic reasoning benchmarks.
- Outperforms RLVR and Full SFT with 20× Less Compute: One-Shot CFT outperforms both one-shot Reinforcement Learning with Verifiable Rewards (RLVR) and full-dataset supervised fine-tuning, while requiring only 5 GPU hours on a 7B model—offering a much more efficient and stable training alternative.
- Robust Across Seeds and Model Scales: One-Shot CFT remains effective across different seed problem choices and model sizes—from 1.5B to 14B parameters—demonstrating strong generalization and scalability.
 
```bash
cd tools/
bash setup_env.sh
bash prepare_data.sh
```
- Train on Mathematical Reasoning
 
```bash
cd ../train/
bash train_on_math_reasoning.sh
```
We randomly select 500 math problems (excluding MATH-500) for validation. To validate after training:
```bash
cd train/Validation
bash start_validate.sh
```
This generates `validation_summary.txt`, containing MATH-Validation scores per checkpoint. Select the checkpoint with the highest score as your final model.
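To pick that checkpoint quickly from the summary file, a one-liner like the following works. This is a sketch that assumes each line of `validation_summary.txt` has the form `<checkpoint> <score>`; adjust it if the actual format differs:
```bash
# Print the checkpoint with the highest MATH-Validation score.
# Assumes lines look like "<checkpoint> <score>" (an assumption -- check the file).
sort -k2 -nr validation_summary.txt | head -n 1
```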
- Train on Logic Reasoning
 
```bash
cd ../train/
bash train_on_logic_reasoning.sh
```
We do not use a separate validation set for logic tasks. Based on our experiments, checkpoints between ckpt-30 and ckpt-40 generally yield the best performance.
Edit the following scripts to point to your trained model path and output directory (a sketch of the typical settings follows the list):
- eval/eval_on_math_reasoning.sh
- eval/eval_on_logic_reasoning.sh
 
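The settings to change typically look like the lines below. These variable names are illustrative assumptions, not necessarily the ones used in the scripts; open each script and edit whatever it actually defines:
```bash
# Illustrative only -- the real variable names may differ; check the scripts in eval/.
MODEL_PATH=/path/to/your/best/checkpoint  # e.g., the checkpoint selected during validation
OUTPUT_DIR=/path/to/eval/outputs          # where evaluation results will be written
```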
Then run:
```bash
cd eval/
bash eval_on_math_reasoning.sh
bash eval_on_logic_reasoning.sh
```
Our evaluation code is based on Qwen2.5-Math and BBEH.
You can create new critique data using the prompt templates in `prompts/` (see the sketch after this list) for:
- Candidate solution generation
- Teacher critique generation
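
Below is a minimal sketch of how these templates could be wired into a generation pipeline. The `generate.py` helper, template filenames, and flags are hypothetical placeholders, not part of this repo; substitute your own inference script and the actual files in `prompts/`:
```bash
# Hypothetical sketch -- generate.py and all filenames/flags are placeholders.
# 1) Sample diverse candidate solutions to the seed problem.
python generate.py \
  --template prompts/candidate_solution_generation.txt \
  --problem data/seed_problem.json \
  --num_samples 32 \
  --output candidates.jsonl

# 2) Have a teacher model critique each candidate solution.
python generate.py \
  --template prompts/teacher_critique_generation.txt \
  --input candidates.jsonl \
  --output critiques.jsonl
```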
 
Cite our paper as:
```bibtex
@article{wang2025unleashing,
  title={Unleashing the Reasoning Potential of Pre-trained LLMs by Critique Fine-Tuning on One Problem},
  author={Wang, Yubo and Nie, Ping and Zou, Kai and Wu, Lijun and Chen, Wenhu},
  journal={arXiv preprint arXiv:2506.03295},
  year={2025}
}
```