EMNLP 2025 🏅 Oral Presentation (Top 50% of accepted papers, ARR best paper nomination)
- 🤖 Oct 2025: Llama-Fin-8b model released on 🤗 Hugging Face - state-of-the-art financial LLM!
- 🏋️ Oct 2025: FinTrain training dataset released on 🤗 Hugging Face - comprehensive training data for financial LLMs!
- 🎉 Sep 2025: Check our Post-training Research Hub for comprehensive resources including FinDAP, tutorials, RAG, and continual pre-training!
- 🔧 Sep 2025: FinRec training code is now available! Train your own domain-specific financial LLMs with our proven recipes.
- 📊 Jan 2025: FinEval benchmark released on 🤗 Hugging Face - comprehensive evaluation suite for financial LLMs!
Given a pre-trained LLM with strong general capabilities (e.g., Llama3-8B-Instruct), how can we effectively adapt it to a target domain (e.g., finance) through post-training?
Key Questions We Address:
- ❓ What criteria are desirable for successful adaptation?
- ❓ What are the most effective training recipes with respect to data and model?
- ❓ How do different post-training stages contribute to domain expertise?
FinDAP is a comprehensive finance-specific post-training framework that includes:
- 🎯 Systematic capability identification for financial LLMs
- 📊 State-of-the-art evaluation framework (FinEval)
- 🔧 Advanced training recipes with novel preference alignment
- 🏆 High-performance model checkpoints (Llama-Fin)
We use the finance domain as a case study to demonstrate effective domain-adaptive post-training on instruction-tuned LLMs.
A comprehensive, systematic approach to financial LLM domain adaptation
| 💡 Contribution | 📋 Description |
|---|---|
| 📊 Comprehensive Guidance | Complete framework for finance-specific post-training including capability identification, evaluation, data and model recipe design |
| 🔬 Systematic Exploration | In-depth analysis of each post-training stage with emphasis on goals, challenges and effective approaches |
| 📈 Novel Preference Alignment | Revolutionary approach using on-policy trajectories guided by both outcome and process signals |
| 💡 State-of-the-art Financial LLM | Llama-Fin model achieving SOTA performance at 8B parameter scale |
Our framework consists of four key components:
| Component | Description | Focus |
|---|---|---|
| 🎯 FinCap | Core Capabilities Framework | Systematic identification of financial LLM capabilities |
| 🔧 FinRec | Training Recipe & Methodology | Advanced training strategies and preference alignment |
| 📚 FinTrain | Curated Training Data | Systematically curated datasets for optimal adaptation |
| 📊 FinEval | Comprehensive Evaluation Suite | Multi-dimensional evaluation framework |
We systematically identify the essential capabilities for domain-specific LLMs, focusing on four fundamental areas that enable effective financial domain adaptation:
| 🎯 Capability | 📋 Description | 💡 Example |
|---|---|---|
| 🏗️ Domain Concepts | Understanding financial terminology and domain-specific knowledge | 'Bond' as a loan agreement between investor and borrower |
| 📈 Domain Tasks | Executing finance-specific tasks and applications | Stock movement prediction, financial report analysis |
| 🧠 Reasoning | Mathematical calculations and logical inference for complex problems | Computing market rates, earnings per share analysis |
| 💬 Instruction Following | Understanding and executing financial task instructions | Following trading instructions, Q&A about financial concepts |
🏗️ Domain-Specific Concepts
Financial terminology and domain knowledge form the foundation of expertise.
Financial domains include specialized concepts that differ significantly from general usage. For example:
- 📊 Bond: A loan agreement between an investor and borrower
- 📈 Volatility: A statistical measure of price fluctuation dispersion
- 🏛️ Derivatives: Financial contracts deriving value from underlying assets
Key Challenge: Adapt to domain-specific concepts while preserving general knowledge essential for both domain-specific and general tasks.
📈 Domain-Specific Tasks
Specialized tasks unique to the financial domain.
While many NLP tasks span domains, finance has unique requirements:
- 🔮 Stock Movement Prediction: Forecasting market trends
- 📊 Risk Assessment: Evaluating investment risks
- 💰 Portfolio Optimization: Strategic asset allocation
Key Challenge: Leverage domain concepts to solve tailored tasks effectively while maintaining broad task competency.
🧠 Advanced Reasoning
Mathematical and logical reasoning for complex financial analysis.
Financial tasks require sophisticated reasoning capabilities:
- 🧮 Mathematical Reasoning: Computing financial ratios, valuations
- 🔍 Analytical Thinking: Interpreting market trends, company performance
- 📈 Quantitative Analysis: Processing numerical data and metrics
Key Challenge: Perform complex mathematical reasoning while maintaining accuracy in financial calculations and interpretations.
💬 Instruction Following & Communication
Core capability for both general and domain-specific interactions.
Essential for practical deployment:
- 📝 Task Understanding: Interpreting financial instructions accurately
- 🗣️ Conversational Interface: Natural dialogue about financial topics
- 🎯 Goal-Oriented Response: Providing actionable financial insights
Key Challenge: Maintain natural conversation flow while providing accurate, domain-appropriate responses.
📝 Note: While domains may vary in sensitivity (e.g., medical vs. entertainment) and multi-modality requirements, we focus on these four core capabilities as the foundation for effective domain adaptation. Future work may explore additional aspects such as multi-modal integration and domain-specific ethical considerations.
FinRec provides our complete training methodology for domain-adaptive post-training, featuring joint optimization of continual pre-training and instruction tuning, plus novel preference alignment techniques.
Our training recipe consists of three progressive stages:
| 🎯 Stage | 📋 Components | 🔍 Purpose |
|---|---|---|
| Stage 1 | Joint CPT + SFT | Simultaneous domain knowledge acquisition and instruction following |
| Stage 2 | Curriculum Learning | Progressive difficulty scaling with multiple curriculum groups |
| Stage 3 | Offline RL | Preference alignment using outcome and process signals |
Train your own financial LLM with FinRec in 3 steps!
```bash
# Create and activate conda environment
conda create -n FinDAP python=3.10 && conda activate FinDAP

# Install dependencies
pip install -r requirements.txt
```

Joint Continual Pre-training and Supervised Fine-tuning with curriculum learning:
🥇 Curriculum Group 1: Foundation Training
Starting from base model (e.g., Llama3-8B-Instruct):
```bash
# Foundation curriculum - basic financial concepts and tasks
./scripts/cpt_sft/mix_cpt_mix_sft_extend_book_exercise_downsample_from_base.sh
```

Key Features:
- 📚 Mixed CPT: Financial texts + general domain retention
- 🎯 Mixed SFT: Basic instruction following + domain tasks
- 📖 Extended Books: Financial literature and educational content
- 🔄 Downsampling: Balanced data distribution
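To make the downsampling step above concrete, here is a minimal sketch of balancing a mixed CPT + SFT stream. This is an illustration only, not the released pipeline: the corpora, sizes, and 1:1 ratio are placeholder assumptions, and the actual mix is configured inside the script above.

```python
# Illustrative sketch of the mixed-CPT/mixed-SFT balancing idea.
# NOT the released pipeline: corpora, sizes, and the 1:1 ratio below are
# placeholder assumptions; the real mix is set in the training script.
import random

def downsample(examples, target_size, seed=42):
    """Randomly keep at most `target_size` examples."""
    if len(examples) <= target_size:
        return list(examples)
    return random.Random(seed).sample(examples, target_size)

# Hypothetical corpora: financial CPT text, general CPT text, SFT pairs.
fin_cpt = [{"text": "Financial filing excerpt ..."}] * 1000
gen_cpt = [{"text": "General web text ..."}] * 10000
sft = [{"prompt": "What is a bond?", "response": "A loan agreement ..."}] * 500

# Downsample the dominant general corpus so domain data is not drowned out,
# then interleave everything into one shuffled joint-training stream.
gen_cpt = downsample(gen_cpt, target_size=len(fin_cpt))
mixed_stream = fin_cpt + gen_cpt + sft
random.Random(0).shuffle(mixed_stream)
print(len(mixed_stream), "examples in the joint CPT+SFT stream")
```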
🥈 Curriculum Group 2: Advanced Training
Building on Group 1 results:
```bash
# Advanced curriculum - complex reasoning and specialized tasks
./scripts/cpt_sft/mix_cpt_mix_sft_extend_book_exercise_downsample_from_v1.sh
```

Key Features:
- 🧠 Advanced Reasoning: Complex financial calculations and analysis
- 💼 Specialized Tasks: Professional-level financial applications
- 🔗 Sequential Learning: Builds upon previous stage knowledge
- ⚡ Optimized Data Mix: Refined data proportions for advanced capabilities
Revolutionary preference learning using both outcome and process signals:
```bash
# Offline RL with final answer preference and stepwise corrective preference
./scripts/offline_rl/rpo_cfa_stepwise.sh
```

Novel Features:
- 🎯 Dual Signal Learning: Outcome-based + process-based preference optimization
- 🔄 Stepwise Correction: Fine-grained error correction during reasoning
- 🤖 Generative Reward Model: On-policy trajectory construction
- 📈 RPO Algorithm: Robust Policy Optimization for the financial domain
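As a conceptual illustration of the dual-signal idea, the sketch below builds preference pairs from an outcome check on the final answer and a stepwise correction of the first erroneous reasoning step. This is an assumption-level sketch, not the authors' implementation: `Trajectory`, `outcome_pref`, and `stepwise_pref` are hypothetical names.

```python
# Conceptual sketch of dual-signal preference pair construction.
# Assumption-level illustration, not the released code: the class and
# function names here are hypothetical.
from dataclasses import dataclass

@dataclass
class Trajectory:
    steps: list          # on-policy chain-of-thought steps
    final_answer: str    # extracted final answer

def outcome_pref(traj_a, traj_b, gold):
    """Outcome signal: prefer the trajectory whose final answer is correct."""
    a_ok = traj_a.final_answer == gold
    b_ok = traj_b.final_answer == gold
    if a_ok and not b_ok:
        return traj_a, traj_b        # (chosen, rejected)
    if b_ok and not a_ok:
        return traj_b, traj_a
    return None                      # no outcome-level preference signal

def stepwise_pref(traj, bad_step_idx, corrected_step):
    """Process signal: keep the shared prefix, swap in a corrected step."""
    prefix = traj.steps[:bad_step_idx]
    chosen = prefix + [corrected_step]
    rejected = prefix + [traj.steps[bad_step_idx]]
    return chosen, rejected          # fine-grained (chosen, rejected) pair
```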
| 💡 Innovation | 📋 Description | 🎯 Benefit |
|---|---|---|
| Joint CPT+SFT | Simultaneous knowledge and instruction optimization | Prevents catastrophic forgetting |
| Curriculum Design | Progressive complexity scaling | Improved learning stability |
| Dual Preference | Outcome + process signal alignment | Enhanced reasoning accuracy |
| Stepwise Correction | Granular error identification and fixing | Better mathematical reasoning |
The culmination of our FinDAP framework is Llama-Fin, a state-of-the-art financial LLM that achieves exceptional performance across diverse financial tasks.
- 🥇 State-of-the-art Performance: Leading results on financial benchmarks
- 🎯 8B Parameter Efficiency: An optimal balance of performance and compute at the 8B scale
- 🧠 Multi-capability Excellence: Strong performance across concepts, tasks, reasoning, and instruction following
- 📈 Novel Contributions: Demonstrates effectiveness of dual preference learning and joint CPT+SFT training
📊 Performance Note: Detailed results and comparisons available in our paper and evaluation using FinEval.
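For a quick qualitative check of the released checkpoint, here is a minimal inference sketch. It assumes only the Hugging Face repo id used in the evaluation commands later in this README (`Salesforce/Llama-Fin-8b`) and standard `transformers` APIs.

```python
# Minimal inference sketch, assuming the Hugging Face repo id used in the
# evaluation commands later in this README (Salesforce/Llama-Fin-8b).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Salesforce/Llama-Fin-8b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "In one sentence, what is a bond?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Greedy decoding; print only the newly generated tokens.
output = model.generate(input_ids, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output[0, input_ids.shape[-1]:], skip_special_tokens=True))
```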
Our evaluation framework provides a comprehensive assessment of the four core capabilities through carefully curated development and held-out evaluation sets.
Figure 2: FinEval evaluation framework • Comprehensive assessment across multiple dimensions and task types
New datasets released with FinDAP are highlighted
Our evaluation framework assesses models across multiple orthogonal dimensions:
| 🎯 Dimension | 📋 Categories | 🔍 Purpose |
|---|---|---|
| 🔄 Task Types | Similar (seen) • Novel (unseen) | Assess generalization to new task categories |
| 🎯 Task Categories | General • Domain-Specific • Reasoning | Evaluate different skill requirements |
| 📝 Evaluation Methods | Direct Answer • Chain-of-Thought | Test reasoning transparency and accuracy |
- ✅ Multi-dimensional Assessment: Orthogonal evaluation across task types, categories, and methods
- ✅ Development & Held-out Sets: Proper train/test split for reliable evaluation
- ✅ Novel Task Generalization: Assessment on completely unseen task categories
- ✅ Reasoning Evaluation: Both direct answers and step-by-step reasoning assessment
- ✅ Comprehensive Coverage: Aligned with FinCap capabilities framework
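Before wiring FinEval into a harness, you can also peek at the data directly. A minimal sketch, assuming the dataset path, config name, split, and field names from the `cfa-challenge.yaml` task config shown below:

```python
# Quick sanity check of one FinEval subset, assuming the dataset path,
# config name, split, and field names from cfa-challenge.yaml below.
from datasets import load_dataset

ds = load_dataset("Salesforce/FinEval", "CFA-Challenge", split="test")
print(ds)                       # row count and column names
sample = ds[0]
print(sample["query"][:200])    # prompt text (doc_to_text: query)
print(sample["answer"])         # gold label (doc_to_target: answer)
```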
Get started with FinEval in 3 easy steps!
FinEval integrates seamlessly with standard evaluation frameworks. We recommend using EleutherAI's LM Evaluation Harness for effortless dataset loading and evaluation.
```bash
# Use existing FinDAP environment or create new one
conda create -n FinDAP python=3.10 && conda activate FinDAP

# Clone the evaluation harness
git clone --depth 1 https://github.com/EleutherAI/lm-evaluation-harness
cd lm-evaluation-harness
pip install -e .

# Install additional evaluation dependencies
pip install datasets transformers vllm
```

Create a task configuration file (example: `cfa-challenge.yaml`):
```yaml
task: cfa-challenge
dataset_path: Salesforce/FinEval
dataset_name: CFA-Challenge
output_type: generate_until
test_split: test
doc_to_text: query
doc_to_target: answer
should_decontaminate: true
doc_to_decontamination_query: query
generation_kwargs:
  until:
    - "</s>"
    - "<|im_end|>"
    - "<|eot_id|>"
    - "<|end_of_text|>"
    - "<|end|>"
    - "<|endoftext|>"
  do_sample: false
  temperature: 0.0
  max_gen_toks: 8000
metric_list:
  - metric: exact_match
    aggregation: mean
    higher_is_better: true
    ignore_case: true
    ignore_punctuation: true
metadata:
  version: 1.0
```

```bash
# Login to Hugging Face
huggingface-cli login --token {YOUR_HF_TOKEN}

# Set environment variables
export HF_DATASETS_CACHE={YOUR_CACHE_LOC}
export TRANSFORMERS_CACHE={YOUR_CACHE_LOC}
export TRUST_REMOTE_CODE=1

# Configure evaluation
system_prompt="Please act as a CFA exam taker and evaluate the given scenario to choose the most appropriate answer from options A, B, and C. Start by offering a brief explanation of your thought process and reasoning, up to 100 words. After the explanation, select your answer using the format: 'Selection: [[A]]' (e.g., 'Explanation: (your explanation)\nSelection: [[A]]'). If you find no answer is correct, directly mention it"
model="Salesforce/Llama-Fin-8b"

# Run evaluation
lm_eval --apply_chat_template --model vllm --log_samples --write_out \
  --model_args pretrained=${model},max_length=8000,dtype=bfloat16,trust_remote_code=True,tensor_parallel_size={YOUR_NUM_GPU},gpu_memory_utilization=0.6 \
  --system_instruction "$system_prompt" \
  --tasks cfa-challenge \
  --device cuda \
  --output_path {YOUR_OUTPUT_LOC} \
  --batch_size auto \
  --num_fewshot 0
```

🔬 Research Purpose Only: This release supports academic research as described in our EMNLP 2025 paper.
This release is for research purposes only in support of an academic paper. Our datasets and code are not specifically designed or evaluated for all downstream purposes. We strongly recommend users evaluate and address potential concerns related to accuracy, safety, and fairness before model deployment. We encourage users to consider the common limitations of AI, comply with applicable laws, and leverage best practices when selecting use cases, particularly for high-risk scenarios where errors or misuse could significantly impact people's lives, rights, or safety. For further guidance on use cases, refer to our Salesforce AUP and AI AUP.
If you find our project helpful, please consider citing our paper 😊
```bibtex
@misc{ke2025demystifyingdomainadaptiveposttrainingfinancial,
      title={Demystifying Domain-adaptive Post-training for Financial LLMs},
      author={Zixuan Ke and Yifei Ming and Xuan-Phi Nguyen and Caiming Xiong and Shafiq Joty},
      year={2025},
      eprint={2501.04961},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2501.04961},
}
```

Built with ❤️ by the Salesforce AI Research team
Feel free to contact Zixuan Ke via email: [email protected]
