Releases: willccbb/verifiers

v0.1.3

26 Aug 11:56

Verifiers v0.1.3 Release Notes

Date: August 26, 2025

Verifiers v0.1.3 adds several features that expand functionality and ease of use, along with additional library integrations and bug fixes.

Highlights

  • We now have a TUI! 🎉 Run vf-tui to interactively browse all locally-saved evaluation results in your terminal.
  • Overhauled logging for vf-eval evaluation results with tagged JSON artifact folders.
    • Results default to your environment's project directory under outputs/ when developing locally, or to ./outputs when using an environment installed from elsewhere.
    • The short-lived Markdown report outputs are now deprecated.
  • Multimodal-input tasks are now supported for evaluation (see environments/mmmu for an example)! Official trainer support in verifiers is pending; in the meantime, multimodal training is available via HUD's hud-vf-gym project.
  • Optional async for reward functions, tools, and Environment class methods
    • maybe_await pattern for safe accommodation of both sync and async functions (see the sketch after this list)
    • Sync overrides of env_response and is_completed in MultiTurnEnv still work, but produce a type warning; users are encouraged to migrate these methods to async going forward.
  • Full JSON sampling args in vf-eval via -S (#240).
  • Official community examples library under very active development: prime-environments
  • Native init/push/pull/install support in prime-cli (and more...)
    • Run uv tool install prime for a preview 🙂
  • Feature-complete support for training and online evaluations in prime-rl.
  • Improved caching and parallelization for JudgeRubric.
  • Rubric.class_objects values are available to all reward functions by key name (illustrated after this list).
  • Bug fixes for tool-call sanitization and for saving datasets to Hugging Face.
  • Improvements to documentation.
  • From the recent 0.1.2.post1 pre-release version:
    • StatefulToolEnv for intercepting function calls for routing and state management (#224)
    • Improved lazy imports for efficient evaluation.
    • Overhauled MathRubric to use math-verify as the default reward.
    • Full support restored for completions generation (#201, #196).
  • New required dependencies since 0.1.2: rich, textual, jinja.
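
To make the async accommodation above concrete, here is a minimal sketch of the maybe_await pattern; the helper name comes from the release notes, but the surrounding reward-function signatures are simplified assumptions rather than the exact verifiers API:

```python
import asyncio
import inspect


async def maybe_await(value):
    """Await the value if it is awaitable; otherwise return it unchanged."""
    if inspect.isawaitable(value):
        return await value
    return value


# Hypothetical reward functions: one sync, one async.
def exact_match(completion: str, answer: str) -> float:
    return 1.0 if completion.strip() == answer else 0.0


async def judged_score(completion: str, answer: str) -> float:
    await asyncio.sleep(0)  # stand-in for an async judge/API call
    return 0.5


async def score(fn, completion: str, answer: str) -> float:
    # The same call site safely handles both sync and async functions.
    return await maybe_await(fn(completion, answer))


print(asyncio.run(score(exact_match, "42", "42")))   # 1.0
print(asyncio.run(score(judged_score, "42", "42")))  # 0.5
```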
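
And here is a generic illustration of the class_objects mechanism: objects registered under a key are passed to any reward function that declares a parameter of the same name. The call_with_injection helper and judge_client key are hypothetical stand-ins, shown only to demonstrate the key-name matching, not the verifiers internals:

```python
import inspect


class DummyJudge:
    def grade(self, completion: str) -> float:
        return 1.0 if "42" in completion else 0.0


# Stand-in for Rubric.class_objects: a mapping from key names to shared objects.
class_objects = {"judge_client": DummyJudge()}


def judged_reward(completion, answer, judge_client, **kwargs):
    # judge_client is supplied by key name from class_objects.
    return judge_client.grade(completion)


def call_with_injection(fn, **kwargs):
    # Pass along any class_objects entries whose keys match the reward
    # function's declared parameter names.
    params = inspect.signature(fn).parameters
    extras = {k: v for k, v in class_objects.items() if k in params and k not in kwargs}
    return fn(**kwargs, **extras)


print(call_with_injection(judged_reward, completion="The answer is 42.", answer="42"))  # 1.0
```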

Thanks to everyone who contributed to this release!

Stay tuned for some big announcements in the coming days 😊

Full Changelog: v0.1.2...v0.1.3

v0.1.2.post1

23 Aug 08:17

Verifiers v0.1.2.post1 – Release Notes

Incremental update focused on a new stateful tool environment, environment folder cleanup/renaming, math verification robustness, reporting improvements, and bug fixes.

Highlights

  • Stateful tools: add a stateful tool environment and move tool JSON loading into environment responses (PR #224; see the sketch after this list).
  • Environments: consolidation/renames for clarity and new environment tags (PR #222 and related changes).
  • Lazy imports: training-related libraries are only imported when accessed.
  • Verification: more robust default math verification (PR #213).
  • RL support: enable base-model RL with message_type="completions" (PR #201), plus Prime-RL integration and docs (PR #204) and GRPO trainer updates (PR #217, #218).
  • Reporting & endpoints: template/report tweaks and endpoint path loading improvements (PR #206, PR #203, plus follow-ups).
  • CLI/UX: make rich a default dependency for the eval script (PR #200); eval output refinements.
  • Fixes: hotfix for sampling args for gpt-5.
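
As a rough sketch of what intercepting tool calls can look like with the new stateful tool environment, the subclass below routes a per-rollout counter into every call. The update_tool_args hook and its signature are assumptions based on the release description; consult the verifiers source for the exact extension point:

```python
import verifiers as vf


class BudgetedToolEnv(vf.StatefulToolEnv):
    """Hypothetical subclass that tracks tool usage in rollout state."""

    def update_tool_args(self, tool_args: dict, messages, state: dict, **kwargs) -> dict:
        # Count tool calls in the shared state dict and route the counter
        # into each call, e.g. to enforce a budget inside the tool.
        state["tool_calls"] = state.get("tool_calls", 0) + 1
        tool_args["call_index"] = state["tool_calls"]
        return tool_args
```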

Changes by Area

CLI and Scripts

  • vf-eval
    • Add rich as a default dependency to improve output readability (PR #200).
    • Refine eval outputs and result handling (PR #223 and related commits).
  • Hotfixes
    • Update sampling args for gpt-5 (hotfix commit).

Environments and Examples

  • Add a stateful tool environment; load tool information via environment responses (PR #224).
  • Rename and consolidate environments, introduce tag metadata for discoverability (PR #222; additional env tag updates).
  • Math environment updates and prompt tweaks.
  • Remove dead processing code in environment.py; general cleanup and type hint improvements.

Parsers, Rubrics, and Utils

  • Caching improvements for JudgeRubric to reduce redundant work (PR #216).
  • More robust rule-based math verification and heuristics (PR #213).
  • General type-hint and internal cleanup passes.

Training

  • Document Prime-RL training (PR #204).
  • Minor updates to GRPO trainer (PR #217, #218).
  • Add support for base-model RL flows via message_type="completions" (PR #201).

Reporting and Tooling

  • Report generation and template tweaks (PR #206, PR #203).
  • Improve endpoint path loading and related tooling.

Documentation

  • Minor README and docs updates across environments and training utilities, plus additional guidance for reporting.

Upgrade Notes

  • Environment renames/tags: if you reference environment names or use tags in tooling or scripts, review the updated names and tag metadata (PR #222).

Reference Commits (since v0.1.2.post0)

  • adding stateful toolenv, moving tool json loading to env_response (PR #224)
  • Will/eval outputs (PR #223)
  • Update grpo_trainer.py (PR #217, PR #218)
  • hotfix for gpt-5 sampling args
  • Will/rename envs (PR #222)
  • Will/judgerubric caching (PR #216)
  • More robust rule-based math verification (PR #213)
  • Report tweaks and endpoints path loading (PR #206 and follow-ups)
  • Integrate and document prime-rl training (PR #204)
  • Update report generation and vf-init template (PR #203)
  • Add support for base model RL / message_type="completions" (PR #201)
  • Add rich as default dependency for eval script (PR #200)
  • Math env updates, prompt tweaks, type hints, and cleanup in environment.py

Full Changelog

v0.1.2.post0

09 Aug 00:27

Verifiers v0.1.2.post0 – Release Notes

Minor post-release update focusing on polish: CLI script bug fixes and enhancements, environment example cleanup, better reporting, and improved test coverage.

Highlights

  • vf-eval: fixed rollout indexing bugs and improved reliability when sampling multiple rollouts.
  • vf-init: streamlined project initialization and naming (removed automatic vf- prefix) and refreshed templates.
  • Environments: documentation and prompt cleanups; added/updated AIME examples; improved report embedding.
  • Tests: expanded coverage across rubric behavior, XML parser, and environment edge cases.

Changes by Area

CLI and Scripts

  • vf-eval
    • Fix index handling when using multiple rollouts (PR #197).
    • Ensure metrics columns are included in generated datasets via supporting utilities (PR #194).
  • vf-init
    • Remove automatic vf- prefix during init to honor provided names (PR #190).
    • Update README template/content for new environments (multiple small tweaks).

Environments and Examples

  • AIME 2024 / AIME 2025 updates (PR #199).
  • Math Python example: prompt/readme/report cleanups.
  • General environment cleanup and README refreshes across multiple examples.
  • HotpotQA example: troubleshooting notes and minor fixes.

Parsers, Rubrics, and Utils

  • XMLParser: fix handling of string completions during parse_answer (PR #196).
  • Rubric: ensure error-handling behavior is well-covered by tests (PR #195).
  • Reporting: improvements to report generation/embedding (report_utils).
  • Dataset helpers: include metrics columns in outputs where expected (PR #194).

Tests

  • Increase test coverage for:
    • Rubric error handling (PR #195).
    • XML parser behavior (new tests).
    • Environment edge cases and extra scenarios.

Acknowledgements

Thank you to everyone who contributed to this minor release. If we missed anyone, thank you as well; your contributions are appreciated.

Upgrade Notes

  • No breaking API changes.
  • When initializing a new environment with vf-init, note the name is now used verbatim (no automatic vf- prefix, PR #190).

Reference Commits (since v0.1.2)

  • Fix XMLParser string completion parsing (PR #196)
  • Improve test coverage for Rubric error handling (PR #195)
  • Include metrics columns in dataset outputs (PR #194)
  • Fix vf-eval rollout index handling (PR #197)
  • Remove automatic vf- prefix from init (PR #190)
  • AIME 2024 / 2025 environments updates (PR #199)
  • Environment README/reporting cleanups and misc improvements

Full Changelog

v0.1.2

31 Jul 02:34

What's changed

With the v0.1.2 release, verifiers is significantly more production-ready and stable to build and train with. We appreciate everyone's patience with the changes and bug fixes so far as we've addressed a number of long-standing requests, and we're excited to see what you all build with it!

Highlights:

  • Proper encapsulation of Environments as standalone modules (see environments/), which can declare their own dependencies in a pyproject.toml and need only expose a load_environment(...) -> vf.Environment function in order to be trainable (a minimal sketch follows this list).
  • Script flows for initializing (vf-init), installing (vf-install), and evaluating (vf-eval) Environments before training.
  • Reorganization of examples and training scripts, removing lots of duplicated logic and creating a cleaner separation between library code and example code.
  • Deprecation of the manual dynamically-batched LLM inference worker in favor of proper AsyncLLM support, allowing full control of native vLLM sampling parameters.
  • Support for native tool call parsing + parallel tool calls in ToolEnv (replacing the manual XMLParser approach).
  • Another trainer! Environments built with verifiers are now trainable with prime-rl (as of 58ac91f for v0.1.2). prime-rl supports multi-node FSDP async training, is the primary RL framework used by the Prime Intellect research team, and is under ongoing development and stress-testing ahead of large-scale multi-environment training runs.
  • Pydantic types for core data classes used by Environments.
  • Improvements to GRPOTrainer, including a single max_seq_len option (instead of separate prompt and completion lengths) and configurable turn-length limits via max_tokens.
  • Many more Environment examples.
  • Improved logging and evaluation options.
  • Overhauled README.md and docs.
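
To illustrate the module contract, here is a minimal environment sketch. The dataset contents, column names, and reward logic are illustrative assumptions; only the load_environment entry point is the documented requirement:

```python
# environments/my_env/my_env.py
import verifiers as vf
from datasets import Dataset


def load_environment(**kwargs) -> vf.Environment:
    # Tiny inline dataset; real environments typically load or generate one.
    dataset = Dataset.from_list([
        {"question": "What is 2 + 2?", "answer": "4"},
    ])

    def correct_answer(completion, answer, **_) -> float:
        # Hypothetical reward: 1.0 when the reference answer appears verbatim.
        text = completion if isinstance(completion, str) else completion[-1]["content"]
        return 1.0 if answer in text else 0.0

    rubric = vf.Rubric(funcs=[correct_answer])
    return vf.SingleTurnEnv(dataset=dataset, rubric=rubric, **kwargs)
```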