Releases: willccbb/verifiers
v0.1.3
Verifiers v0.1.3 Release Notes
Date: 8/26/25
Verifiers v0.1.3 adds a number of features for expanded functionality and ease of use, along with additional library integrations and bug fixes.
Highlights
- We now have a TUI! 🎉 Run `vf-tui` to interactively browse all locally-saved evaluation results in your terminal.
- Overhauled logging for `vf-eval` evaluation results with tagged JSON artifact folders.
  - Defaults to saving in your environment's project directory under `outputs/` if developing locally; `./outputs` if using an environment installed from elsewhere.
  - The short-lived Markdown report outputs are now deprecated.
- Multimodal-input tasks are supported for evaluation (see `environments/mmmu` for an example)! Official trainer support in verifiers is pending; in the meantime, multimodal training can be accessed via HUD's hud-vf-gym project.
- Optional `async` support for reward functions, tools, and Environment class methods.
  - `maybe_await` pattern for safe accommodation of both sync and async functions (see the first sketch after this list).
  - Sync implementations of `env_response` and `is_completed` in MultiTurnEnv will still work, but with a type warning; users are encouraged to migrate these functions to async going forward.
- Full JSON sampling args in `vf-eval` via `-S` (#240).
- Official community examples library under very active development: prime-environments
  - Native `init`/`push`/`pull`/`install` support in prime-cli (and more...)
  - Run `uv tool install prime` for a preview 🙂
- Feature-complete support for training and online evaluations in prime-rl.
- Improved caching and parallelization for JudgeRubric.
- `Rubric.class_objects` values are available to all reward functions by key name (see the second sketch after this list).
- Bug fixes for tool call sanitization and for saving datasets to Hugging Face.
- Improvements to documentation.
- From the recent `0.1.2.post1` pre-release version:
  - New required dependencies since `0.1.2`: `rich`, `textual`, `jinja`.
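
For reference, here is a minimal sketch of the `maybe_await` pattern mentioned above. It illustrates the idea rather than reproducing the library's exact implementation, and the reward functions are stand-ins:

```python
import asyncio
import inspect
from typing import Any, Callable

async def maybe_await(func: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
    """Call func, awaiting the result only if it is awaitable, so sync and
    async callables (reward functions, tools, env methods) are handled uniformly."""
    result = func(*args, **kwargs)
    if inspect.isawaitable(result):
        result = await result
    return result

# Both styles can now be invoked the same way:
def sync_reward(completion: str) -> float:
    return float("42" in completion)

async def async_reward(completion: str) -> float:
    await asyncio.sleep(0)  # stand-in for real async work, e.g. a judge call
    return float("42" in completion)

async def main() -> None:
    print(await maybe_await(sync_reward, "the answer is 42"))   # 1.0
    print(await maybe_await(async_reward, "the answer is 42"))  # 1.0

asyncio.run(main())
```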
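And a hypothetical sketch of the `Rubric.class_objects` behavior: an object registered under a key is made available to reward functions that declare a parameter with that name. The registration style, the `StubJudge` class, and the injection shown are assumptions for illustration, not the library's verbatim API:

```python
import verifiers as vf

class StubJudge:
    """Hypothetical judge client, used only for illustration."""
    def score(self, completion: str) -> float:
        return float("42" in str(completion))

def judged_reward(completion, judge_client, **kwargs) -> float:
    # `judge_client` is expected to arrive by key name from class_objects.
    return judge_client.score(completion)

rubric = vf.Rubric(funcs=[judged_reward])
rubric.class_objects["judge_client"] = StubJudge()  # assumed registration style
```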
Thanks to everyone who contributed to this release!
- @lakshyaag (#240, #241)
- @cat-state (#238)
- @qgallouedec (#218, #217)
- @vgel (#201, #196)
- @nathom (#200)
- @snellingio (#195, #194)
- @MarwanMashra (#184)
- @alanxmay
And a special thanks to the entire Prime Intellect team, with PRs this cycle from:
- @JannikSt
- @mikasenghaas
- @samsja
Stay tuned for some big announcements in the coming days 😊
Full Changelog: v0.1.2...v0.1.3
v0.1.2.post1
Verifiers v0.1.2.post1 – Release Notes
Incremental update focused on a new stateful tool environment, environment folder cleanup/renaming, math verification robustness, reporting improvements, and bug fixes.
Highlights
- Stateful tools: add a stateful tool environment and move tool JSON loading into environment responses (PR #224).
- Environments: consolidation/renames for clarity and new environment tags (PR #222 and related changes).
- Lazy imports: training-related libraries are only imported when accessed (see the sketch after this list).
- Verification: more robust default math verification (PR #213).
- RL support: enable base-model RL with `message_type="completions"` (PR #201), plus Prime-RL integration and docs (PR #204) and GRPO trainer updates (PR #217, #218).
- Reporting & endpoints: template/report tweaks and endpoint path loading improvements (PR #206, PR #203, plus follow-ups).
- CLI/UX: make `rich` a default dependency for the eval script (PR #200); eval output refinements.
- Fixes: hotfix for sampling args for `gpt-5`.
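
The lazy-import behavior above can be achieved with a module-level `__getattr__` (PEP 562). This is a generic sketch of the technique, not necessarily the library's exact mechanism, and the attribute-to-submodule mapping is illustrative:

```python
# Sketch of a package __init__.py that defers heavy training imports
# until the attribute is first accessed (PEP 562 module __getattr__).
import importlib

_LAZY_ATTRS = {
    "GRPOTrainer": ".trainers",  # illustrative mapping of attr -> submodule
}

def __getattr__(name: str):
    if name in _LAZY_ATTRS:
        module = importlib.import_module(_LAZY_ATTRS[name], __name__)
        return getattr(module, name)
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
```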
Changes by Area
CLI and Scripts
- `vf-eval`
  - Hotfixes
    - Update sampling args for `gpt-5` (hotfix commit).
Environments and Examples
- Add a stateful tool environment; load tool information via environment responses (PR #224).
- Rename and consolidate environments, introduce tag metadata for discoverability (PR #222; additional env tag updates).
- Math environment updates and prompt tweaks.
- Remove dead processing code in `environment.py`; general cleanup and type hint improvements.
Parsers, Rubrics, and Utils
- Caching improvements for JudgeRubric to reduce redundant work (PR #216).
- More robust rule-based math verification and heuristics (PR #213).
- General type-hint and internal cleanup passes.
Training
- Document Prime-RL training (PR #204).
- Minor updates to GRPO trainer (PR #217, #218).
- Add support for base-model RL flows via `message_type="completions"` (PR #201); see the sketch below.
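
A hypothetical sketch of a base-model (completions-style) setup, assuming `SingleTurnEnv` accepts the `message_type` flag directly; the dataset and reward function are stand-ins:

```python
import verifiers as vf
from datasets import Dataset

# Tiny illustrative dataset of raw-text prompts for a base model.
dataset = Dataset.from_list([
    {"prompt": "2 + 2 =", "answer": "4"},
    {"prompt": "3 * 3 =", "answer": "9"},
])

def exact_match(completion, answer, **kwargs) -> float:
    return float(answer in str(completion))

env = vf.SingleTurnEnv(
    dataset=dataset,
    rubric=vf.Rubric(funcs=[exact_match]),
    message_type="completions",  # raw completions instead of chat messages
)
```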
Reporting and Tooling
- Report generation and template tweaks (PR #206, PR #203).
- Improve endpoint path loading and related tooling.
Documentation
- README and docs updates (minor) across environments and training utilities; additional guidance for reporting.
Upgrade Notes
- Environment renames/tags: if you reference environment names or use tags in tooling or scripts, review the updated names and tag metadata (PR #222).
Reference Commits (since v0.1.2.post0)
- adding stateful toolenv, moving tool json loading to env_response (PR #224)
- Will/eval outputs (PR #223)
- Update grpo_trainer.py (PR #217, PR #218)
- hotfix for gpt-5 sampling args
- Will/rename envs (PR #222)
- Will/judgerubric caching (PR #216)
- More robust rule-based math verification (PR #213)
- Report tweaks and endpoints path loading (PR #206 and follow-ups)
- Integrate and document prime-rl training (PR #204)
- Update report generation and vf-init template (PR #203)
- Add support for base model RL / `message_type="completions"` (PR #201)
- Add `rich` as default dependency for eval script (PR #200)
- Math env updates, prompt tweaks, type hints, and cleanup in `environment.py`
Full Changelog: v0.1.2.post0...HEAD
v0.1.2.post0
Verifiers v0.1.2.post0 – Release Notes
Minor post-release update focusing on polish: CLI script bug fixes and enhancements, environment example cleanup, better reporting, and improved test coverage.
Highlights
- vf-eval: fixed rollout indexing bugs and improved reliability when sampling multiple rollouts.
- vf-init: streamlined project initialization and naming (removed the automatic `vf-` prefix) and refreshed templates.
- Environments: documentation and prompt cleanups; added/updated AIME examples; improved report embedding.
- Tests: expanded coverage across rubric behavior, XML parser, and environment edge cases.
Changes by Area
CLI and Scripts
- `vf-eval`
- `vf-init`
  - Remove automatic `vf-` prefix during init to honor provided names (PR #190).
  - Update README template/content for new environments (multiple small tweaks).
Environments and Examples
- AIME 2024 / AIME 2025 updates (PR #199).
- Math Python example: prompt/readme/report cleanups.
- General environment cleanup and README refreshes across multiple examples.
- HotpotQA example: troubleshooting notes and minor fixes.
Parsers, Rubrics, and Utils
- XMLParser: fix handling of string completions during `parse_answer` (PR #196); see the sketch below.
- Rubric: ensure error-handling behavior is well-covered by tests (PR #195).
- Reporting: improvements to report generation/embedding (`report_utils`).
- Dataset helpers: include metrics columns in outputs where expected (PR #194).
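
A minimal sketch of the behavior touched by PR #196, assuming this `XMLParser` field layout; the example strings are illustrative:

```python
import verifiers as vf

parser = vf.XMLParser(["think", "answer"])

payload = "<think>2 + 2 = 4</think>\n<answer>4</answer>"

# Chat-style completion: a list of message dicts.
messages = [{"role": "assistant", "content": payload}]
print(parser.parse_answer(messages))  # expected: "4"

# String completion: the fix covers plain-text completions as well.
print(parser.parse_answer(payload))   # expected: "4"
```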
Tests
- Increase test coverage for:
- Rubric error handling (PR #195).
- XML parser behavior (new tests).
- Environment edge cases and extra scenarios.
Acknowledgements
Thank you to everyone who contributed to this minor release. If we missed anyone, thank you as well; your contributions are appreciated.
Upgrade Notes
- No breaking API changes.
- When initializing a new environment with `vf-init`, note that the name is now used verbatim (no automatic `vf-` prefix; PR #190).
Reference Commits (since v0.1.2)
- Fix XMLParser string completion parsing (PR #196)
- Improve test coverage for Rubric error handling (PR #195)
- Include metrics columns in dataset outputs (PR #194)
- Fix vf-eval rollout index handling (PR #197)
- Remove automatic `vf-` prefix from init (PR #190)
- AIME 2024 / 2025 environment updates (PR #199)
- Environment README/reporting cleanups and misc improvements
Full Changelog: v0.1.2...v0.1.2.post0
v0.1.2
What's changed
With the v0.1.2 release, verifiers is significantly more production-ready and stable to build and train with. We appreciate everyone's patience with the changes and bug fixes thus far as we've addressed a number of long-standing requests, and we're excited to see what you all build with it!
Highlights:
- Proper encapsulation of Environments as standalone modules (see `environments/`), which can contain their own dependencies in a `pyproject.toml` and need only expose a `load_environment(...) -> vf.Environment` function in order to be trainable (see the first sketch after this list).
- Script flows for initializing (`vf-init`), installing (`vf-install`), and evaluating (`vf-eval`) Environments before training.
- Reorganization of examples and training scripts, removing lots of duplicated logic and creating a cleaner separation between library code and example code.
- Deprecation of the manual dynamically-batched `LLM` inference worker in favor of proper `AsyncLLM` support, allowing full control of native vLLM sampling parameters.
- Support for native tool call parsing + parallel tool calls in `ToolEnv`, replacing the manual `XMLParser` approach (see the second sketch after this list).
- Another trainer! Environments built with `verifiers` are now trainable with `prime-rl` (as of 58ac91f for `v0.1.2`), which supports multi-node FSDP async training, is the primary RL framework used by the Prime Intellect research team, and is under ongoing development and stress-testing in advance of large-scale multi-environment training runs.
- Pydantic types for core data classes used by Environments.
- Improvements to `GRPOTrainer`, including support for a single `max_seq_len` option (instead of separate prompt + completion lengths) and configurable turn length limits via `max_tokens`.
- Many more Environment examples.
- Improved logging and evaluation options.
- Overhauled README.md and docs.
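
To illustrate the module contract in the first highlight, here is a minimal hypothetical environment module. The file layout, dataset, parser fields, and reward function are assumptions; the `load_environment(...) -> vf.Environment` entry point is the piece the loader requires:

```python
# environments/my_env/my_env.py (illustrative; the package would also ship a
# pyproject.toml declaring its own dependencies)
import verifiers as vf
from datasets import Dataset

def load_environment(**kwargs) -> vf.Environment:
    # Tiny stand-in dataset; a real environment would load something substantial.
    dataset = Dataset.from_list([
        {"question": "What is 2 + 2?", "answer": "4"},
    ])

    parser = vf.XMLParser(["think", "answer"])  # assumed field layout

    def correct_answer(completion, answer, **kwargs) -> float:
        return float(parser.parse_answer(completion) == answer)

    return vf.SingleTurnEnv(
        dataset=dataset,
        parser=parser,
        rubric=vf.Rubric(funcs=[correct_answer], parser=parser),
        **kwargs,
    )
```

A module like this can then be installed with `vf-install` and smoke-tested with `vf-eval` before training, per the script flows above.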
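And a hypothetical sketch of the native tool-calling flow in `ToolEnv`: plain Python functions are passed as tools and surfaced to the model through the provider's native tool-call format, with no `XMLParser` scaffolding. The dataset, tool, and reward function here are stand-ins:

```python
import verifiers as vf
from datasets import Dataset

def get_weather(city: str) -> str:
    """Toy tool; exposed to the model as a native tool call."""
    return f"It is sunny in {city}."

dataset = Dataset.from_list([
    {"question": "What is the weather in Paris?", "answer": "sunny"},
])

def mentions_answer(completion, answer, **kwargs) -> float:
    # Crude illustrative reward: does the final response mention the answer?
    return float(answer in str(completion))

env = vf.ToolEnv(
    dataset=dataset,
    tools=[get_weather],  # parsed natively; parallel tool calls supported
    rubric=vf.Rubric(funcs=[mentions_answer]),
)
```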