v0.1.3

@willccbb willccbb released this 26 Aug 11:56

Verifiers v0.1.3 Release Notes

Date: 8/26/25

Verifiers v0.1.3 adds a number of features for expanded functionality and ease of use, along with additional library integrations and bug fixes.

Highlights

  • We now have a TUI! 🎉 Run vf-tui to interactively browse all locally-saved evaluation results in your terminal.
  • Overhauled logging for vf-eval evaluation results with tagged JSON artifact folders.
    • Results are saved by default under outputs/ in your environment's project directory when developing locally, or under ./outputs when using an environment installed from elsewhere.
    • The short-lived Markdown report outputs are now deprecated.
  • Multimodal-input tasks are supported for evaluation (see environments/mmmu for an example)! Official trainer support in verifiers is pending; in the meantime, multimodal training can be accessed via HUD's hud-vf-gym project.
  • Optional async for reward functions, tools, and Environment class methods.
    • A maybe_await pattern safely accommodates both sync and async functions.
    • Sync extensions of env_response and is_completed in MultiTurnEnv will work, but with a type warning; users are encouraged to migrate these functions to async for ongoing usage.
  • Full JSON sampling args in vf-eval via -S (#240).
  • Official community examples library under very active development: prime-environments
  • Native init/push/pull/install support in prime-cli (and more...)
    • Run uv tool install prime for a preview 🙂
  • Feature-complete support for training and online evaluations in prime-rl.
  • Improved caching and parallelization for JudgeRubric.
  • Rubric.class_objects values are available to all reward functions by key name.
  • Bug fixes for tool call sanitization and saving datasets to Hugging Face.
  • Improvements to documentation.
  • From the recent 0.1.2.post1 pre-release version:
    • StatefulToolEnv for intercepting function calls for routing and state management (#224)
    • Improved lazy imports for efficient evaluation.
    • Overhauled MathRubric for math-verify as default reward.
    • Full support restored for completions generation (#201, #196).
  • New required dependencies since 0.1.2: rich, textual, jinja.
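
The maybe_await pattern mentioned above can be illustrated with a minimal sketch. This is not the verifiers implementation, and the library's actual helper may have a different signature; the reward functions sync_reward and async_reward are hypothetical names used only for this example. The idea is to call the function, then await the result only if it turns out to be awaitable, so sync and async callables can be mixed freely.

```python
import asyncio
import inspect
from typing import Any, Callable, List


async def maybe_await(func: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
    """Call func; await the result only if it is awaitable (illustrative sketch)."""
    result = func(*args, **kwargs)
    if inspect.isawaitable(result):
        return await result
    return result


# Hypothetical sync reward function: 1.0 if the completion mentions "42".
def sync_reward(completion: str) -> float:
    return 1.0 if "42" in completion else 0.0


# Hypothetical async reward function: yields to the event loop, then
# returns the completion length as a float.
async def async_reward(completion: str) -> float:
    await asyncio.sleep(0)
    return float(len(completion))


async def score_all(completion: str) -> List[float]:
    # Both kinds of function go through the same call path.
    return [await maybe_await(fn, completion) for fn in (sync_reward, async_reward)]


scores = asyncio.run(score_all("answer: 42"))
print(scores)
```

With a helper like this, existing sync functions keep working unchanged, while new functions can be written as async and awaited through the same interface.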

Thanks to everyone who contributed to this release!

Stay tuned for some big announcements in the coming days 😊

Full Changelog: v0.1.2...v0.1.3