Augmentoolkit 3.0

@e-p-armstrong released this 12 Jun 08:03
· 70 commits to master since this release

Augmentoolkit 3.0 is essentially an entirely new project.

Before, we had 3 pipelines. Now we have 16.

Before, we just generated data. Now the tool automatically trains whole LLMs with autogenerated training configs. Data generation can be done locally and efficiently on consumer hardware, thanks to a custom-trained dataset generation model.

The quality of the factual finetuning process was completely revolutionized three separate times during development, each overhaul building on the one before it.

A full changelog is impractical, since everything has changed. Every abstraction has been improved. Every way of using the tool has been streamlined. Every pipeline is better. Every outcome is higher-quality and more efficiently delivered.

Instead of a changelog, refer to the documentation, since diffs don't mean much when the project has been effectively rewritten from the ground up.

However, if you previously forked the project to build your own data pipelines, do not despair: porting pipelines to the new Augmentoolkit is easy, and the pipeline conventions, abstractions primer, and new pipeline primer in the documentation (docs/...) will guide you through the process. Alternatively, you can get help on the Discord.

Augmentoolkit is now the best way in the world to make custom data, and by extension, custom models.

Happy Hacking!