Release Augmentoolkit 3.0 · e-p-armstrong/augmentoolkit

Augmentoolkit 3.0 is essentially an entirely new project.

Before, we had 3 pipelines. Now we have 16.

Before, we just generated data. Now Augmentoolkit automatically trains whole LLMs with autogenerated training configs. Data generation can be done locally and efficiently on consumer hardware, thanks to a custom-trained dataset generation model.

The quality of the factual finetuning process was revolutionized three separate times during development, each time building on the one before it.

A full changelog is impractical, since everything has changed. Every abstraction has been improved. Every way of using the tool has been streamlined. Every pipeline is better. Every outcome is higher-quality and more efficiently delivered.

Instead of a changelog, refer to the documentation, since diffs don't mean much when the project has been effectively rewritten from the ground up.

However, if you've previously forked the project to build your own data pipelines, do not despair: porting pipelines to the new Augmentoolkit is easy, and the pipeline conventions, abstractions primer, and new pipeline primer in the documentation (docs/...) will guide you through the process. Alternatively, you can get help on the Discord.

Augmentoolkit is now the best way in the world to make custom data, and by extension, custom models.

Happy Hacking!
