Releases · neuralmagic/sparseml
SparseML v1.0.0
New Features:
- One-shot and recipe arguments support added for transformers, yolov5, and torchvision.
- Dockerfiles and new build processes created for Docker.
- CLI formats standardized and CLIs included on install of SparseML for transformers, yolov5, and torchvision.
- N:M pruning mask creator deployed for use in PyTorch pruning modifiers (see the recipe sketch after this list).
- `masked_language_modeling` training CLI added for transformers.
- Documentation additions made across all standard integrations and pathways.
- GitHub Actions tests running for end-to-end testing of integrations.
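As a hedged illustration of the N:M mask support above, the sketch below drives a hypothetical 2:4 pruning recipe through the standard SparseML PyTorch recipe APIs. The `mask_type: "2:4"` syntax is an assumption; confirm it against the recipe documentation for your version.

```python
import torch
from sparseml.pytorch.optim import ScheduledModifierManager

# Hypothetical recipe: assumes mask_type accepts an N:M string such as "2:4"
# (2 nonzero weights in every block of 4, i.e., 50% sparsity).
recipe = """
pruning_modifiers:
  - !GMPruningModifier
    params: __ALL_PRUNABLE__
    init_sparsity: 0.05
    final_sparsity: 0.5
    start_epoch: 0.0
    end_epoch: 10.0
    update_frequency: 0.5
    mask_type: "2:4"
"""

model = torch.nn.Sequential(torch.nn.Linear(128, 64), torch.nn.ReLU(), torch.nn.Linear(64, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

manager = ScheduledModifierManager.from_yaml(recipe)  # accepts a file path or a raw recipe string
optimizer = manager.modify(model, optimizer, steps_per_epoch=100)  # wraps the optimizer so masks update per step
# ... run the usual training loop with the wrapped optimizer ...
manager.finalize(model)
```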
Changes:
- Click added as a root dependency; it is the new preferred route for CLI invocation and argument management.
- Provider parameter added for ONNXRuntime InferenceSessions (see the sketch after this list).
- Moved `onnxruntime` to an optional install extra; `onnxruntime` is no longer a root dependency and will only be imported when using specific pathways.
- QAT export pipelines improved with better support for QATMatMul and custom operators.
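For context on the provider parameter above, the snippet below shows how an execution provider is selected when constructing an ONNX Runtime `InferenceSession` directly; this is plain `onnxruntime` usage with a hypothetical `model.onnx`, not SparseML-specific code.

```python
import numpy as np
import onnxruntime

# Choose the execution provider explicitly; e.g., "CUDAExecutionProvider" on GPU machines.
session = onnxruntime.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])

input_name = session.get_inputs()[0].name
dummy = np.random.rand(1, 3, 224, 224).astype(np.float32)  # input shape is model-specific
outputs = session.run(None, {input_name: dummy})
```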
Resolved Issues:
- Incorrect commands and models updated for older docs for transformers, yolov5, and torchvision.
- YOLOv5 data files, configs, and datasets were not easily accessible with the new install pathway; they are now included in the `sparseml` src folder for yolov5.
- An extra batch no longer runs for the PyTorch ModuleRunner.
- A `None` sparsity parameter was being improperly propagated in the PyTorch ConstantPruningModifier.
- PyPI dependency conflicts no longer occur with the latest ONNX and Protobuf upgrades.
- When GPUs were not available, yolov5 pathways were not working.
- Transformers export was not working properly when neither the `--do_train` nor the `--do_eval` argument was passed in.
- Non-string keys now allowed within recipes.
- Numerous fixes applied for pruning modifiers, including improper mask casting, improper initialization, and improper arguments passed through for MFAC.
- YOLOv5 export formatting error addressed.
- Missing or incorrect data corrected for logging and recording statements.
- PyTorch DistillationModifier for transformers was ignoring the "self" and "disable" distillation values; normal distillation was used instead.
- FP16 not deactivating on QAT start for torchvision.
Known Issues:
- PyTorch > 1.9 quantized ONNX export is broken; waiting on PyTorch resolution and testing.
SparseML v0.12.2 Patch Release
This is a patch release for 0.12.0 that contains the following changes:
- Protobuf is restricted to version < 4.0 as the newer version breaks ONNX.
SparseML v0.12.1 Patch Release
This is a patch release for 0.12.0 that contains the following changes:
- Disabling distillation modifiers via `--distillation_teacher disable` no longer crashes the Hugging Face Transformers integration.
- Numeric stability is provided for distillation modifiers by using log_softmax instead of softmax (see the sketch after this list).
- Accuracy and performance issues were addressed for quantized graphs in image classification and NLP.
- When using mixed precision for a quantized recipe with image classification, crashes no longer occur.
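To see why the log_softmax change above matters, here is a generic PyTorch distillation-loss sketch (not SparseML's internal implementation): computing `log(softmax(x))` in two steps can underflow to `log(0)` for large logits, while `log_softmax` stays finite.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # kl_div expects log-probabilities for its input; log_softmax computes them
    # in one numerically stable pass instead of log(softmax(x)).
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2

# Large logits would underflow a naive softmax().log() but remain stable here.
loss = distillation_loss(torch.randn(8, 10) * 50, torch.randn(8, 10) * 50)
```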
SparseML v0.12.0
New Features:
- SparseML recipe stages support: recipes can be chained together to enable easier prototyping with compound sparsification (see the staged-recipe sketch after this list).
- SparseML image classification CLIs implemented to enable easy commands for training models like ResNet-50: `sparseml.image_classification.train --help`
- FFCV support provided for PyTorch image classification pipelines.
- Masked language modeling CLI added for the Hugging Face transformers integration: `sparseml.transformers.masked_language_modeling --help`
- DistilBERT support provided for Hugging Face transformers integration.
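As a hedged sketch of recipe stages, the staged recipe below chains a pruning stage into a quantization stage via the standard recipe loader. The stage names are hypothetical, and the exact staged-recipe schema should be checked against the SparseML recipe documentation.

```python
from sparseml.pytorch.optim import ScheduledModifierManager

# Hypothetical staged recipe: each top-level stage groups its own modifier lists.
staged_recipe = """
pruning_stage:
  pruning_modifiers:
    - !GMPruningModifier
      params: __ALL_PRUNABLE__
      init_sparsity: 0.05
      final_sparsity: 0.8
      start_epoch: 0.0
      end_epoch: 10.0
      update_frequency: 0.5

quantization_stage:
  quantization_modifiers:
    - !QuantizationModifier
      start_epoch: 10.0
"""

manager = ScheduledModifierManager.from_yaml(staged_recipe)
```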
Changes:
- Modifiers logging upgraded to standardize logging across SparseML and integrations with hyperparameter stores like Weights and Biases (a logger sketch follows this list).
- Hugging Face Transformers integration updated to the latest state from the main upstream branch.
- Ultralytics YOLOv5 Integration updated to the latest state from the main upstream branch.
- Quantization-aware training graphs updated to enable better recovery and to provide optional support for deployment environments like TensorRT.
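A minimal sketch of the standardized modifier logging, assuming the `WANDBLogger` helper from `sparseml.pytorch.utils` and that `initialize` accepts a `loggers` list; `recipe.yaml` is a hypothetical recipe path.

```python
import torch
from sparseml.pytorch.optim import ScheduledModifierManager
from sparseml.pytorch.utils import WANDBLogger

model = torch.nn.Linear(16, 4)
manager = ScheduledModifierManager.from_yaml("recipe.yaml")  # hypothetical recipe path
# Route modifier logs (sparsity levels, learning rates, etc.) to Weights and Biases.
manager.initialize(model, epoch=0.0, loggers=[WANDBLogger()])
```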
Resolved Issues:
- Multiple minor issues addressed in the MFAC pruning modifier that prevented it from functioning properly in recipes and led to exceptions.
- Distillation loss for transformers integration was not calculated correctly when inputs were multidimensional.
- Minor fixes made across modifiers and transformers integration.
Known Issues:
- None
SparseML v0.11.1 Patch Release
This is a patch release for 0.11.0 that contains the following changes:
- Addressed the removal of the phased, score_type, and global_sparsity flags for the PyTorch GMPruningModifier; rather than crashing, exceptions with deprecation notices are now thrown only if those flags are enabled for instances of those modifiers.
- Crashes no longer occur when using `sparseml.transformers` training pipelines with distillation modifiers without the FP16 training flag turned on.
SparseML v0.11.0
New Features:
- Hugging Face NLP masked language modeling CLI and support implemented for training and export.
- PyTorch Image classification CLIs deployed.
- WoodFisher/M-FAC pruning algorithm, AC/DC pruning algorithm, and structured pruning algorithm support added with modifiers for PyTorch (see the recipe sketch after this list).
- Reduced precision support provided for quantization in PyTorch (< INT8).
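A hedged recipe sketch for the new M-FAC pruning support; the `!MFACPruningModifier` name and common pruning fields below are assumptions based on the other pruning modifiers, so verify them against the API docs before use.

```python
from sparseml.pytorch.optim import ScheduledModifierManager

# Assumed modifier name and fields; M-FAC-specific options are omitted.
recipe = """
pruning_modifiers:
  - !MFACPruningModifier
    params: __ALL_PRUNABLE__
    init_sparsity: 0.05
    final_sparsity: 0.8
    start_epoch: 1.0
    end_epoch: 20.0
    update_frequency: 1.0
"""

manager = ScheduledModifierManager.from_yaml(recipe)
```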
Changes:
- Refactored pruning and quantization algorithms from the `sparseml.pytorch.optim` package to the `sparseml.pytorch.sparsification` package.
Resolved Issues:
- None
Known Issues:
- None
SparseML v0.10.1 Patch Release
This is a patch release for 0.10.0 that contains the following changes:
- Conversion of Hugging Face BERT models from PyTorch to ONNX no longer drops accuracy, previously ranging from 1-25% depending on the task and dataset.
SparseML v0.10.0
New Features:
- Hugging Face Transformers native integration and CLIs implemented and included on install to train transformer models.
- Cyclic LR support added to `LearningRateFunctionModifier` in PyTorch.
- ViT (vision transformer) examples added with the `rwightman/timm` integration.
Changes:
- Quantization implementation for BERT models improved (shorter schedules and better recovery).
- PyTorch image classification script now saves based on top-1 accuracy instead of loss.
- `rwightman/timm` integration updated for ease of use with `setup_integration.sh` to set up the environment properly.
Resolved Issues:
- GitHub Actions now trigger for external forks.
Known Issues:
- Conversion of quantized Hugging Face BERT models from PyTorch to ONNX is currently dropping accuracy, ranging from 1-25% depending on the task and dataset. A hotfix is being pursued; users can fall back to version 0.9.0 to prevent the issue.
- Export for masked language modeling with Hugging Face BERT models from PyTorch is currently exporting incorrectly due to a configuration issue. A hotfix is being pursued; users can fall back to version 0.9.0 to prevent the issue.
SparseML v0.9.0
New Features:
- `dbolya/yolact` integration added with recipes, a tutorial, and performant models for the YOLACT segmentation model.
- Automatic recipe creation API for pruning recipes added, `create_pruning_recipe`, along with base class implementations for future expansion of RecipeEditor and RecipeBuilder.
- Structured pruning now supported for channels and filters with StructuredPruningModifier and LayerThinningModifier.
- PyTorch QAT pipelines: added support for automatic fusing of Conv-ReLU blocks, FPN layers, and Convs with shared weights (a fusing sketch follows this list).
- Analyzer implementations for evaluating a model's performance and loss sensitivity to pruning and other algorithms added for the ONNX framework.
- Up-to-date version check implemented for SparseML.
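For readers unfamiliar with fusing, the snippet below is the manual `torch.quantization` equivalent of the Conv-ReLU fusing that SparseML's QAT pipelines now apply automatically; it is a generic PyTorch sketch, not SparseML's internal code.

```python
import torch

class ConvBlock(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = torch.nn.Conv2d(3, 8, kernel_size=3)
        self.relu = torch.nn.ReLU()

    def forward(self, x):
        return self.relu(self.conv(x))

model = ConvBlock().eval()  # fusing for inference requires eval mode
# Fuse the Conv-ReLU pair into a single module so quantization sees one op.
fused = torch.quantization.fuse_modules(model, [["conv", "relu"]])
```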
Changes:
- Automatically unwrap PyTorch distributed modules so recipes do not need to be changed for distributed pipelines.
- BERT recipes updated to use the distillation modifier.
- References to num_sockets removed for the DeepSparse engine following its deprecation in DeepSparse 0.9.
- Changed the block pruning flow to use FourBlockMaskCreator for block sparsity, which does not impose any constraint that the pruned channel dimensions be divisible by the block size.
- API docs recompiled.
Resolved Issues:
- Improper variable names corrected that were causing crashes for specific flows in the WoodFisher pruning algorithm.
Known Issues:
- None
SparseML v0.8.0
New Features:
- ONNX benchmarking APIs added.
- QAT and export support added for torch.nn.Embedding layers.
- PyTorch distillation modifier implemented.
- Arithmetic and equation support for recipes added (see the recipe sketch after this list).
- Sparsification oracle available now with initial support for automatic recipe creation.
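A hedged sketch of the recipe arithmetic support: top-level variables referenced through `eval()` expressions, following the `eval()` syntax described for SparseML recipes; exact capabilities may vary by version.

```python
from sparseml.pytorch.optim import ScheduledModifierManager

recipe = """
num_epochs: 20.0
final_sparsity: 0.8

pruning_modifiers:
  - !GMPruningModifier
    params: __ALL_PRUNABLE__
    init_sparsity: 0.05
    final_sparsity: eval(final_sparsity)
    start_epoch: eval(num_epochs * 0.25)
    end_epoch: eval(num_epochs * 0.75)
    update_frequency: 0.5
"""

manager = ScheduledModifierManager.from_yaml(recipe)
```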
Changes:
- Torchvision integration and training pipeline reworked to simplify and streamline the codebase.
- Migration of PyTorch modifiers to base classes to be shared across all frameworks.
Resolved Issues:
- None
Known Issues:
- None