This is a fork of the original onnxruntime Flutter plugin, which appears to be no longer maintained. This fork adds support for the 16 KB memory page size.
Note: macOS is not supported on pub.dev due to their package size limit. If you need macOS support, use this package as a git dependency in your `pubspec.yaml`:

```yaml
dependencies:
  onnxruntime:
    git:
      url: https://github.com/Persie0/onnxruntime_flutter_1_22_0
```
A Flutter plugin for OnnxRuntime, via dart:ffi, providing an easy, flexible, and fast Dart API to integrate ONNX models in Flutter apps across mobile and desktop platforms.
| Platform | Android | iOS | Linux | macOS | Windows |
|---|---|---|---|---|---|
| Compatibility | API level 21+ | * | * | * | * |
| Architecture | arm32/arm64 | * | * | * | * |
- Multi-platform support for Android, iOS, Linux, macOS, Windows, and Web (coming soon).
- Flexibility to use any ONNX model.
- Acceleration using multi-threading.
- Similar structure to the OnnxRuntime Java and C# APIs.
- Inference speed on par with native Android/iOS apps built with the Java/Objective-C API.
- Run inference in a separate isolate to keep the UI thread free of jank (see the sketch below).
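For example, the `runAsync` call used in the usage example below already runs the model off the UI thread. A minimal sketch of that pattern; the `predict` helper name and the `'input'` tensor name are placeholders for your own model:

```dart
import 'package:onnxruntime_v2/onnxruntime_v2.dart';

// Minimal sketch: run inference without blocking the UI thread.
// Assumes `session` was created as shown in the usage example below;
// the 'input' name must match your model's actual input name.
Future<List<OrtValue?>?> predict(OrtSession session, OrtValueTensor inputOrt) async {
  final runOptions = OrtRunOptions();
  try {
    // runAsync performs the run in a background isolate, so awaiting it
    // from a button callback keeps the Flutter UI responsive.
    return await session.runAsync(runOptions, {'input': inputOrt});
  } finally {
    runOptions.release();
  }
}
```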
In your Flutter project, add the dependency:

```yaml
dependencies:
  ...
  onnxruntime: x.y.z
```

Import the package, initialize the environment, and create the session options:

```dart
import 'package:flutter/services.dart' show rootBundle;
import 'package:onnxruntime_v2/onnxruntime_v2.dart';

OrtEnv.instance.init();
final sessionOptions = OrtSessionOptions();
// 🚀 NEW: Automatically use GPU acceleration if available!
// This will try GPU providers first, then fall back to CPU
sessionOptions.appendDefaultProviders();
const assetFileName = 'assets/models/test.onnx';
final rawAssetFile = await rootBundle.load(assetFileName);
final bytes = rawAssetFile.buffer.asUint8List();
final session = OrtSession.fromBuffer(bytes, sessionOptions);
```

Run inference on your input data:

```dart
final shape = [1, 2, 3];
// `data` is your flattened input, e.g. a Float32List whose length matches the shape.
final inputOrt = OrtValueTensor.createTensorWithDataList(data, shape);
final inputs = {'input': inputOrt};
final runOptions = OrtRunOptions();
final outputs = await session.runAsync(runOptions, inputs);
inputOrt.release();
runOptions.release();
outputs?.forEach((element) {
  element?.release();
});
```

When you are done, release the environment:

```dart
OrtEnv.instance.release();
```

This fork includes full support for GPU and hardware acceleration across multiple platforms!
| Provider | Platform | Hardware | Speedup |
|---|---|---|---|
| CUDA | Windows/Linux | NVIDIA GPU | 5-10x |
| TensorRT | Windows/Linux | NVIDIA GPU | 10-20x |
| DirectML | Windows | AMD/Intel/NVIDIA GPU | 3-8x |
| ROCm | Linux | AMD GPU | 5-10x |
| CoreML | iOS/macOS | Apple Neural Engine | 5-15x |
| NNAPI | Android | NPU/GPU | 3-7x |
| OpenVINO | Windows/Linux | Intel GPU/VPU | 3-6x |
| DNNL | All | Intel CPU | 2-4x |
| XNNPACK | All | CPU optimizations | 1.5-3x |
The easiest way to enable GPU acceleration:

```dart
final sessionOptions = OrtSessionOptions();
sessionOptions.appendDefaultProviders(); // 🎯 That's it!
```

This automatically selects the best available provider in this order:
- GPU: CUDA → DirectML → ROCm
- NPU: CoreML → NNAPI → QNN
- Optimized CPU: DNNL → XNNPACK
- Fallback: Standard CPU
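Before relying on the automatic selection, you can also check which execution providers your ONNX Runtime build actually ships with. A small sketch, assuming `availableProviders()` returns the standard provider name strings (e.g. `'CUDAExecutionProvider'`):

```dart
import 'package:onnxruntime_v2/onnxruntime_v2.dart';

// Sketch: list what this build offers, then configure accordingly.
// Assumes OrtEnv.instance.init() has already been called.
void configureProviders(OrtSessionOptions sessionOptions) {
  final providers = OrtEnv.instance.availableProviders();
  print('Available providers: $providers');

  if (providers.contains('CUDAExecutionProvider')) {
    // Prefer CUDA explicitly when it is present.
    sessionOptions.appendCudaProvider(CUDAFlags.useArena);
  } else {
    // Otherwise let the plugin pick the best remaining option.
    sessionOptions.appendDefaultProviders();
  }
}
```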
For fine-grained control:

```dart
// NVIDIA GPU (Windows/Linux)
sessionOptions.appendCudaProvider(CUDAFlags.useArena);
// NVIDIA with TensorRT optimizations + FP16
sessionOptions.appendTensorRTProvider({'trt_fp16_enable': '1'});
// DirectML for Windows (any GPU)
sessionOptions.appendDirectMLProvider();
// Apple Neural Engine (iOS/macOS)
sessionOptions.appendCoreMLProvider(CoreMLFlags.useNone);
// Android acceleration
sessionOptions.appendNnapiProvider(NnapiFlags.useNone);
// AMD GPU on Linux
sessionOptions.appendRocmProvider(ROCmFlags.useArena);
// Intel optimization
sessionOptions.appendDNNLProvider(DNNLFlags.useArena);
// Always add CPU as fallback
sessionOptions.appendCPUProvider(CPUFlags.useArena);
```

A few performance tips:

- Use `appendDefaultProviders()` first - it handles everything automatically
- CUDA vs TensorRT: TensorRT is faster but takes longer to initialize
- DirectML: Great for cross-vendor support on Windows
- Mobile: CoreML (iOS) and NNAPI (Android) provide massive speedups
- Thread count: Set `setIntraOpNumThreads()` to your CPU core count for CPU inference (see the sketch below)
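Putting these tips together, a cross-platform setup could look like the following sketch. The per-platform choices simply mirror the table above, and `Platform.numberOfProcessors` stands in for "your CPU core count"; adjust both for your own hardware:

```dart
import 'dart:io' show Platform;
import 'package:onnxruntime_v2/onnxruntime_v2.dart';

/// Sketch: configure session options per platform, with a CPU fallback.
OrtSessionOptions buildSessionOptions() {
  final options = OrtSessionOptions();
  try {
    if (Platform.isAndroid) {
      options.appendNnapiProvider(NnapiFlags.useNone);
    } else if (Platform.isIOS || Platform.isMacOS) {
      options.appendCoreMLProvider(CoreMLFlags.useNone);
    } else if (Platform.isWindows) {
      options.appendDirectMLProvider();
    } else {
      // Linux with an NVIDIA GPU; swap for appendRocmProvider on AMD.
      options.appendCudaProvider(CUDAFlags.useArena);
    }
  } catch (e) {
    print('Accelerated provider unavailable, using CPU only: $e');
  }
  // CPU fallback, with the thread count tuned to the machine.
  options.appendCPUProvider(CPUFlags.useArena);
  options.setIntraOpNumThreads(Platform.numberOfProcessors);
  return options;
}
```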
Windows (NVIDIA):
- Install CUDA Toolkit
- Optional: TensorRT for extra speed
Linux (NVIDIA):
- Install CUDA runtime: `apt install nvidia-cuda-toolkit`
- Optional: TensorRT
Linux (AMD):
- Install ROCm
Windows (Any GPU):
- DirectML works out-of-the-box on Windows 10+
iOS/macOS:
- CoreML works automatically (no setup needed)
Android:
- NNAPI works automatically on Android 8.1+ (no setup needed)
If GPU acceleration isn't working:
- Check available providers:

```dart
OrtEnv.instance.availableProviders().forEach((provider) {
  print('Available: $provider');
});
```

- Catch provider errors gracefully:

```dart
try {
  sessionOptions.appendCudaProvider(CUDAFlags.useArena);
} catch (e) {
  print('CUDA not available, falling back to CPU');
  sessionOptions.appendCPUProvider(CPUFlags.useArena);
}
```

- Verify the GPU runtime is installed (CUDA, DirectML, etc.)
- Check that you're using the GPU-enabled ONNX Runtime library
