This is a fork of the original onnxruntime Flutter plugin, which appears to be no longer maintained. This fork updates to ONNX Runtime 1.22 and adds support for 16 KB memory page sizes.

Note: macOS is not supported in the pub.dev release because of pub.dev's package size limit. If you need macOS support, depend on this package via a git dependency in your pubspec.yaml:

dependencies:
  onnxruntime:
    git:
      url: https://github.com/Persie0/onnxruntime_flutter_1_22_0

OnnxRuntime Plugin


Overview

A Flutter plugin for OnnxRuntime, built on dart:ffi, that provides an easy, flexible, and fast Dart API to integrate ONNX models in Flutter apps across mobile and desktop platforms.

Platform        Android          iOS   Linux   macOS   Windows
Compatibility   API level 21+    *     *       *       *
Architecture    arm32/arm64      *     *       *       *

*: Consistent with Flutter

Key Features

  • Multi-platform support for Android, iOS, Linux, macOS, Windows, and Web (coming soon).
  • Flexibility to use any ONNX model.
  • Acceleration using multi-threading.
  • Similar structure to the OnnxRuntime Java and C# APIs.
  • Inference speed comparable to native Android/iOS apps built with the Java/Objective-C APIs.
  • Run inference in separate isolates to prevent jank on the UI thread.

Getting Started

In your Flutter project, add the dependency:

dependencies:
  ...
  onnxruntime: x.y.z

Usage example

Import

import 'package:onnxruntime/onnxruntime.dart';

Initializing environment

OrtEnv.instance.init();

Creating the Session

final sessionOptions = OrtSessionOptions();

// 🚀 NEW: Automatically use GPU acceleration if available!
// This will try GPU providers first, then fall back to CPU
sessionOptions.appendDefaultProviders();

const assetFileName = 'assets/models/test.onnx';
final rawAssetFile = await rootBundle.load(assetFileName);
final bytes = rawAssetFile.buffer.asUint8List();
final session = OrtSession.fromBuffer(bytes, sessionOptions);

Performing inference

final shape = [1, 2, 3];
// Example float input; Float32List comes from dart:typed_data and must hold
// shape[0] * shape[1] * shape[2] = 6 elements.
final data = Float32List.fromList([1.0, 2.0, 3.0, 4.0, 5.0, 6.0]);
final inputOrt = OrtValueTensor.createTensorWithDataList(data, shape);
final inputs = {'input': inputOrt};
final runOptions = OrtRunOptions();
// _session is the OrtSession created in the previous step.
final outputs = await _session?.runAsync(runOptions, inputs);
inputOrt.release();
runOptions.release();
outputs?.forEach((element) {
  element?.release();
});
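
The snippet above releases the outputs immediately; in practice, read each output's data before calling release() on it. A minimal sketch, assuming the returned OrtValue exposes its tensor contents through a value getter and the model has a single output:

final outputs = await _session?.runAsync(runOptions, inputs);
// Read the tensor contents before releasing the OrtValue;
// value typically returns nested Lists matching the output shape.
final outputData = outputs?[0]?.value;
print('Model output: $outputData');
outputs?.forEach((element) => element?.release());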

Releasing environment

OrtEnv.instance.release();

🚀 GPU Acceleration

This fork includes full support for GPU and hardware acceleration across multiple platforms!

Supported Execution Providers

Provider   Platform        Hardware               Speedup
CUDA       Windows/Linux   NVIDIA GPU             5-10x
TensorRT   Windows/Linux   NVIDIA GPU             10-20x
DirectML   Windows         AMD/Intel/NVIDIA GPU   3-8x
ROCm       Linux           AMD GPU                5-10x
CoreML     iOS/macOS       Apple Neural Engine    5-15x
NNAPI      Android         NPU/GPU/DSP            3-7x
OpenVINO   Windows/Linux   Intel GPU/VPU          3-6x
DNNL       All             Intel CPU              2-4x
XNNPACK    All             CPU optimizations      1.5-3x

Quick Start: Automatic GPU Selection

The easiest way to enable GPU acceleration:

final sessionOptions = OrtSessionOptions();
sessionOptions.appendDefaultProviders(); // 🎯 That's it!

This automatically selects the best available provider, trying them in this order (see the sketch after the list):

  1. GPU: CUDA → DirectML → ROCm
  2. NPU: CoreML → NNAPI → QNN
  3. Optimized CPU: DNNL → XNNPACK
  4. Fallback: Standard CPU
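
As a reference, here is a minimal sketch combining the automatic selection with the availableProviders() check from the Troubleshooting section below, so you can see which providers the bundled runtime was built with:

final sessionOptions = OrtSessionOptions();

// Log which execution providers this ONNX Runtime build exposes.
OrtEnv.instance.availableProviders().forEach((provider) {
  print('Available: $provider');
});

// Try GPU/NPU providers first, then fall back to optimized/standard CPU.
sessionOptions.appendDefaultProviders();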

Manual Provider Selection

For fine-grained control:

// NVIDIA GPU (Windows/Linux)
sessionOptions.appendCudaProvider(CUDAFlags.useArena);

// NVIDIA with TensorRT optimizations + FP16
sessionOptions.appendTensorRTProvider({'trt_fp16_enable': '1'});

// DirectML for Windows (any GPU)
sessionOptions.appendDirectMLProvider();

// Apple Neural Engine (iOS/macOS)
sessionOptions.appendCoreMLProvider(CoreMLFlags.useNone);

// Android acceleration
sessionOptions.appendNnapiProvider(NnapiFlags.useNone);

// AMD GPU on Linux
sessionOptions.appendRocmProvider(ROCmFlags.useArena);

// Intel optimization
sessionOptions.appendDNNLProvider(DNNLFlags.useArena);

// Always add CPU as fallback
sessionOptions.appendCPUProvider(CPUFlags.useArena);

Performance Tips

  1. Use appendDefaultProviders() first - it handles everything automatically
  2. CUDA vs TensorRT: TensorRT is faster but takes longer to initialize
  3. DirectML: Great for cross-vendor support on Windows
  4. Mobile: CoreML (iOS) and NNAPI (Android) provide massive speedups
  5. Thread count: Set setIntraOpNumThreads() to your CPU core count for CPU inference (see the sketch after this list)
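
For tip 5, a minimal sketch of configuring the thread count for CPU inference (Platform.numberOfProcessors reports logical cores, which may exceed the physical core count):

import 'dart:io' show Platform;

final sessionOptions = OrtSessionOptions();
// One intra-op thread per logical core for CPU inference.
sessionOptions.setIntraOpNumThreads(Platform.numberOfProcessors);
sessionOptions.appendCPUProvider(CPUFlags.useArena);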

GPU Setup Requirements

Windows (NVIDIA):

  • Install the CUDA runtime (and optionally TensorRT)

Linux (NVIDIA):

  • Install CUDA runtime: apt install nvidia-cuda-toolkit
  • Optional: TensorRT

Linux (AMD):

  • Install the ROCm runtime

Windows (Any GPU):

  • DirectML works out-of-the-box on Windows 10+

iOS/macOS:

  • CoreML works automatically (no setup needed)

Android:

  • NNAPI works automatically on Android 8.1+ (no setup needed)

Troubleshooting

If GPU acceleration isn't working:

  1. Check available providers:
OrtEnv.instance.availableProviders().forEach((provider) {
  print('Available: $provider');
});
  2. Catch provider errors gracefully:
try {
  sessionOptions.appendCudaProvider(CUDAFlags.useArena);
} catch (e) {
  print('CUDA not available, falling back to CPU');
  sessionOptions.appendCPUProvider(CPUFlags.useArena);
}
  3. Verify GPU runtime is installed (CUDA, DirectML, etc.)

  4. Check that you're using the GPU-enabled ONNX Runtime library
