
BGE-M3 ONNX


This repository demonstrates how to convert the complete BGE-M3 model to ONNX format and use it in multiple programming languages with full multi-vector functionality.


Key Features

  • Generate all three BGE-M3 embedding types: dense, sparse, and ColBERT vectors
  • Reduced latency with local embedding generation
  • Full control over the embedding pipeline with no external dependencies
  • Works offline without internet connectivity requirements
  • Cross-platform compatibility (C#, Java, Python)
  • CUDA GPU acceleration support

Repository Structure

  • bge-m3-to-onnx.ipynb - Jupyter notebook demonstrating the BGE-M3 conversion process
  • /samples/dotnet - C# implementation
  • /samples/java - Java implementation
  • /samples/python - Python implementation
  • generate_reference_embeddings.py - Script to generate reference embeddings for cross-language testing
  • run_tests.sh and run_tests.ps1 - Test scripts for Linux/macOS and Windows

Getting Started

  1. Clone this repository:

    git clone https://github.com/yuniko-software/bge-m3-onnx.git
    cd bge-m3-onnx
  2. Get the BGE-M3 ONNX models:

    • Option 1: Download from releases (recommended)

      • Check the repository releases and download onnx.zip
      • It contains the BGE-M3 embedding model and its tokenizer, already converted to ONNX
    • Option 2: Generate yourself using the notebook

      • Open and run bge-m3-to-onnx.ipynb - this is the most important file in the repository
      • The notebook demonstrates how to convert BGE-M3 from FlagEmbedding to ONNX format
      • This will create bge_m3_tokenizer.onnx, bge_m3_model.onnx, and bge_m3_model.onnx_data in the /onnx folder

    Note: This repository uses BAAI/bge-m3 as the embedding model with its XLM-RoBERTa tokenizer.

  3. Generate reference embeddings (optional):

    • Run python generate_reference_embeddings.py to create reference embeddings for testing
  4. Run the samples:

    • Once you have the ONNX models in the /onnx folder, you can run any sample
    • Try the .NET sample in /samples/dotnet or the Java sample in /samples/java
  5. Verify cross-language embeddings (optional):

    • To ensure that the .NET and Java embeddings match the Python-generated embeddings, you can run the test scripts (a sketch of the underlying comparison follows this list):

    • On Linux/macOS:

      chmod +x run_tests.sh
      ./run_tests.sh
    • On Windows:

      ./run_tests.ps1

    Note: These scripts require Python, .NET, Java, and Maven to be installed.
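Under the hood, the check is simple: recompute each embedding with the local ONNX pipeline and compare it against the stored reference within a small floating-point tolerance. Here is a minimal Python sketch of that idea; the file name and JSON layout below are hypothetical, since the actual format is whatever generate_reference_embeddings.py writes:

import json
import numpy as np
from bge_m3_embedder import create_cpu_embedder

# Hypothetical layout: {"<text>": {"dense_vecs": [...]}, ...} -- adjust the
# keys to match the output of generate_reference_embeddings.py
with open("reference_embeddings.json") as f:
    reference = json.load(f)

embedder = create_cpu_embedder("onnx/bge_m3_tokenizer.onnx", "onnx/bge_m3_model.onnx")
for text, expected in reference.items():
    actual = embedder.encode(text)["dense_vecs"]
    # Tolerate tiny float differences across runtimes and platforms
    assert np.allclose(actual, expected["dense_vecs"], atol=1e-5), f"Mismatch: {text}"
embedder.close()
print("All dense embeddings match the Python reference")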

CUDA Support

This BGE-M3 ONNX model supports CUDA GPU acceleration for improved performance. To enable CUDA support:

Python

Install ONNX Runtime with CUDA support (see the command below).

Resource: ONNX Runtime CUDA Execution Provider Requirements - this page lists the CUDA and cuDNN versions each onnxruntime-gpu release is compatible with.
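The GPU build of ONNX Runtime ships as a separate Python package. A typical setup, with a quick check that the CUDA provider is actually visible to ONNX Runtime:

pip install onnxruntime-gpu

# Should print a list containing 'CUDAExecutionProvider'
python -c "import onnxruntime; print(onnxruntime.get_available_providers())"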

C# and Java

For the C# and Java implementations, reference the GPU build of ONNX Runtime (typically the Microsoft.ML.OnnxRuntime.Gpu NuGet package for C# and the com.microsoft.onnxruntime:onnxruntime_gpu Maven artifact for Java), and install CUDA and cuDNN separately:

CUDA Installation: install the NVIDIA CUDA Toolkit version required by your ONNX Runtime release.

cuDNN Installation: install a cuDNN build that matches your CUDA Toolkit version.

Python Example

from bge_m3_embedder import create_cpu_embedder, create_cuda_embedder

# Create CPU-optimized embedder
embedder = create_cpu_embedder("onnx/bge_m3_tokenizer.onnx", "onnx/bge_m3_model.onnx")

# Generate all three embedding types
result = embedder.encode("Hello world!")

print(f"Dense: {len(result['dense_vecs'])} dimensions")
print(f"Sparse: {len(result['lexical_weights'])} tokens")  
print(f"ColBERT: {len(result['colbert_vecs'])} vectors")

# Clean up resources
embedder.close()

# For CUDA acceleration
cuda_embedder = create_cuda_embedder("onnx/bge_m3_tokenizer.onnx", "onnx/bge_m3_model.onnx", device_id=0)
result = cuda_embedder.encode("Hello world!")
cuda_embedder.close()

# See full implementation in samples/python
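The three outputs target different retrieval styles, and they can be scored directly. The sketch below is illustrative rather than part of the sample API: it assumes dense_vecs is a single vector, lexical_weights maps tokens to weights, and colbert_vecs is a per-token matrix (as the encode output above suggests), and it uses the standard ColBERT MaxSim late-interaction score rather than anything specific to this repository:

import numpy as np
from bge_m3_embedder import create_cpu_embedder

embedder = create_cpu_embedder("onnx/bge_m3_tokenizer.onnx", "onnx/bge_m3_model.onnx")
query = embedder.encode("What is BGE-M3?")
doc = embedder.encode("BGE-M3 is a multilingual embedding model.")

# Dense: one vector per text; cosine similarity is the usual score
q, d = np.asarray(query["dense_vecs"]), np.asarray(doc["dense_vecs"])
dense_score = float(q @ d / (np.linalg.norm(q) * np.linalg.norm(d)))

# Sparse: multiply the weights of tokens the two texts share
# (assumes lexical_weights is a token -> weight mapping)
qw, dw = query["lexical_weights"], doc["lexical_weights"]
sparse_score = sum(w * dw[t] for t, w in qw.items() if t in dw)

# ColBERT MaxSim: each query token takes its best match among document tokens
Q = np.asarray(query["colbert_vecs"])
D = np.asarray(doc["colbert_vecs"])
Q /= np.linalg.norm(Q, axis=1, keepdims=True)  # normalize defensively
D /= np.linalg.norm(D, axis=1, keepdims=True)
colbert_score = float((Q @ D.T).max(axis=1).sum())

print(f"dense={dense_score:.4f} sparse={sparse_score:.4f} colbert={colbert_score:.4f}")
embedder.close()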

C# Example

using BgeM3.Onnx;

// Create CPU-optimized embedder
using var embedder = M3EmbedderFactory.CreateCpuOptimized(tokenizerPath, modelPath);

// Generate all embedding types
var result = embedder.GenerateEmbeddings("Hello world!");

Console.WriteLine($"Dense: {result.DenseEmbedding.Length} dimensions");
Console.WriteLine($"Sparse: {result.SparseWeights.Count} tokens");
Console.WriteLine($"ColBERT: {result.ColBertVectors.Length} vectors");

// For CUDA acceleration
using var cudaEmbedder = M3EmbedderFactory.CreateCudaOptimized(tokenizerPath, modelPath, deviceId: 0);
var cudaResult = cudaEmbedder.GenerateEmbeddings("Hello world!");

// See full implementation in samples/dotnet

Java Example

import com.yunikosoftware.bgem3onnx.*;

// Create CPU-optimized embedder
try (M3Embedder embedder = M3EmbedderFactory.createCpuOptimized(tokenizerPath, modelPath)) {
    // Generate all embedding types
    M3EmbeddingOutput result = embedder.generateEmbeddings("Hello world!");
    
    System.out.println("Dense: " + result.getDenseEmbedding().length + " dimensions");
    System.out.println("Sparse: " + result.getSparseWeights().size() + " tokens");
    System.out.println("ColBERT: " + result.getColBertVectors().length + " vectors");
}

// For CUDA acceleration
try (M3Embedder cudaEmbedder = M3EmbedderFactory.createCudaOptimized(tokenizerPath, modelPath, 0)) {
    M3EmbeddingOutput result = cudaEmbedder.generateEmbeddings("Hello world!");
    // Process CUDA results
}

// See full implementation in samples/java

If you find this project useful, please consider giving it a star on GitHub!

Your support helps make this project more visible to other developers who might benefit from BGE-M3's complete multi-vector functionality.
