Moonshine

[Blog] [Paper 1] [Paper 2] [Model Card] [Podcast]

Moonshine is a family of speech-to-text models optimized for fast and accurate automatic speech recognition (ASR) on resource-constrained devices. It is well-suited to real-time, on-device applications like live transcription and voice-command recognition. English Moonshine obtains word-error rates (WER) lower than those of the similarly sized Whisper Tiny and Base models on the OpenASR leaderboard, and the non-English Moonshine variants outperform Whisper Small and Medium, which are 9x and 28x larger, respectively.

Moonshine processes audio segments 5x to 15x faster than Whisper while maintaining the same (or significantly better!) WER/CER. This is because its compute requirements scale with the length of the input audio: shorter input is processed faster, unlike Whisper models, which process everything as 30-second chunks.
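If you want to sanity-check this scaling yourself, a minimal timing sketch like the one below works once the moonshine-onnx package from the Installation section is set up (beckett.wav is the sample file bundled with the package):

import time
import moonshine_onnx

# Time one transcription; because Moonshine's cost scales with clip
# length, shorter files finish proportionally faster, while Whisper
# pads every input out to a fixed 30-second window.
start = time.perf_counter()
text = moonshine_onnx.transcribe(moonshine_onnx.ASSETS_DIR / 'beckett.wav', 'moonshine/tiny')
print(text, f'({time.perf_counter() - start:.2f}s)')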

Unquantized Base is 62M parameters (or 400MB), while Tiny is 27M parameters (around 190MB).

Supported Languages

Moonshine currently supports 8 languages. Below is a performance summary; Arabic, Chinese, Japanese, and Korean results are character-error rates (CER), while all others are word-error rates (WER).

| Language | Tag | Moonshine Tiny (27M) | Moonshine Base (62M) | Whisper Tiny (39M) | Whisper Base (74M) | Whisper Small (244M) | Whisper Medium (769M) |
|---|---|---|---|---|---|---|---|
| Arabic | ar | 24.76 | – | 52.40 | 48.25 | 32.44 | 25.44 |
| English | – | 12.66 | 10.07 | 12.81 | 10.32 | – | – |
| Chinese | zh | 32.77 | – | 68.51 | 59.13 | 46.76 | 40.41 |
| Japanese | ja | 15.69 | – | 96.71 | 72.69 | 40.94 | 27.88 |
| Korean | ko | 9.85 | – | 23.92 | 15.93 | 9.87 | 7.68 |
| Spanish | es | – | TBA | – | – | – | – |
| Ukrainian | uk | 19.70 | – | 66.77 | 48.56 | 25.93 | 16.51 |
| Vietnamese | vi | 15.92 | – | 96.4 | 52.79 | 26.46 | 18.49 |
[Figure: error_delta_bar — error-rate deltas between Moonshine and Whisper models]

Read the paper for more details on our non-English flavors of Moonshine.

Supported Backends

With the release of new Moonshine languages, we have deprecated the Keras-based moonshine package. We recommend using Hugging Face transformers for vibe-checking the models, and using the ONNX runtime via moonshine-onnx for on-device applications. This table summarizes support:

| Model | Language | transformers | ONNX | Keras (deprecated) |
|---|---|---|---|---|
| tiny-ar | Arabic | ✓ | ✓ | |
| tiny-zh | Chinese | ✓ | ✓ | |
| tiny | English | ✓ | ✓ | ✓ |
| base | English | ✓ | ✓ | ✓ |
| tiny-ja | Japanese | ✓ | ✓ | |
| tiny-ko | Korean | ✓ | ✓ | |
| base-es | Spanish | ✓ | ✓ | |
| tiny-uk | Ukrainian | ✓ | ✓ | |
| tiny-vi | Vietnamese | ✓ | ✓ | |


Installation

We like uv for managing Python environments, so we use it here. If you'd rather not use it, simply skip the uv installation and omit uv from the shell commands below.

1. Create a virtual environment

First, install uv for Python environment management.

Then create and activate a virtual environment:

uv venv env_moonshine
source env_moonshine/bin/activate
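
If you skipped uv, the standard library's venv module does the same thing:

python3 -m venv env_moonshine
source env_moonshine/bin/activate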

2. Install useful-moonshine-onnx

Using Moonshine with the ONNX runtime is preferable if you want to run the models on single-board computers (SBCs) like the Raspberry Pi. To use it, run the following:

uv pip install useful-moonshine-onnx@git+https://[email protected]/moonshine-ai/moonshine.git#subdirectory=moonshine-onnx

3. Try it out

You can test Moonshine by transcribing the provided example audio file with the .transcribe function:

python
>>> import moonshine_onnx
>>> moonshine_onnx.transcribe(moonshine_onnx.ASSETS_DIR / 'beckett.wav', 'moonshine/tiny')
['Ever tried ever failed, no matter try again, fail again, fail better.']

The first argument is a path to an audio file and the second is the name of a Moonshine model. moonshine/tiny and moonshine/base are English-only models. If you wish to use one of the non-English Moonshine models, just append the language IETF tag to the model name, e.g., moonshine/tiny-ko. See the table for supported languages and their tags.
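For example, transcribing Korean audio with the Korean model might look like this (a sketch; the audio path is a hypothetical placeholder for your own file):

>>> moonshine_onnx.transcribe('my_korean_clip.wav', 'moonshine/tiny-ko')  # hypothetical audio file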

Examples

Moonshine models can be used in many applications, so we've included code samples showing how to use them in different situations. The demo folder in this repository has more information on each.

Hugging Face Transformers

Moonshine is supported by the transformers library, as follows:

import torch
from transformers import AutoProcessor, MoonshineForConditionalGeneration
from datasets import load_dataset

# Load the processor (feature extractor + tokenizer) and model weights.
processor = AutoProcessor.from_pretrained("UsefulSensors/moonshine-tiny")
model = MoonshineForConditionalGeneration.from_pretrained("UsefulSensors/moonshine-tiny")

# Grab a short 16 kHz validation clip from a dummy LibriSpeech split.
ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
audio_array = ds[0]["audio"]["array"]

# Convert the raw audio into model inputs (PyTorch tensors).
inputs = processor(audio_array, return_tensors="pt")

# Autoregressively decode token IDs for the transcription.
generated_ids = model.generate(**inputs)

# Turn the token IDs back into text, dropping special tokens.
transcription = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(transcription)

If you wish to use one of the non-English Moonshine models, just append the IETF code to the repo ID, e.g., UsefulSensors/moonshine-tiny-ko. See the table for supported languages and their tags.
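For instance, loading the Korean model only changes the repo ID; the rest of the example above stays the same:

processor = AutoProcessor.from_pretrained("UsefulSensors/moonshine-tiny-ko")
model = MoonshineForConditionalGeneration.from_pretrained("UsefulSensors/moonshine-tiny-ko")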

Live Captions

You can try the Moonshine ONNX models with live input from a microphone with the live captions demo.
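
If you just want a quick taste without the full demo, a minimal push-to-talk sketch is below. It assumes the third-party sounddevice and soundfile packages are installed (our own additions here, not Moonshine requirements), records a fixed-length clip, and transcribes it; the real demo instead streams continuously with voice-activity detection.

import sounddevice as sd
import soundfile as sf
import moonshine_onnx

SECONDS, RATE = 5, 16_000  # Moonshine models expect 16 kHz mono audio
audio = sd.rec(int(SECONDS * RATE), samplerate=RATE, channels=1)
sd.wait()  # block until the recording finishes
sf.write('mic_clip.wav', audio, RATE)  # hypothetical scratch file
print(moonshine_onnx.transcribe('mic_clip.wav', 'moonshine/tiny'))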

CTranslate2

The files for the CTranslate2 versions of Moonshine are available at huggingface.co/UsefulSensors/moonshine/tree/main/ctranslate2, but they require a pull request to be merged before they can be used with the mainline version of the framework. Until then, you can try them with our branch using this example script.

Web Applications

Use our MoonshineJS library to run Moonshine models in the web browser with a few lines of JavaScript.

License

All inference code in this repo is released under the MIT license. The English Moonshine models are also released under the MIT license.

All non-English Moonshine variants are released under the Moonshine AI Community License (TL;DR: models are free to use for researchers, developers, small businesses, and creators with less than $1M in annual revenue).

A copy of both licenses is included in this repository.

Citation

If you benefit from our work, please cite our paper:

@misc{jeffries2024moonshinespeechrecognitionlive,
      title={Moonshine: Speech Recognition for Live Transcription and Voice Commands}, 
      author={Nat Jeffries and Evan King and Manjunath Kudlur and Guy Nicholson and James Wang and Pete Warden},
      year={2024},
      eprint={2410.15608},
      archivePrefix={arXiv},
      primaryClass={cs.SD},
      url={https://arxiv.org/abs/2410.15608}, 
}

Please also cite our paper on non-English Moonshine variants if you find them useful:

@misc{king2025flavorsmoonshinetinyspecialized,
      title={Flavors of Moonshine: Tiny Specialized ASR Models for Edge Devices}, 
      author={Evan King and Adam Sabra and Manjunath Kudlur and James Wang and Pete Warden},
      year={2025},
      eprint={2509.02523},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2509.02523}, 
}
