OpenWebTTS is a local web-based application that provides a simple interface for generating and transcribing speech using multiple Text-to-Speech (TTS) and Speech-to-Text (STT) engines.

- Simple Web Interface: A clean UI for text input and audio generation.
- Multiple Engine Support: Use Piper, Kokoro, Kitten, or Coqui for TTS, or OpenAI Whisper for STT.
- Real-time Generation: Audio is generated while you listen, for smooth playback and recording.
```
OpenWebTTS/
│
├── app.py                  # Main FastAPI application
├── requirements.txt        # Python dependencies
├── README.md               # This file
├── .gitignore              # Git ignore file
├── flatpak-manifest.yml    # Flatpak manifest for Linux
│
├── functions/
│   ├── users.py            # User management and authentication
│   ├── gemini.py           # Gemini API
│   ├── piper.py            # Piper TTS
│   ├── whisper.py          # OpenAI's Whisper STT
│   ├── kitten.py           # Kitten TTS
│   └── kokoro.py           # Kokoro functions
│
├── models/                 # Place your TTS models here
│   ├── coqui/
│   ├── piper/
│   └── kokoro/
│
├── translations/           # Translations
│   └── en/ (etc...)
│
├── users/
│   └── *.json              # User settings and preferences
│
├── static/
│   ├── css/                # Stylesheets
│   ├── js/                 # JavaScript files
│   ├── audio/              # Static audio
│   └── audio_cache/        # Generated audio cache
│
└── templates/
    ├── config.html         # Configuration page
    └── index.html          # Main HTML page
```
- Python 3.11 (recommended). Note: other Python versions might not be fully compatible due to dependencies. If you wish to use 3.12 or above, make sure to adjust the `requirements.txt` file accordingly, and note that not all functions will work.
- `pip` and `venv` for managing dependencies.
- `espeak-ng` for Kokoro.
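Version mismatches tend to surface later as confusing dependency errors, so it can help to flag an unsupported interpreter early. A tiny sketch (a hypothetical check, not part of the project; the recommended version comes from the note above):

```python
import sys

RECOMMENDED = (3, 11)  # per the prerequisite note above

def python_ok(version=None):
    """Return True when the running (or given) Python is 3.11."""
    if version is None:
        version = sys.version_info[:2]
    return tuple(version) == RECOMMENDED

if not python_ok():
    print(f"Warning: Python {sys.version_info[0]}.{sys.version_info[1]} detected; "
          "3.11 is recommended, other versions may need requirements.txt changes.")
```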
It is highly recommended to use a virtual environment to avoid conflicts with system-wide packages. Note: make sure you're using the correct Python version (3.11 recommended) to create the venv.
```shell
# Navigate to the project directory
cd /path/to/OpenWebTTS

# Create a virtual environment
python3.11 -m venv venv

# Activate the virtual environment
# On macOS and Linux:
source venv/bin/activate
# On Windows:
.\venv\Scripts\activate
```
Install all the required Python libraries using the `requirements.txt` file. Note: if you have an older graphics card, PyTorch might need to be installed differently; check the PyTorch docs.
```shell
pip install -r requirements.txt
pip install https://github.com/KittenML/KittenTTS/releases/download/0.1/kittentts-0.1.0-py3-none-any.whl
```
- Use the integrated model downloader (recommended)

Or

- Download a Piper voice model from the official repository.
- Place the files inside `models/piper/`. For example: `models/piper/en_US-lessac-medium.onnx` and `models/piper/en_US-lessac-medium.onnx.json`.
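Piper needs both the `.onnx` voice and its matching `.onnx.json` config side by side. Here is a quick sketch (a hypothetical helper, not part of the project) that lists only the complete pairs under `models/piper/`:

```python
from pathlib import Path

def find_piper_voices(model_dir="models/piper"):
    """Return voice names that have both the .onnx model and its config."""
    voices = []
    for onnx in sorted(Path(model_dir).glob("*.onnx")):
        # The config sits next to the model as <voice>.onnx.json
        config = onnx.parent / (onnx.name + ".json")
        if config.exists():
            voices.append(onnx.stem)
    return voices
```

A voice missing its `.onnx.json` simply won't show up, which makes an incomplete download easy to spot.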
- Use the integrated model downloader (recommended)

Or

- Download a model from the official repository.
- Place the file inside `models/kokoro/`. For example: `models/kokoro/af_heart.pt`
- Find a model from the Coqui TTS releases or train your own.
- A Coqui model is typically a directory containing files like `model.pth`, `config.json`, and `speakers.json` (for multi-speaker models).
- Place the entire model directory inside `models/coqui/`. For example: `models/coqui/your-coqui-model/`.
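A Coqui model directory can be sanity-checked the same way. A sketch (a hypothetical helper; the required filenames come from the list above, and `speakers.json` is treated as optional since only multi-speaker models need it):

```python
from pathlib import Path

def coqui_model_complete(model_dir):
    """Report whether a Coqui model directory has model.pth and config.json."""
    path = Path(model_dir)
    required = ["model.pth", "config.json"]
    missing = [name for name in required if not (path / name).exists()]
    # speakers.json is only present for multi-speaker models, so it is
    # reported separately rather than treated as required.
    multi_speaker = (path / "speakers.json").exists()
    return {"ok": not missing, "missing": missing, "multi_speaker": multi_speaker}
```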
Once you have installed the dependencies and placed your models, you can start the web server.
```shell
python app.py
```
The application will be available at `http://127.0.0.1:8000`. If you want the app to be available on your LAN, pass the `--host=0.0.0.0` flag.
You can also run OpenWebTTS as a desktop app in a lightweight webview window. This requires a webview backend such as `webkit2gtk`. The feature is still experimental, so if you experience any issues, run it as a web app instead.
```shell
python app.py --desktop
```