Enterprise-Grade Secure ASR Diarization Pipeline - HIPAA-compliant speech processing service combining automatic speech recognition with speaker diarization. Features modular architecture, comprehensive security, and production-ready deployment.

SunPCSolutions/DiarASR

DiarASR

⚠️ IMPORTANT NOTICE: HuggingFace models are downloaded on the first API request. Because of network issues or unstable HuggingFace downloads, the first few requests may fail. Once the models have been downloaded and cached, subsequent requests work reliably.

Enterprise-grade secure ASR diarization pipeline combining automatic speech recognition with speaker diarization. It is HIPAA-compliant, with a modular architecture and comprehensive security.

Features

  • 🔒 Enterprise Security: API key authentication, input validation, rate limiting
  • 🎯 High-Quality Processing: diarization error rate (DER) ~8-20%, word error rate (WER) ~1-5%, with robust speaker attribution
  • 🩺 HIPAA Compliance: Secure file handling, audit logging, encrypted storage
  • 🏗️ Modular Architecture: Clean separation into focused modules
  • 🐳 Production Ready: Container-ready with security enhancements

Requirements

  • GPU: NVIDIA GPU with CUDA 13.0+ (8GB+ VRAM recommended)
  • OS: Linux (Ubuntu 24.04+, CentOS 8+)
  • Python: 3.12+
  • Models: Access required for nvidia/parakeet-tdt-1.1b and pyannote/speaker-diarization-community-1

Installation

Python Installation

git clone https://github.com/SunPCSolutions/DiarASR.git
cd DiarASR
python3 -m venv .venv
source .venv/bin/activate
pip install -r app/requirements.txt
cp .env.example .env
# Edit .env with your API keys and HuggingFace token

Docker Installation

git clone https://github.com/SunPCSolutions/DiarASR.git
cd DiarASR
cp .env.example .env
# Edit .env with your API keys and HuggingFace token
docker-compose up --build -d

Usage

Python API

import os

# Set credentials before importing the app, in case it reads them at import time
os.environ['HF_TOKEN'] = 'your-huggingface-token'
os.environ['API_KEYS'] = 'your-api-key'

from app.app import process_audio

result = process_audio(
    audio_path='audio.wav',
    diarize=True,
    min_speakers=2,
    max_speakers=4
)

for segment in result['segments']:
    print(f"{segment['speaker']}: {segment['text']}")
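The loop above prints one line per segment. When a speaker talks across several consecutive segments, it can be easier to read the transcript as merged turns. The following is a minimal sketch that assumes only the `speaker` and `text` keys shown above; `merge_speaker_turns` is a hypothetical helper, not part of DiarASR, and the speaker labels in the example are illustrative.

```python
def merge_speaker_turns(segments):
    """Merge consecutive segments spoken by the same speaker into one turn."""
    turns = []
    for segment in segments:
        if turns and turns[-1]["speaker"] == segment["speaker"]:
            # Same speaker as the previous turn: append the text to it.
            turns[-1]["text"] += " " + segment["text"]
        else:
            # New speaker: start a new turn (copy so the input is not mutated).
            turns.append({"speaker": segment["speaker"], "text": segment["text"]})
    return turns


# Hand-written segments in the shape used above (labels are illustrative).
example = [
    {"speaker": "SPEAKER_00", "text": "Good morning."},
    {"speaker": "SPEAKER_00", "text": "How are you feeling?"},
    {"speaker": "SPEAKER_01", "text": "Much better, thanks."},
]

for turn in merge_speaker_turns(example):
    print(f"{turn['speaker']}: {turn['text']}")
```

In practice you would pass `result['segments']` instead of `example`.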

REST API

# Start server
export API_KEYS="your-api-key"
export HF_TOKEN="hf_xxx"
uvicorn app:app --host 0.0.0.0 --port 8003

# Make request
curl -H "X-API-Key: your-api-key" \
  -X POST "http://localhost:8003/transcribe_diarize/" \
  -F "file=@audio.wav"  # form field name "file" is an assumption; see docs/API_PARAMETERS.md

See docs/API_PARAMETERS.md for complete API documentation.
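As the notice at the top of this README says, the first few requests may fail while models are still downloading. A client can absorb this by retrying with a growing delay. The sketch below is one way to do that, not something DiarASR ships: `call_with_retries` and its parameters are hypothetical, and `request_fn` stands for whatever function performs your HTTP call.

```python
import time


def call_with_retries(request_fn, attempts=5, base_delay=2.0, sleep=time.sleep):
    """Call request_fn until it succeeds, doubling the wait after each failure."""
    last_error = None
    for attempt in range(attempts):
        try:
            return request_fn()
        except Exception as error:  # narrow this to your HTTP client's error types
            last_error = error
            if attempt < attempts - 1:
                # Waits base_delay, then 2x, 4x, ... between attempts.
                sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"request failed after {attempts} attempts") from last_error
```

With the defaults, a request that keeps failing is retried after 2, 4, 8, and 16 seconds before the error is raised. The `sleep` parameter exists so the delay can be replaced in tests.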

Model Access

Required HuggingFace Access:

  • nvidia/parakeet-tdt-1.1b
  • pyannote/speaker-diarization-community-1

Set the HF_TOKEN environment variable to your HuggingFace token.
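Since a missing token only surfaces when the models are first requested, it can help to check the environment before starting the server. This is a small sketch under that assumption; `missing_env_vars` is a hypothetical helper, and the variable names are the two used elsewhere in this README.

```python
import os

# Variables this README asks you to set (see .env.example).
REQUIRED_VARS = ("HF_TOKEN", "API_KEYS")


def missing_env_vars(environ=os.environ, required=REQUIRED_VARS):
    """Return the names of required variables that are unset or blank."""
    return [name for name in required if not environ.get(name, "").strip()]


# Example with an explicit mapping: API_KEYS is reported as missing.
print(missing_env_vars({"HF_TOKEN": "hf_xxx"}))
```

Calling `missing_env_vars()` with no arguments checks the real process environment.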

Performance

  • ASR Accuracy: ~1-5% WER (Parakeet TDT-1.1B)
  • Diarization Quality: ~8-20% DER (Pyannote Community-1)
  • Processing Speed: ~12x real-time with GPU
  • Memory Usage: <8GB VRAM
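To make these figures concrete: WER is the word-level edit distance between a reference transcript and the ASR output, divided by the number of reference words, and "12x real-time" means one hour of audio takes roughly five minutes to process. The sketch below shows both calculations. It is illustrative only and is not how the numbers above were measured; the function names are hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # previous[j]: edit distance between the ref words seen so far and hyp[:j]
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            current.append(min(substitution, previous[j] + 1, current[j - 1] + 1))
        previous = current
    return previous[-1] / len(ref)


def estimated_processing_seconds(audio_seconds, speed_factor=12.0):
    """Rough processing time for audio at the given real-time speed factor."""
    return audio_seconds / speed_factor


# One dropped word out of five reference words gives a WER of 0.2 (20%).
print(word_error_rate("the patient has a fever", "the patient has fever"))
# A 60-minute recording at ~12x real-time takes about 300 seconds.
print(estimated_processing_seconds(3600))
```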

Security

  • API key authentication
  • Multi-layer input validation
  • Rate limiting (10 req/min)
  • Encrypted temporary storage
  • HIPAA-compliant processing
  • Comprehensive audit logging
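For readers unfamiliar with how a limit such as 10 requests per minute can be enforced, here is a minimal sliding-window sketch keyed by client (for example, by API key). It is an illustration of the general technique and makes no claim about how DiarASR implements its rate limiting; the class and parameter names are hypothetical.

```python
import time
from collections import defaultdict, deque


class SlidingWindowRateLimiter:
    """Allow at most max_requests per client within any window_seconds span."""

    def __init__(self, max_requests=10, window_seconds=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock
        self._hits = defaultdict(deque)  # client id -> timestamps of recent requests

    def allow(self, client_id):
        now = self.clock()
        hits = self._hits[client_id]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # over the limit: the caller should reject the request
        hits.append(now)
        return True
```

A server would call `allow(api_key)` before processing each request and return an HTTP 429 response when it returns `False`. This in-memory version works for a single process only; multiple workers would need shared state.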

Documentation

  • docs/API_PARAMETERS.md: complete API documentation

License

MIT License - see LICENSE file for details.

Acknowledgments

Our greatest appreciation to the creators of:

  • Pyannote.audio (Hervé Bredin et al.) for speaker diarization
  • NVIDIA Parakeet TDT (NVIDIA NeMo team) for ASR
  • FastAPI (Sebastián Ramírez) for the web framework
  • PyTorch (Facebook AI Research) for deep learning

Please cite these works if used in your research.
