
ScribeAR Multi Tenancy HLD

Bennett Wu edited this page Sep 18, 2025 · 1 revision

ScribeAR Multi Tenancy - High Level Design

Features

Below is a list of features that guide the design proposed in this document. The proposed design seeks to implement the in-scope features while leaving room to support future enhancements.

In-Scope Features

  • Core Functionality

    • Provide accurate and low-latency live transcription streams from one or more microphone streams
  • Multi Tenancy

    • Transcription streams divided into independent sessions
    • Transcriptions from one session cannot be viewed by a different session
    • Support multiple concurrent sessions
    • Sessions can be started by a schedule
  • Authentication

    • Authentication required to view a transcription session
    • Users can scan a rotating QR code or enter a join code to authenticate to a session
  • User Experience

    • Users can view live transcriptions on a kiosk device
    • Users can view live transcriptions on their personal devices
    • Users can join sessions via a publicly accessible landing page
    • System should require no or minimal user interaction to begin/end sessions
    • System should provide actionable error prompts when things go wrong
  • Scalability & Performance

    • Transcription services should scale for many concurrent sessions
    • Services should make efficient use of GPU resources
  • Reliability

    • System should tolerate temporary network interruptions or service failures and automatically resume the transcription stream when possible
  • Monitoring & Analytics

    • System should log health and performance metrics for troubleshooting and debugging
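One way the rotating join code listed under Authentication could work is to derive the code from a per-deployment secret, the session ID, and the current time window, so the kiosk and the server can compute the same code independently. The sketch below is illustrative only: the 30-second rotation period, the 6-digit length, the HMAC-SHA256 construction, and the names `join_code` and `verify_join_code` are all assumptions, not part of the proposed design.

```python
import hashlib
import hmac

ROTATION_SECONDS = 30  # assumed rotation period; the design does not specify one


def join_code(secret: bytes, session_id: str, now: float, digits: int = 6) -> str:
    """Derive a numeric join code that changes every ROTATION_SECONDS."""
    window = int(now // ROTATION_SECONDS)
    # Bind the code to both the session and the time window.
    message = f"{session_id}:{window}".encode("utf-8")
    digest = hmac.new(secret, message, hashlib.sha256).digest()
    number = int.from_bytes(digest[:8], "big") % (10 ** digits)
    return str(number).zfill(digits)


def verify_join_code(secret: bytes, session_id: str, code: str, now: float) -> bool:
    """Accept the code for the current window or the one just before it."""
    if not code.isascii():
        return False  # compare_digest only accepts ASCII strings
    for offset in (0, 1):
        candidate = join_code(secret, session_id, now - offset * ROTATION_SECONDS)
        if hmac.compare_digest(candidate, code):
            return True
    return False
```

Accepting the previous window as well gives a user who read the code just before it rotated a short grace period. A QR code could carry the same value embedded in a landing-page URL.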

Out-of-Scope (Future Enhancements)

  • Transcriptions

    • Speaker diarization support
    • Multi-language support
    • Custom vocabulary support
    • Accurate transcription timestamps
  • Transcription History

    • Transcripts should be saved and made viewable and downloadable for authorized users
  • Authentication

    • Users can authenticate using an external identity provider (e.g. NetID)
  • Authorization

    • Fine-grained access controls for features (downloads, session viewing, session admin, session scheduling, etc.)
  • Admin Dashboard

    • Web dashboard for authorized users to schedule and manage sessions
    • Admins can modify session authentication requirements
    • Admins can kick actively connected users of a session
    • Admins can edit session transcription configuration (diarization, language, vocabulary, etc.)
  • Scalability & Performance

    • Support for automated horizontal scaling and load balancing

High Level Design

[Figure: high level design diagram]

Key Components

  • Audio Source

    • A software client responsible for capturing audio from one or more audio inputs and sending it to a server to be processed.
    • Waits for a Transcription Session to start before capturing and sending audio.
  • Transcription Sink

    • A software client responsible for receiving transcription events from a server and displaying them to the user.
  • Kiosk

    • A network-connected client with display capabilities.
    • Acts as a transcription sink and optionally as an audio source.
    • Can also be used to display authentication codes (QR code or join code).
  • Headless Audio Node

    • A network-connected client that acts as an audio source.
  • Session Management API

    • API for managing the scheduling of sessions.
    • Audio Sources should connect to the Session Management API to receive session start events.
  • Session Auth API

    • API for managing authentication of sessions.
    • Converts a user identity into a session identity so that the Transcription Session API does not need to manage multiple identities.
    • Audio Sources and Transcription Sinks should first authenticate with the Session Auth API before connecting to the Transcription Session API.
    • A Kiosk should also contact the Session Auth API if it needs to display a QR code or join code.
  • Transcription Session API

    • API for receiving audio and broadcasting transcription events.
  • Session Scheduler

    • Automatically starts and stops transcription sessions.
  • Source Token

    • An API key provided to each Headless Audio Node and Kiosk that uniquely identifies it.
  • Session Token

    • An API key generated by Session Auth API and provided to Audio Sources and Transcription Sinks to uniquely identify them.
    • Sent to Transcription Session API to be used to
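The relationship between Source Tokens, Session Tokens, and the Session Auth API can be sketched as a token exchange. The example below is a minimal in-memory stand-in, not the actual service: the class `SessionAuth`, its methods, the `SessionIdentity` fields, and the `role` values are hypothetical names chosen for illustration.

```python
import secrets
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class SessionIdentity:
    """What the Transcription Session API would need to know about a caller."""
    session_id: str
    role: str  # assumed roles: "source" or "sink"


class SessionAuth:
    """In-memory stand-in for the Session Auth API (hypothetical)."""

    def __init__(self, source_tokens: Dict[str, str]) -> None:
        # Maps each known Source Token to the device it identifies.
        self._source_tokens = dict(source_tokens)
        self._session_tokens: Dict[str, SessionIdentity] = {}

    def exchange_source_token(self, source_token: str, session_id: str) -> Optional[str]:
        """Trade a valid Source Token for a Session Token scoped to one session."""
        if source_token not in self._source_tokens:
            return None
        session_token = secrets.token_urlsafe(32)
        self._session_tokens[session_token] = SessionIdentity(session_id, "source")
        return session_token

    def resolve(self, session_token: str) -> Optional[SessionIdentity]:
        """Look up the session identity behind a Session Token, if any."""
        return self._session_tokens.get(session_token)
```

Because every caller ends up holding the same kind of token, a downstream service only has to call something like `resolve` and never needs to know whether the caller originally authenticated with a Source Token, a QR code, or a join code.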

Audio Source Lifecycle

  1. Connect via WebSocket to the Session Management API.
  2. Authenticate using Source Token.
  3. Wait for transcription session start event.
  4. On transcription session start event:
    1. Use Source Token to obtain Session Token from Session Auth API.
    2. Connect via WebSocket to the Transcription Session API.
    3. Start sending audio.
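The lifecycle above can be sketched as a small state machine. To keep the example self-contained, the network calls are replaced by injected callables; `AudioSource`, `State`, and the callable parameters are hypothetical names, and a real client would perform these steps over WebSocket connections.

```python
from enum import Enum
from typing import Callable, Iterable, Optional


class State(Enum):
    DISCONNECTED = "disconnected"
    WAITING = "waiting_for_session"
    STREAMING = "streaming"


class AudioSource:
    """Sketch of the Audio Source lifecycle with injected transports."""

    def __init__(
        self,
        source_token: str,
        authenticate: Callable[[str], bool],
        get_session_token: Callable[[str, str], str],
        send_audio: Callable[[str, bytes], None],
    ) -> None:
        self.source_token = source_token
        self._authenticate = authenticate            # stands in for Session Management API
        self._get_session_token = get_session_token  # stands in for Session Auth API
        self._send_audio = send_audio                # stands in for Transcription Session API
        self.state = State.DISCONNECTED
        self.session_token: Optional[str] = None

    def connect(self) -> None:
        # Steps 1-2: connect and authenticate using the Source Token.
        if not self._authenticate(self.source_token):
            raise PermissionError("Source Token rejected")
        # Step 3: wait for a session start event.
        self.state = State.WAITING

    def on_session_start(self, session_id: str, chunks: Iterable[bytes]) -> None:
        if self.state is not State.WAITING:
            raise RuntimeError("session start received before authentication")
        # Step 4.1: trade the Source Token for a Session Token.
        self.session_token = self._get_session_token(self.source_token, session_id)
        # Step 4.2: the connection to the Transcription Session API would open here.
        self.state = State.STREAMING
        # Step 4.3: start sending audio.
        for chunk in chunks:
            self._send_audio(self.session_token, chunk)
```

Injecting the three transports keeps the ordering of the steps testable without a running server; for example, passing a list's `append` wrapped in a lambda as `send_audio` records exactly which chunks were sent with which Session Token.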