ScribeAR Multi-Tenancy HLD
Bennett Wu edited this page Sep 18, 2025
Below is a list of features that guide the design proposed in this document. The proposed design seeks to implement the in-scope features while leaving room to support future enhancements.
In-Scope Features
- Core Functionality
  - Provide accurate, low-latency live transcription streams from one or more microphone streams
- Multi-Tenancy
  - Transcription streams are divided into independent sessions
  - Transcriptions from one session cannot be viewed from a different session
  - Support multiple concurrent sessions
  - Sessions can be started on a schedule
- Authentication
  - Authentication is required to view a transcription session
  - Users can authenticate to a session with a rotating QR code or join code
- User Experience
  - Users can view live transcriptions on a kiosk device
  - Users can view live transcriptions on their personal devices
  - Users can join sessions via a publicly accessible landing page
  - System should require little or no user interaction to begin/end sessions
  - System should provide actionable error prompts when things go wrong
- Scalability & Performance
  - Transcription services should scale for many concurrent sessions
  - Services should make efficient use of GPU resources
- Reliability
  - System should tolerate temporary network interruptions or service failures and automatically resume the transcription stream when possible
- Monitoring & Analytics
  - System should log health and performance metrics for troubleshooting and debugging
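This document does not specify how the rotating code under Authentication is generated. As a minimal sketch of one possible approach, the example below derives a short numeric code from a per-session secret and the current time window using an HMAC, in the style of time-based one-time passwords. The function names, the 30-second window, the 6-digit length, and the one-window tolerance are all assumptions made for illustration, not part of the proposed design.

```python
import hashlib
import hmac
import struct


def join_code(secret: bytes, now: float, window_s: int = 30, digits: int = 6) -> str:
    """Derive a numeric code for the time window containing `now` (hypothetical scheme)."""
    counter = int(now // window_s)  # index of the current time window
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha256).digest()
    value = int.from_bytes(mac[:4], "big") % (10 ** digits)
    return f"{value:0{digits}d}"  # zero-padded to a fixed length


def verify_join_code(secret: bytes, code: str, now: float,
                     window_s: int = 30, digits: int = 6, skew: int = 1) -> bool:
    """Accept codes from the current window and `skew` windows on either side."""
    for offset in range(-skew, skew + 1):
        expected = join_code(secret, now + offset * window_s, window_s, digits)
        if hmac.compare_digest(expected, code):  # constant-time comparison
            return True
    return False
```

A kiosk could render the current code as a QR code or as text, and the server would call `verify_join_code` when a user submits it. Accepting neighbouring windows is a usability trade-off: it tolerates clock drift and slow typing at the cost of a slightly longer validity period.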
Out-of-Scope (Future Enhancements)
- Transcriptions
  - Speaker diarization support
  - Multi-language support
  - Custom vocabulary support
  - Accurate transcription timestamps
- Transcription History
  - Transcripts should be saved and made viewable and downloadable for authorized users
- Authentication
  - Users can authenticate using an external identity provider (e.g. NetID)
- Authorization
  - Fine-grained access controls for features (downloads, session viewing, session admin, session scheduling, etc.)
- Admin Dashboard
  - Web dashboard for authorized users to schedule and manage sessions
  - Admins can modify session authentication requirements
  - Admins can kick actively connected users from a session
  - Admins can edit session transcription configuration (diarization, language, vocabulary, etc.)
- Scalability & Performance
  - Support for automated horizontal scaling and load balancing
Key Components
- Audio Source
  - A software client responsible for capturing audio from one or more sources and sending it to a server for processing.
  - Waits for a Transcription Session to start before capturing and sending audio.
- Transcription Sink
  - A software client responsible for receiving transcription events from a server and displaying them to the user.
- Kiosk
  - A network-connected client with display capabilities.
  - Acts as a Transcription Sink and optionally as an Audio Source.
  - Can also display authentication codes (QR code or join code).
- Headless Audio Node
  - A network-connected client that acts as an Audio Source.
- Session Management API
  - API for managing the scheduling of sessions.
  - Audio Sources should connect to the Session Management API to receive session start events.
- Session Auth API
  - API for managing authentication of sessions.
  - Converts a user identity into a session identity so that the Transcription Session API does not need to manage multiple identities.
  - Audio Sources and Transcription Sinks should use the Session Auth API before connecting to the Transcription Session API.
  - A Kiosk should also contact the Session Auth API if it needs to display a QR code or join code.
- Transcription Session API
  - API for receiving audio and broadcasting transcription events.
- Session Scheduler
  - Automatically starts and stops transcription sessions.
- Source Token
  - An API key provided to each Headless Audio Node and Kiosk that uniquely identifies it.
- Session Token
  - An API key generated by the Session Auth API and provided to Audio Sources and Transcription Sinks to uniquely identify them.
  - Sent to Transcription Session API to be used to
- Connect via WebSocket to Session Management API.
- Authenticate using Source Token.
- Wait for transcription session start event.
- On transcription session start event:
  - Use Source Token to obtain Session Token from Session Auth API.
  - Connect via WebSocket to Transcription Session API.
  - Start sending audio.
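The steps above can be sketched as a small control loop. The example below is only an illustration of the ordering: it replaces the three APIs with in-memory stand-ins rather than real WebSocket connections, and every class name, method name, event shape (`{"type": "session_start", "session_id": ...}`), and token format in it is hypothetical, since this document does not define the wire protocol.

```python
from typing import Iterable, Iterator, List, Optional, Set, Tuple


class FakeSessionManagementAPI:
    """In-memory stand-in for the Session Management API (hypothetical)."""

    def __init__(self, valid_source_tokens: Set[str], events: List[dict]):
        self.valid_source_tokens = valid_source_tokens
        self.events = events

    def connect(self, source_token: str) -> Iterator[dict]:
        # Connecting and authenticating with the Source Token.
        if source_token not in self.valid_source_tokens:
            raise PermissionError("unknown source token")
        return iter(self.events)


class FakeSessionAuthAPI:
    """In-memory stand-in for the Session Auth API (hypothetical)."""

    def exchange(self, source_token: str, session_id: str) -> str:
        # Exchange a Source Token for a Session Token scoped to one session.
        return f"session-token:{session_id}:{source_token}"


class FakeTranscriptionSessionAPI:
    """In-memory stand-in for the Transcription Session API (hypothetical)."""

    def __init__(self) -> None:
        self.received: List[Tuple[str, bytes]] = []

    def send_audio(self, session_token: str, chunk: bytes) -> None:
        self.received.append((session_token, chunk))


def run_audio_source(source_token: str,
                     mgmt: FakeSessionManagementAPI,
                     auth: FakeSessionAuthAPI,
                     transcription: FakeTranscriptionSessionAPI,
                     audio_chunks: Iterable[bytes]) -> Optional[str]:
    """Follow the Audio Source flow once; return the Session Token that was used."""
    for event in mgmt.connect(source_token):
        # Wait for the session start event, ignoring anything else.
        if event.get("type") != "session_start":
            continue
        # On session start: obtain a Session Token, then stream audio with it.
        session_token = auth.exchange(source_token, event["session_id"])
        for chunk in audio_chunks:
            transcription.send_audio(session_token, chunk)
        return session_token
    return None  # the event stream ended without a session starting
```

Keeping the Source Token out of the Transcription Session API calls mirrors the stated role of the Session Auth API: the transcription service only ever sees session-scoped tokens.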