Advanced multimodal video annotation analysis tool for both reviewing VideoAnnotator pipeline outputs and creating new annotation jobs. Features synchronized pose detection, speech recognition, speaker diarization, and scene detection visualization with integrated job management.
Companion processing pipeline: VideoAnnotator
Video Annotation Viewer is a sophisticated web-based application designed for researchers and analysts working with multimodal video data. It provides an integrated interface for reviewing the outputs of automated video analysis pipelines, particularly those generated by the VideoAnnotator system.
Video Annotation Viewer interface showing synchronized multimodal annotations including pose detection, facial emotion recognition, speech recognition, speaker diarization, and scene detection overlays.
- Native Support: Direct compatibility with VideoAnnotator pipeline outputs
- Standard Formats: COCO keypoints, WebVTT subtitles, RTTM speaker data, scene detection JSON
- Multi-file Loading: Drag-and-drop interface for video + annotation files
- Automatic Detection: Intelligent file type recognition and validation
- Create Annotation Jobs: Submit videos for processing through VideoAnnotator API
- Pipeline Selection: Choose from scene detection, person tracking, face analysis, and audio processing
- Batch Processing: Handle multiple videos simultaneously
- Real-time Monitoring: Track job progress with live status updates
- Job Management: View, monitor, and manage all annotation jobs in one interface
- Pose Detection: COCO-format human pose keypoints with 17-point skeleton rendering
- Speech Recognition: WebVTT subtitle display with precise timing
- Speaker Diarization: RTTM-based speaker identification and timeline visualization
- Scene Detection: Scene boundary markers and transition analysis
- Track Persistence: Person tracking with consistent identity across frames
- Time-based Navigation: Click-to-seek with millisecond precision
- Multi-track Display: Speech, speaker, scene, and motion tracks
- Synchronized Playback: All annotations stay perfectly aligned with video
- Hover Details: Rich information tooltips for timeline events
- Unified Interface: Elegant two-column layout with video player and integrated control panel
- Color-Coded Controls: Intuitive colored circle buttons for each annotation component
- Smart Toggles: Individual overlay management with synchronized timeline controls
- Lock Functionality: Padlock feature for coordinated control modes
- JSON Viewers: Individual data inspection buttons for each pipeline component
- Debug Panel: Professional debugging interface (Ctrl+Shift+D) with automated testing
- Navigation: Easy return to home and access to VideoAnnotator documentation
- Demo Datasets: Multiple built-in sample datasets including VEATIC silent video
- MP4 (H.264/H.265)
- WebM
- AVI
- MOV
- Person Tracking: COCO JSON format with keypoints and bounding boxes
- Speech Recognition: WebVTT (.vtt) files with timestamped transcriptions
- Speaker Diarization: RTTM (.rttm) files in NIST format
- Scene Detection: JSON arrays with scene boundaries and classifications
- Audio: Separate WAV files for audio analysis
- Open the application
- Click "View Demo" on the welcome screen
- Explore the sample VideoAnnotator dataset with full multimodal annotations
- Click "Get Started" from the welcome screen
- Drag and drop your files:
  - One video file (e.g., `video.mp4`)
  - Multiple annotation files (e.g., `person_tracking.json`, `speech.vtt`, `speakers.rttm`, `scenes.json`)
- The system automatically detects and validates file formats (a sketch of this detection follows these steps)
- Click "Start Viewing" to begin analysis
- Click "Create Annotations" from the main interface
- Navigate to "New Job" in the job management panel
- Upload your video files (supports batch processing)
- Select annotation pipelines (scene detection, person tracking, face analysis, audio processing)
- Configure pipeline parameters or use defaults
- Submit jobs and monitor progress in real time (a hypothetical submission call is sketched after these steps)
- View completed results directly in the annotation viewer
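From a client's perspective, the submission step might look like the following. This is a hypothetical sketch: the endpoint path, form field names, auth scheme, and response shape are assumptions for illustration, not the documented VideoAnnotator API; see the Client-Server Guide for the real contract.

```typescript
// Hypothetical sketch only: endpoint, fields, and headers are assumptions.
async function submitAnnotationJob(
  video: File,
  pipelines: string[], // e.g. ["scene_detection", "person_tracking"]
  apiToken: string
): Promise<{ id: string; status: string }> {
  const form = new FormData();
  form.append("video", video);
  form.append("selected_pipelines", pipelines.join(","));

  const res = await fetch("http://localhost:8000/api/v1/jobs/", {
    method: "POST",
    headers: { Authorization: `Bearer ${apiToken}` },
    body: form,
  });
  if (!res.ok) throw new Error(`Job submission failed: HTTP ${res.status}`);
  return res.json();
}
```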
```json
{
  "id": 1,
  "image_id": "frame_0001",
  "category_id": 1,
  "keypoints": [x1, y1, v1, x2, y2, v2, ...],  // 17 keypoints × 3 values
  "bbox": [x, y, width, height],
  "track_id": 1,
  "timestamp": 1.25,
  "score": 0.95
}
```
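Decoding the flat `keypoints` array is mechanical: each keypoint occupies three slots (x, y, visibility). A minimal sketch with illustrative names, not the viewer's actual parser:

```typescript
// The 17-point ordering follows the standard COCO skeleton
// (nose, eyes, ears, shoulders, elbows, wrists, hips, knees, ankles).
interface Keypoint {
  x: number;
  y: number;
  visible: boolean;
}

function decodeKeypoints(flat: number[]): Keypoint[] {
  const points: Keypoint[] = [];
  for (let i = 0; i < flat.length; i += 3) {
    // Visibility flag: 0 = not labeled, 1 = labeled but occluded, 2 = visible.
    points.push({ x: flat[i], y: flat[i + 1], visible: flat[i + 2] === 2 });
  }
  return points; // 17 entries for a complete COCO person annotation
}
```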
```
WEBVTT

00:00:01.000 --> 00:00:03.500
Hello, how are you doing today?

00:00:04.000 --> 00:00:06.200
I'm doing great, thanks for asking.
```
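Because browsers parse WebVTT natively, one lightweight way to consume such a file is a `<track>` element attached to the video. A sketch with illustrative names:

```typescript
function attachSubtitles(video: HTMLVideoElement, vttUrl: string): void {
  const track = document.createElement("track");
  track.kind = "subtitles";
  track.label = "Speech";
  track.src = vttUrl; // e.g. URL.createObjectURL(vttFile)
  track.default = true;
  video.appendChild(track);
  track.track.mode = "showing"; // ensure the browser loads and displays cues
  track.addEventListener("load", () => {
    // Each cue exposes startTime, endTime, and text.
    console.log(`Loaded ${track.track.cues?.length ?? 0} cues`);
  });
}
```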
```
SPEAKER filename 1 1.25 2.30 <NA> <NA> SPEAKER_00 <NA> <NA>
SPEAKER filename 1 3.80 1.50 <NA> <NA> SPEAKER_01 <NA> <NA>
```
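Each RTTM line is whitespace-delimited, with onset and duration (in seconds) in the fourth and fifth fields and the speaker label in the eighth. A minimal parsing sketch with illustrative names:

```typescript
// RTTM fields: type, file, channel, onset, duration, ortho, stype,
// speaker name, confidence, lookahead.
interface SpeakerTurn {
  speaker: string;
  start: number; // seconds
  end: number;   // seconds
}

function parseRttm(text: string): SpeakerTurn[] {
  return text
    .split("\n")
    .filter((line) => line.startsWith("SPEAKER"))
    .map((line) => {
      const f = line.trim().split(/\s+/);
      const start = parseFloat(f[3]);
      const duration = parseFloat(f[4]);
      return { speaker: f[7], start, end: start + duration };
    });
}
```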
```json
[
  {
    "id": 1,
    "start_time": 0.0,
    "end_time": 5.2,
    "scene_type": "conversation",
    "score": 0.89
  }
]
```
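Keeping overlays like these synchronized with playback reduces to an interval lookup at the video's current time. A sketch using the scene shape above; the lookup function is illustrative, not the viewer's actual implementation:

```typescript
// On each timeupdate, find the segment whose interval contains the current
// time. A linear scan is shown for clarity; sorted data admits binary search.
interface Scene {
  id: number;
  start_time: number; // seconds
  end_time: number;   // seconds
  scene_type: string;
  score: number;
}

function activeScene(scenes: Scene[], t: number): Scene | undefined {
  return scenes.find((s) => t >= s.start_time && t < s.end_time);
}

// Usage:
// video.addEventListener("timeupdate", () => {
//   render(activeScene(scenes, video.currentTime));
// });
```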
- Node.js 18+ or Bun runtime
- Modern web browser with ES2020 support
```bash
# Clone the repository
git clone https://github.com/InfantLab/video-annotation-viewer.git
cd video-annotation-viewer

# Install dependencies
bun install    # or: npm install

# Start development server
bun run dev    # or: npm run dev

# Build for production
bun run build  # or: npm run build
```
```
src/
├── components/          # React components
│   ├── VideoPlayer.tsx  # Main video player with overlays
│   ├── Timeline.tsx     # Interactive timeline component
│   ├── FileUploader.tsx # Multi-file upload interface
│   └── ...
├── lib/parsers/         # Format-specific parsers
│   ├── coco.ts          # COCO format parser
│   ├── webvtt.ts        # WebVTT parser
│   ├── rttm.ts          # RTTM parser
│   └── merger.ts        # Data integration utility
├── types/               # TypeScript type definitions
│   └── annotations.ts   # Standard format interfaces
└── utils/               # Utility functions
    ├── debugUtils.ts    # Demo data loading
    └── version.ts       # Version management
```
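For orientation, `merger.ts` presumably folds the per-format parser outputs into one structure that the player, overlays, and timeline share. The shapes below are assumptions for illustration only; the real interfaces live in `src/types/annotations.ts`:

```typescript
// Assumed shapes, not the project's actual type definitions.
interface PoseFrame { trackId: number; timestamp: number; keypoints: number[] }
interface SpeechCue { start: number; end: number; text: string }
interface SpeakerTurn { speaker: string; start: number; end: number }
interface SceneSegment { start: number; end: number; sceneType: string }

// One merged object gives seeking and rendering a single source of truth.
interface MergedAnnotations {
  pose: PoseFrame[];
  speech: SpeechCue[];
  speakers: SpeakerTurn[];
  scenes: SceneSegment[];
}
```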
- Behavioral Analysis: Review automated behavior detection results
- Algorithm Validation: Verify computer vision pipeline accuracy
- Multimodal Studies: Analyze speech, movement, and visual data together
- Dataset Annotation: Quality control for training data
- Therapy Assessment: Analyze patient-therapist interactions
- Developmental Studies: Track child development indicators
- Social Interaction: Study group dynamics and communication patterns
- Movement Analysis: Assess motor skills and physical therapy progress
Important: This project is designed to be used in conjunction with VideoAnnotator. Here's how they work together:

- VideoAnnotator processes your videos to generate annotation data:
  - Analyzes video files using advanced computer vision and ML pipelines
  - Outputs standardized annotation files (COCO, WebVTT, RTTM, JSON)
  - Handles the computationally intensive analysis work
- Video Annotation Viewer (this project) visualizes and reviews those results:
  - Loads VideoAnnotator output files alongside original videos
  - Provides interactive visualization and playback controls
  - Enables detailed review and quality assessment
Your Video Files → [VideoAnnotator Processing] → Annotation Files → [This Viewer] → Interactive Analysis
Video Annotation Viewer integrates seamlessly with:
- VideoAnnotator: Primary annotation processing pipeline
- Research Workflows: Export-ready data formats for further analysis
- Analysis Tools: Standard format compatibility for statistical processing
For detailed release notes and changes, see CHANGELOG.md.
- v0.3.0: VideoAnnotator Job Creation & Management (August 2025)
  - Job Creation Wizard: Create new annotation jobs through the VideoAnnotator API
  - Pipeline Management: Select and configure scene detection, person tracking, face analysis, and audio processing
  - Batch Processing: Submit multiple videos simultaneously
  - Real-time Monitoring: Live job status updates and progress tracking
  - Professional Interface: Enhanced UI with consistent branding and improved user experience
  - API Integration: Full VideoAnnotator server integration with authentication and error handling
- v0.2.0: Enhanced interface and improved functionality (August 2025)
  - Updated project branding to "Video Annotation Viewer"
  - Consistent GitHub repository naming (`video-annotation-viewer`)
  - New interface screenshot and social media integration
  - Improved documentation and developer experience
- v0.1.0: Initial release with full VideoAnnotator integration
  - COCO, WebVTT, RTTM, and scene detection support
  - Multi-file upload and automatic format detection
  - Interactive timeline with synchronized playback
  - Demo dataset integration
Comprehensive documentation is available in the `docs/` directory:

- Interactive Documentation - Explore the codebase with deepwiki's AI-powered documentation
- Developer Guide - Technical architecture and development setup
- Agents Guide - Instructions for AI coding agents working in this repo
- File Formats - VideoAnnotator format specifications
- Debug Utils - Console debugging and testing tools
- Client-Server Guide - VideoAnnotator API integration (New in v0.3.0)
- QA Testing v0.3.0 - Current quality assurance procedures
- Implementation History - Development tracking and historical records
- v0.4.0 Roadmap - Future feature planning
This project is part of the InfantLab research ecosystem. For contributions, issues, or feature requests:
- Check the GitHub repository
- Review existing issues and feature requests
- Follow the project's coding standards and testing requirements
See the LICENSE file for details.
Developed by: Caspar Addyman [email protected]
If you use this software in your research, please cite:
Addyman, C. (2025). Video Annotation Viewer: Interactive visualization tool for multimodal video annotation data.
DOI: 10.5281/zenodo.16948764
For Questions: Please contact the developers at [email protected]
For Bug Reports: Please raise a GitHub issue at:
https://github.com/InfantLab/video-annotation-viewer/issues
For VideoAnnotator Questions: Visit the main VideoAnnotator repository:
https://github.com/InfantLab/VideoAnnotator
This work was supported by the Global Parenting Initiative (Funded by The LEGO Foundation).
Built with modern web technologies | Powered by Bun runtime | Made for researchers