A distributed data warehousing architecture leveraging Content-Centric Networking (CCNx) for large-scale data dissemination and analytics.
This project implements a reconciled data warehouse layer on top of the CCNx protocol. It is designed for very large database warehousing in which data must stay local to distributed sites, and it addresses the challenge of disseminating terabyte-scale daily data loads to geographically dispersed users.
- CCNx-based Architecture: Utilizes Content-Centric Networking primitives for efficient data distribution
- MariaDB/MySQL Integration: Custom storage engine implementation based on the CSV engine
- Distributed Operations: Supports geographically dispersed data warehousing with explicit concurrency control
- Large-Scale Data Handling: Optimized for Earth Observation and scientific datasets (ETL rates of hundreds of MB per day)
Presented at CCNxCon 2013, PARC
- Modified MariaDB/MySQL storage engine with CCNx support
- Apache ZooKeeper for global state management
- Custom SQL CCDB commands
- Hierarchical namespace mapping to the CCNx URI scheme
- Earth Observation data dissemination
- Climate model data distribution (CMIP5)
- Large-scale analytics workloads requiring data locality
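
The hierarchical namespace mapping listed among the components can be illustrated with a minimal Python sketch. It is not the project's actual naming scheme: the `prefix/database/table/segment` layout and the function names `table_to_ccnx_uri` and `ccnx_uri_to_components` are assumptions made for illustration, and the percent-encoding via `urllib.parse.quote` is a simplification of the CCNx URI escaping rules.

```python
from urllib.parse import quote, unquote

CCNX_SCHEME = "ccnx:"


def table_to_ccnx_uri(prefix, database, table, segment=None):
    """Map a warehouse name hierarchy to a ccnx: URI (hypothetical layout).

    prefix   -- list of leading name components, e.g. ["warehouse"]
    database -- database (schema) name
    table    -- table name
    segment  -- optional segment/chunk identifier appended as a last component
    """
    components = list(prefix) + [database, table]
    if segment is not None:
        components.append(str(segment))
    for component in components:
        if not component:
            raise ValueError("empty name component")
    # Percent-encode each component so that '/' and spaces cannot
    # be confused with the component separator.
    encoded = "/".join(quote(c, safe="") for c in components)
    return CCNX_SCHEME + "/" + encoded


def ccnx_uri_to_components(uri):
    """Split a ccnx: URI produced above back into decoded name components."""
    head = CCNX_SCHEME + "/"
    if not uri.startswith(head):
        raise ValueError("not a ccnx: URI")
    return [unquote(part) for part in uri[len(head):].split("/")]


if __name__ == "__main__":
    uri = table_to_ccnx_uri(["warehouse"], "eo", "landsat scenes", segment=3)
    print(uri)
    print(ccnx_uri_to_components(uri))
```

Encoding each component separately keeps the hierarchy unambiguous, so a name prefix such as `ccnx:/warehouse/eo` would cover every table in the hypothetical `eo` database.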
M. Alexander, Vienna University of Technology
