A reproducible stack that incrementally ingests Notion databases into a dedicated Postgres warehouse, with full run traceability via Dagster.
- dlt pipelines: Incremental sync from the Notion API
- Postgres: Data warehouse container
- Dagster OSS: Orchestration and monitoring
- Docker Compose: Service orchestration
- macOS 14+ with OrbStack ≥ 0.18 (Docker & Compose drop-in)
- 4 GB free RAM
- Open ports: 5432, 3000
- Go to https://www.notion.so/my-integrations
- Click "+ New integration"
- Give it a name like "Data Sync"
- Select the workspace you want to sync from
- Click "Submit"
- Copy the "Internal Integration Token" (starts with `secret_`)
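
Before going further, you can sanity-check the token against the Notion API. This is an optional sketch, not part of this repo; it assumes `requests` is installed:

```python
# verify_token.py — optional sanity check for the integration token (not part of this repo)
import os
import requests

NOTION_TOKEN = os.environ["NOTION_TOKEN"]

resp = requests.get(
    "https://api.notion.com/v1/users/me",
    headers={
        "Authorization": f"Bearer {NOTION_TOKEN}",
        "Notion-Version": "2022-06-28",  # any supported API version works here
    },
)
resp.raise_for_status()
print("Token OK, integration:", resp.json().get("name"))
```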
- Open the Notion databases you want to sync in your browser
- Copy the database ID from the URL. It is the 32-character string in the path: `https://www.notion.so/workspace/<32-character-string>?v=...`
- If you have multiple databases, collect all their IDs
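
Database IDs are easy to mis-copy, so a small helper like this hypothetical one can extract the 32-character ID from a pasted URL:

```python
# extract_db_id.py — hypothetical helper, not part of this repo
import re

def extract_database_id(url: str) -> str:
    """Return the 32-char hex database ID embedded in a Notion URL."""
    match = re.search(r"([0-9a-f]{32})", url.split("?")[0])
    if not match:
        raise ValueError(f"No database ID found in: {url}")
    return match.group(1)

print(extract_database_id(
    "https://www.notion.so/workspace/0123456789abcdef0123456789abcdef?v=abc"
))  # -> 0123456789abcdef0123456789abcdef
```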
- In each Notion database, click the "•••" menu (top right)
- Click "Connections" → "Connect to"
- Search for your integration name and select it
- Click "Confirm"
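
A 404 at sync time usually means this connection step was skipped. Here is a hedged sketch (not part of the repo) to verify the integration can read each database:

```python
# check_access.py — hypothetical sketch: verify the integration can read each database
import os
import requests

token = os.environ["NOTION_TOKEN"]
headers = {"Authorization": f"Bearer {token}", "Notion-Version": "2022-06-28"}

for db_id in os.environ["NOTION_DB_ID"].split(","):
    resp = requests.get(
        f"https://api.notion.com/v1/databases/{db_id.strip()}", headers=headers
    )
    status = "OK" if resp.ok else f"FAILED ({resp.status_code})"
    print(f"{db_id.strip()}: {status}")  # 404 usually means the integration is not connected
```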
📋 You'll need:
- `NOTION_TOKEN`: Your integration token (`secret_xxxx...`)
- `NOTION_DB_ID`: Comma-separated database IDs (`32-char-string1,32-char-string2`)
```bash
# Copy environment template
cp env.example .env

# Edit with your credentials
nano .env
```
Fill in:
- `NOTION_TOKEN`: Your integration token
- `NOTION_DB_ID`: Comma-separated database IDs
- Postgres credentials (or use the defaults)
```bash
# Build the code location service
docker compose build code

# Start all services in background
docker compose up -d

# Check status
docker compose ps
```
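
Once the containers are up, you can optionally confirm the warehouse accepts connections from the host. This sketch assumes the default `postgres` user and `analytics` database; the `POSTGRES_PASSWORD` env var name is an assumption:

```python
# check_postgres.py — optional sketch, assuming default credentials from env.example
import os
import psycopg2  # pip install psycopg2-binary

conn = psycopg2.connect(
    host="localhost",
    port=5432,
    user=os.environ.get("POSTGRES_USER", "postgres"),
    password=os.environ["POSTGRES_PASSWORD"],  # assumed env var name
    dbname=os.environ.get("POSTGRES_DB", "analytics"),
)
with conn.cursor() as cur:
    cur.execute("SELECT version();")
    print(cur.fetchone()[0])
conn.close()
```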
Open http://localhost:3000 in your browser.
- Go to "Deployment" → "Code locations"
- Add the new location: `code_location` (Docker)
- Navigate to "Assets" to see `notion_sync`
- Click "Materialize" for a manual run, or enable the schedule
Project layout:

```
notion2pg/
├─ code_location/
│  ├─ Dockerfile
│  ├─ requirements.txt
│  └─ defs.py          # Dagster assets + dlt pipeline
├─ docker-compose.yml
├─ env.example
├─ pipeline_state/     # Persistent .dlt state
└─ README.md
```
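
The heart of the repo is `defs.py`, which wires a dlt pipeline into a Dagster asset. Here is a minimal sketch of what it might look like, assuming dlt's `notion` verified source is vendored into the code location; names like `notion_data` are illustrative, not the repo's actual values:

```python
# defs.py — a minimal sketch, not the repo's exact code.
# Assumes dlt's "notion" verified source is vendored into the code location.
import os

import dlt
from dagster import Definitions, asset
from notion import notion_databases  # dlt verified source (assumed vendored)

@asset
def notion_sync() -> None:
    """Incrementally load the configured Notion databases into Postgres."""
    pipeline = dlt.pipeline(
        pipeline_name="notion_sync",
        destination="postgres",           # credentials come from env / .dlt config
        dataset_name="notion_data",       # hypothetical schema name
        pipelines_dir="/pipeline_state",  # persist .dlt state across restarts
    )
    database_ids = [
        {"id": db_id.strip()}
        for db_id in os.environ["NOTION_DB_ID"].split(",")
    ]
    info = pipeline.run(
        notion_databases(
            database_ids=database_ids,
            api_key=os.environ["NOTION_TOKEN"],
        )
    )
    print(info)

defs = Definitions(assets=[notion_sync])
```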
In the Dagster UI, go to Assets → `notion_sync` → "Materialize".
The pipeline runs daily at 04:30 (Europe/Madrid timezone). Enable it in the Dagster UI under "Schedules".
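
In code, that cadence corresponds to a Dagster schedule like the following sketch (the job name is hypothetical):

```python
# Sketch of the daily schedule, assuming an asset job wrapping notion_sync
from dagster import ScheduleDefinition, define_asset_job

notion_sync_job = define_asset_job("notion_sync_job", selection=["notion_sync"])

daily_schedule = ScheduleDefinition(
    job=notion_sync_job,
    cron_schedule="30 4 * * *",          # 04:30 every day
    execution_timezone="Europe/Madrid",  # cron is evaluated in this timezone
)
```

Registering it would mean adding the job and `schedules=[daily_schedule]` to the `Definitions` object.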
View real-time logs: `docker compose logs -f code`
```bash
# Database backup
docker compose exec postgres_dwh pg_dump -U postgres analytics > backup.sql
```
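
Beyond raw backups, a quick way to spot-check what the sync has landed is to count rows per table in the dlt dataset schema. The schema name `notion_data` is an assumption; adjust it to whatever `defs.py` uses:

```python
# spot_check.py — hypothetical sketch: list synced tables and row counts
import os
import psycopg2

conn = psycopg2.connect(
    host="localhost", port=5432, user="postgres",
    password=os.environ["POSTGRES_PASSWORD"], dbname="analytics",
)
with conn.cursor() as cur:
    cur.execute(
        "SELECT table_name FROM information_schema.tables "
        "WHERE table_schema = %s AND table_name NOT LIKE '_dlt%%'",  # skip dlt internals
        ("notion_data",),  # assumed dlt dataset/schema name
    )
    for (table,) in cur.fetchall():
        cur.execute(f'SELECT count(*) FROM notion_data."{table}"')
        print(table, cur.fetchone()[0])
conn.close()
```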
If something looks wrong, start with the container status and the Dagster webserver logs:

```bash
# Check container status and inspect the web UI logs
docker compose ps
docker compose logs dagster_web
```
```bash
# Clear pipeline state for a fresh sync
rm -rf pipeline_state/*
docker compose restart code
```

Deleting the persisted `.dlt` state forces the next run to re-sync everything from scratch instead of incrementally.
If ports 3000 or 5432 are already in use, change the port mappings in `docker-compose.yml`.
- Add unit tests with pytest + `dagster dev` (see the sketch after this list)
- Configure failure notifications
- Expand to other APIs
- Add dbt models downstream
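
For the first item, a test could materialize the asset in-process, as sketched below; in a real test you would stub the Notion source or point dlt at a throwaway destination rather than hit the live API:

```python
# test_defs.py — minimal sketch for the pytest idea above
from dagster import materialize

from defs import notion_sync

def test_notion_sync_materializes():
    # NOTE: as written this runs the real pipeline; in practice, monkeypatch
    # the Notion source or swap the dlt destination (e.g. duckdb) first.
    result = materialize([notion_sync])
    assert result.success
```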