Collection of Snowflake Notebook demos, tutorials, and examples
Updated Sep 3, 2025 - Jupyter Notebook
Interactive computing for complex data processing, modeling and analysis in Python 3
Companion notebooks for blogs/tutorials on ML4Devs website.
Convert monolithic Jupyter notebooks 📙 into maintainable Ploomber pipelines. 📊
Transform messy data science notebooks into production-ready code. Examples covering testing, CI/CD, MLOps, and scalable deployment practices.
A curated collection of example marimo notebooks — use these as templates for your own experiments, workflows, and tools.
PySpark Tutorial for Beginners - Practical Examples in Jupyter Notebook with Spark version 3.4.1. The tutorial covers various topics like Spark Introduction, Spark Installation, Spark RDD Transformations and Actions, Spark DataFrame, Spark SQL, and more. It is completely free on YouTube and is beginner-friendly without any prerequisites.
Data pipelines and notebooks for RAG tuning using Fondant
Tools to streamline Jupyter Notebook Prototypes into robust Data Products
A starter repository for your next AWS Glue project. It comes with complete IaC, a CD pipeline, and a reusable common SDK, plus instructions for setting up Jupyter Notebook for AWS Glue locally.
ETL with Jupyter Notebooks, Pandas, and Azure Cosmos DB
This repository provides containerized applications and microservices for the Information Systems and Databases Course @ Instituto Superior Técnico
dtflw is a Python framework for building modular data pipelines based on the Databricks dbutils.notebook API.
Jupyter Notebook Databases Stack
Jupyter Notebook Intelligent Retrieval Systems Stack
A repository holding the files and notebooks produced throughout my Udacity Data Engineering Nanodegree program.
Delta Lake Optimization Project: Hands‑on lab to explore partitioning, Z‑Ordering, compaction (manual & auto), Liquid Clustering, and VACUUM using a synthetic sales dataset in Databricks. Includes a step‑by‑step notebook to measure file scans, bytes read, and query performance for each optimization.
Hugo site for The Orchestrator's Notebook blog
Common ETL patterns and utilities for PySpark. Notebooks tested on Databricks Community Edition.