lakeFS - Data version control for your data lake | Git for data
Updated Sep 12, 2025 - Go
Libraries and tools for interoperability between Hadoop-related open-source software and Google Cloud Platform.
A tool for scale and performance testing of HDFS with a specific focus on the NameNode.
Data Engineering Project with Hadoop HDFS and Kafka
Kafka Connect FileSystem Connector
A distributed cloud storage system based on Hadoop 🌴
A curated list of resources for learning Hadoop for free.
Library for per-file client-side encryption in Hadoop FileSystems such as HDFS or S3.
Hadoop utility to compact small files
SFTP server that runs on top of HDFS. It is based on Apache sshd and provides access to and operations on HDFS through the SFTP protocol.
Toy Hadoop cluster combining various SQL-on-Hadoop variants
An sbt plugin for publishing artifacts to HDFS.
Python wrapper to access Hadoop HDFS REST API
Data pipeline to process and analyse Twitter data in a distributed fashion using Apache Spark and Airflow in an AWS environment
MapReduce Java Code Examples to learn Hadoop
Neat and handy place for all Hadoop code
Mahout's XMLInputFormat with support for multiple input and output tags.
Ingestion pipeline to analyze soccer tweets