Grain is a Python library for reading and processing data for training and evaluating JAX models. It is flexible, fast and deterministic.
Grain lets you define data processing steps in a simple, declarative way:
import grain

dataset = (
    grain.MapDataset.source([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
    .shuffle(seed=42)  # Shuffles elements globally.
    .map(lambda x: x + 1)  # Maps each element.
    .batch(batch_size=2)  # Batches consecutive elements.
)

for batch in dataset:
    pass  # Training step goes here.
Grain is designed to work with JAX models, but it does not require JAX to run and can be used with other frameworks as well.
Grain is available on PyPI and can be installed with `pip install grain`.
Grain does not directly use GPU or TPU in its transformations; processing within Grain is done on the CPU by default.
| | Linux | Mac | Windows |
|---|---|---|---|
| x86_64 | yes | no | no |
| aarch64 | yes | yes | n/a |
To cite this repository:
@software{grain2023github,
  author = {Marvin Ritter and Ihor Indyk and Aayush Singh and Andrew Audibert and Anoosha Seelam and Camelia Hanes and Eric Lau and Jacek Olesiak and Jiyang Kang and Xihui Wu},
  title = {{Grain} - Feeding JAX Models},
  url = {http://github.com/google/grain},
  version = {0.2.12},
  year = {2023},
}
The version number is intended to be that from pyproject.toml, and the year corresponds to the project's open-source release.
Grain is used by MaxText, Gemma, kauldron, maxdiffusion and multiple internal Google projects.