Description
Is your feature request related to a problem?
There is currently no easy way to create good static train, validation, and test splits of an xarray dataset for deep learning.
Describe the solution you'd like
Hi everyone,
I’d like to open the discussion about adding functionality to create static spatiotemporal train, validation, and test splits from a single large xarray dataset (e.g., global-scale data).
The goal is to generate these splits based on:
- The spatial and temporal size of each sample
- The stride between samples
- A list of user-defined validation and test regions (as static spatiotemporal holdouts)
- Potentially a land mask, used to skip samples that contain only ocean (or only land) data
The expected output would be, for each split, a list of dictionaries containing the coordinates (start and end) of each slice. This structure would make it straightforward to iterate over samples in a dataloader.
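To make the intended output concrete, here is a minimal sketch of what I have in mind. It is not the prototype itself: the function name `make_static_splits`, its parameters, and the dictionary layout are hypothetical, it works on integer index ranges rather than coordinate values, and it leaves out the land mask. It also assumes the holdout regions do not overlap each other, and that a sample which only partially overlaps a holdout region should be dropped to avoid leakage.

```python
from itertools import product


def _windows(size, window, stride):
    """Start/end index pairs (end exclusive) of all windows that fit in `size`."""
    return [(s, s + window) for s in range(0, size - window + 1, stride)]


def _inside(sample, region):
    """True if the sample lies entirely within the region.

    Dimensions missing from `region` are treated as unconstrained.
    """
    return all(region[d][0] <= sample[d][0] and sample[d][1] <= region[d][1]
               for d in region)


def _overlaps(sample, region):
    """True if the sample intersects the region on every constrained dimension."""
    return all(sample[d][0] < region[d][1] and region[d][0] < sample[d][1]
               for d in region)


def make_static_splits(dim_sizes, sample_size, stride, val_regions, test_regions):
    """Hypothetical helper: assign every sliding window to train, val, or test."""
    dims = list(dim_sizes)
    per_dim = [_windows(dim_sizes[d], sample_size[d], stride[d]) for d in dims]
    splits = {"train": [], "val": [], "test": []}
    for combo in product(*per_dim):
        sample = dict(zip(dims, combo))
        if any(_inside(sample, r) for r in test_regions):
            splits["test"].append(sample)
        elif any(_inside(sample, r) for r in val_regions):
            splits["val"].append(sample)
        elif not any(_overlaps(sample, r) for r in val_regions + test_regions):
            splits["train"].append(sample)
        # Samples that straddle a holdout boundary are dropped (assumption).
    return splits


# Toy example: 10 time steps on an 8 x 8 grid, non-overlapping 2 x 4 x 4 samples.
splits = make_static_splits(
    dim_sizes={"time": 10, "lat": 8, "lon": 8},
    sample_size={"time": 2, "lat": 4, "lon": 4},
    stride={"time": 2, "lat": 4, "lon": 4},
    val_regions=[{"lat": (0, 4), "lon": (0, 4)}],  # one spatial tile, all times
    test_regions=[{"time": (8, 10)}],              # last two time steps, everywhere
)
print({name: len(samples) for name, samples in splits.items()})
```

Each entry is a dictionary such as `{"time": (8, 10), "lat": (0, 4), "lon": (0, 4)}`, so a dataloader could fetch a sample with something like `ds.isel({d: slice(*bounds) for d, bounds in sample.items()})`.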
Keeping the validation and test regions static is important to ensure consistent model evaluation and comparability across experiments.
I am already working on a prototype for this as part of my work, and wanted to open the discussion to gather feedback and to see whether, and to which library, this could be contributed!
Describe alternatives you've considered
None, it doesn't exist yet.
Additional context
No response