Skip to content

static spatiotemporal train/val/test splits for dataset #10853

@CoenvdE

Description

@CoenvdE

Is your feature request related to a problem?

Unable to create easy, good static splits for train val and test sets of an xarray dataset for Deep Learning

Describe the solution you'd like

Is your feature request related to a problem?
Hi everyone,

I’d like to open the discussion about adding functionality to create static spatiotemporal train, validation, and test splits from a single large xarray dataset (e.g., global-scale data).

The goal is to generate these splits based on:

The spatial and temporal size of each sample
The stride between samples
A list of user-defined validation and test regions (as static spatiotemporal holdouts)
Potentially a land mask, used to ignore samples which only contain ocean, or land data.
The expected output would be, for each split, a list of dictionaries containing the coordinates (start and end) of each slice. This structure would make it straightforward to iterate over samples in a dataloader.

Keeping the validation and test regions static is important to ensure consistent model evaluation and comparability across experiments.
I am already working on a prototype for this with my work, and wanted to open the discussion to gather feedback and seeing if and to which library this could be a contribution!

Describe alternatives you've considered

none, it doesnt exist yet

Additional context

No response

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions