Skip to content

Conversation

TomNicholas
Copy link
Member

@TomNicholas TomNicholas commented Sep 10, 2022

Initial attempt to get cubed working within xarray, as an alternative to dask.

I've added a manager kwarg to the .chunk methods so you can do da.chunk(manager="cubed") to convert to a chunked cubed.CoreArray, with the default still being da.chunk(manager="dask"). (I couldn't think of a better name than "manager", as "backend" and "executor" are already taken.)

At the moment it should work except for an import error that I don't understand, see below.

Fro cubed to work at all with this PR we would also need:

To-dos for me on this PR:

  • Re-route xarray.apply_ufunc through cubed.apply_gufunc instead of dask's apply_gufunc when appropriate,
  • Add from_array_kwargs to opening functions, e.g. open_zarr, and open_dataset,
  • Add from_array_kwargs to creation functions, such as full_like,
  • Add store_kwargs as a way to propagate cubed-specific kwargs when saving to_zarr.

To complete this project more generally we should also:

cc @tomwhite

@TomNicholas TomNicholas added topic-arrays related to flexible array support and removed topic-typing labels Sep 10, 2022
@github-actions github-actions bot added topic-typing and removed topic-arrays related to flexible array support labels Sep 10, 2022
Copy link
Contributor

@dcherian dcherian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀

@TomNicholas
Copy link
Member Author

@Illviljan thanks for all your comments!

Would you (or @keewis?) be willing to approve this PR now? I would really like to merge this so that I can release a version of xarray that I can use as a dependency for cubed-xarray.

@dcherian
Copy link
Contributor

Thanks @TomNicholas Big change!

@dcherian dcherian merged commit 6cd6122 into pydata:main May 18, 2023
@TomNicholas
Copy link
Member Author

Woooo thanks @dcherian !

@jhamman
Copy link
Member

jhamman commented May 18, 2023

👏 Congrats @TomNicholas on getting this in! Such an important contribution. 👏

@TomNicholas TomNicholas mentioned this pull request May 18, 2023
4 tasks
@tomwhite
Copy link
Contributor

Thanks for all your hard work on this @TomNicholas!

@TomNicholas TomNicholas mentioned this pull request Jun 8, 2023
4 tasks
dstansby pushed a commit to dstansby/xarray that referenced this pull request Jun 28, 2023
* generalise chunk methods to allow cubed

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fic typing typo

* fixed circular import

* fix some mypy errors

* added cubed to mypy ignore list

* simplify __array_ufunc__ check

* Revert "simplify __array_ufunc__ check" as I pushed to wrong branch

This reverts commit cdcb3fb.

* update cubed array type

* fix missed conflict

* sketch for ChunkManager adapter class

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove erroneous docstring about usage of map_blocks

Co-authored-by: Deepak Cherian <[email protected]>

* apply_ufunc -> apply_gufunc

Co-authored-by: Deepak Cherian <[email protected]>

* chunk -> from_array

Co-authored-by: Deepak Cherian <[email protected]>

* remove staticmethods

* attempt to type methods of ABC

* from_array

* attempt to specify types

* method for checking array type

* Update pyproject.toml

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixed import errors

* generalize .chunk method kwargs

* used dask functions in dask chunkmanager

* define signatures for apply_gufunc, blockwise, map_blocks

* prototype function to detect which parallel backend to use

* add cubed.apply_gufunc

* ruffify

* add rechunk and compute methods for cubed

* xr.apply_ufunc now dispatches to chunkmanager.apply_gufunc

* CubedManager.chunks

* attempt to keep dask and cubed imports lazy

* generalize idxmax

* move unify_chunks import to ChunkManager

* generalize Dataset.load()

* check explicitly for chunks attribute instead of hard-coding cubed

* better function names

* add cubed version of unify_chunks

* recognize wrapped duck dask arrays (e.g. pint wrapping dask)

* add some tests for fetching ChunkManagers

* add from_array_kwargs to open_dataset

* add from_array_kwargs to open_zarr

* pipe constructors through chunkmanager

* generalize map_blocks inside coding

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixed full_like

* add from_array_kwargs to open_zarr

* don't import dask.tokenize

* fix bugs with passing from_array_kwargs down

* generalise reductions by adding to chunkmanager

* moved nanfirst/nanlast to duck_array_ops from dask_array_ops

* generalize interp

* generalized chunk_hint function inside indexing

* DaskIndexingAdapter->ChunkedIndexingAdapter

* Revert "DaskIndexingAdapter->ChunkedIndexingAdapter"

This reverts commit 4ca044b.

* pass cubed-related kwargs down through to_zarr by adding .store to ChunkManager

* fix typing_extensions on py3.9

* fix ImportError with cubed array type

* give up trying to import TypeAlias in CI

* fix import of T_Chunks

* fix no_implicit_optional warnings

* don't define CubedManager if cubed can't be imported

* fix local mypy errors

* don't explicitly pass enforce_ndim into dask.array.map_blocks

* fix drop_axis default

* use indexing adapter on cubed arrays too

* use array API-compatible version of astype function

* whatsnew

* document new kwargs

* add chunkmanager entrypoint

* move CubedManager to a separate package

* guess chunkmanager based on whats available

* fix bug with tokenizing

* adapt tests to emulate existence of entrypoint

* use fixture to setup/teardown dummy entrypoint

* refactor to make DaskManager unavailable if dask not installed

* typing

* move whatsnew to latest xarray version

* remove superfluous lines from whatsnew

* fix bug where zarr backend attempted to use dask when not installed

* Remove rogue print statement

Co-authored-by: Deepak Cherian <[email protected]>

* Clarify what's new

Co-authored-by: Deepak Cherian <[email protected]>

* use monkeypatch to mock registering of dummy chunkmanager

* more tests for guessing chunkmanager correctly

* raise TypeError if no chunkmanager found for array types

* Correct is_chunked_array check

Co-authored-by: Deepak Cherian <[email protected]>

* vendor dask.array.core.normalize_chunks

* add default implementation of rechunk in ABC

* remove cubed-specific type check in daskmanager

* nanfirst->chunked_nanfirst

* revert adding cubed to NON_NUMPY_SUPPORTED_ARRAY_TYPES

* licensing to vendor functions from dask

* fix bug

* ignore mypy error

* separate chunk_manager kwarg from from_array_kwargs dict

* rename kwarg to chunked_array_type

* refactor from_array_kwargs in .chunk ready for deprecation

* print statements in test so I can comment on them

* remove print statements now I've commented on them in PR

* should fix dask naming tests

* make dask-specific kwargs explicit in from_array

* debugging print statements

* Revert "debugging print statements"

This reverts commit 7dc6581.

* fix gnarly bug with auto-determining chunksizes caused by not referring to dask.config

* hopefully fix broken docstring

* Revert "make dask-specific kwargs explicit in from_array"

This reverts commit 53d6094.

* show chunksize limit used in failing tests

* move lazy indexing adapter up out of chunkmanager code

* try upgrading minimum version of dask

* Revert "try upgrading minimum version of dask"

This reverts commit 796a577.

* un-vendor dask.array.core.normalize_chunks

* refactored to all passing ChunkManagerEntrypoint objects directly

* Remove redundant Nones from types

Co-authored-by: Illviljan <[email protected]>

* From future import annotations

Co-authored-by: Illviljan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* From functools import annotations

Co-authored-by: Illviljan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* From future import annotations

Co-authored-by: Illviljan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* defined type for NormalizedChunks

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* standardized capitalization of ChunkManagerEntrypoint

* ensure ruff doesn't remove import

* ignore remaining typing errors stemming from unclear dask typing for chunks arguments

* rename store_kwargs->chunkmanager_store_kwargs

* missed return value

* array API fixes for astype

* Revert "array API fixes for astype"

This reverts commit 9cd9078.

* Apply suggestions from code review

* Update xarray/tests/test_parallelcompat.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* overridden -> subclassed

Co-authored-by: Justus Magin <[email protected]>

* from_array_kwargs is optional

Co-authored-by: Justus Magin <[email protected]>

* ensured all compute calls go through chunkmanager

* Raise if multiple chunkmanagers recognize array type

Co-authored-by: Justus Magin <[email protected]>

* from_array_kwargs is optional

Co-authored-by: Justus Magin <[email protected]>

* from_array_kwargs is optional

Co-authored-by: Justus Magin <[email protected]>

* from_array_kwargs is optional

Co-authored-by: Justus Magin <[email protected]>

* from_array_kwargs is optional

Co-authored-by: Justus Magin <[email protected]>

* from_array_kwargs is optional

Co-authored-by: Justus Magin <[email protected]>

* fixes for chunk methods

* correct readme to reflect fact we aren't vendoring dask in this PR any more

* update whatsnew

* more docstring corrections

* remove comment

* Raise NotImplementedErrors in all abstract methods

Co-authored-by: Illviljan <[email protected]>

* type hints for every arg in ChunkManagerEntryPOint methods

* more explicit typing + fixes for mypy errors revealed

* Keyword-only arguments in full_like etc.

Co-authored-by: Illviljan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* None as default instead of {}

* fix bug apparently introduced by changing default type of drop_axis kwarg to map_blocks

* Removed hopefully-unnecessary mypy ignore

Co-authored-by: Illviljan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removed unnecessary mypy ignores

* change default value of drop_axis kwarg in map_blocks and catch when dask version < 2022.9.1

* fix checking of dask version in map_blocks

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Deepak Cherian <[email protected]>
Co-authored-by: Illviljan <[email protected]>
Co-authored-by: Justus Magin <[email protected]>
@TomNicholas TomNicholas mentioned this pull request Jun 28, 2023
4 tasks
@TomNicholas TomNicholas deleted the cubed_integration branch July 24, 2023 20:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API design CI Continuous Integration tools dependencies Pull requests that update a dependency file enhancement io needs review run-benchmark Run the ASV benchmark workflow topic-arrays related to flexible array support topic-backends topic-CF conventions topic-dask topic-indexing topic-rolling topic-typing topic-zarr Related to zarr storage library
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Alternative parallel execution frameworks in xarray
9 participants