create empty and create_empty_like #687
Closed
39 commits
- 46cfc1d create empty (dmitriyrepin)
- 8fbc127 Add test_overwrite_behavior (dmitriyrepin)
- 6d1fcea Fix pre-commit (dmitriyrepin)
- 687dcc8 Merge remote-tracking branch 'upstream' into create_empty (dmitriyrepin)
- 40bd1da Update API (dmitriyrepin)
- c73a7ed move create_empty_mdio (dmitriyrepin)
- 5e36333 Revert TMP -> tmp change (dmitriyrepin)
- 7d0b562 Pre-commit formatting (dmitriyrepin)
- 56aecf6 _create_empty_mdio with template_name (dmitriyrepin)
- eab14a7 Merge 'upstream/main' (dmitriyrepin)
- b7f3c40 pre-commit (dmitriyrepin)
- 6c3a03b Merge branch 'main' into create_empty (BrianMichell)
- 94fd3fe PR review and test_populate_empty_dataset (dmitriyrepin)
- 67adce2 Merge 'upstream/main' into create_empty (dmitriyrepin)
- 67a03ec Merge 'origin/create_empty' into create_empty (dmitriyrepin)
- bf2e41f USe headers: HeaderSpec (dmitriyrepin)
- e1e3ce2 Add export to segy to test_populate_empty_dataset (dmitriyrepin)
- b8d12cf Merge remote-tracking branch 'upstream/main' into create_empty (dmitriyrepin)
- 903d78a Update for upstream chnages (dmitriyrepin)
- 344cf65 Pre-commit added empty line (dmitriyrepin)
- 83b7717 Use Teapod dimensions (dmitriyrepin)
- 1072b98 Merge branch 'main' into create_empty (BrianMichell)
- 7a65c07 Merge remote-tracking branch 'upstream/main' into create_empty (dmitriyrepin)
- 251e2f6 Merge upstream/main' into create_empty (dmitriyrepin)
- c7091a8 Merge branch 'oriugin/create_empty' (dmitriyrepin)
- 325466f move creators/mdio.py : create_empty() to api/create.py: create_empty( (dmitriyrepin)
- 1a95f82 create_empty_like (dmitriyrepin)
- 26d2c26 Add stats to validate_xr_variable (dmitriyrepin)
- bade123 fix white space change failure of pre-commit (dmitriyrepin)
- baa3da3 remove tmp_path_factory (dmitriyrepin)
- d974125 Ensure test order: create_empty after teapod_roundtrip (dmitriyrepin)
- 2d6e250 Fir createdOn in create_empty_like (dmitriyrepin)
- 1637396 Return xr_dataset from create_empty (dmitriyrepin)
- cc1f6d7 Fix pre-commit (dmitriyrepin)
- 5d2ef41 Merge branch 'main' into create_empty (BrianMichell)
- a9128cb Merge upstream/main' (dmitriyrepin)
- 1da04f9 Adjust to the breaking changes in upstream/main (dmitriyrepin)
- de35aeb Merge branch 'origin/create_empty' (dmitriyrepin)
- 62e9be7 Address some of PR review comments (dmitriyrepin)
```diff
@@ -1 +1,6 @@
 """Public API."""
+
+from mdio.api.create import create_empty
+from mdio.api.create import create_empty_like
+
+__all__ = ["create_empty", "create_empty_like"]
```
New file, hunk `@@ -0,0 +1,168 @@`:

```python
"""Creating MDIO v1 datasets."""

from __future__ import annotations

from datetime import UTC
from datetime import datetime
from typing import TYPE_CHECKING

from mdio.api.io import _normalize_path
from mdio.api.io import open_mdio
from mdio.api.io import to_mdio
from mdio.builder.template_registry import TemplateRegistry
from mdio.builder.xarray_builder import to_xarray_dataset
from mdio.converters.segy import populate_dim_coordinates
from mdio.converters.type_converter import to_structured_type
from mdio.core.grid import Grid

if TYPE_CHECKING:
    from pathlib import Path

    from segy.schema import HeaderSpec
    from upath import UPath
    from xarray import Dataset as xr_Dataset

    from mdio.builder.schemas import Dataset
    from mdio.builder.templates.base import AbstractDatasetTemplate
    from mdio.core.dimension import Dimension


def create_empty(  # noqa: PLR0913
    mdio_template: AbstractDatasetTemplate | str,
    dimensions: list[Dimension],
    output_path: UPath | Path | str | None,
    headers: HeaderSpec | None = None,
    overwrite: bool = False,
) -> xr_Dataset:
    """Create an empty MDIO v1 dataset with known dimensions.

    Args:
        mdio_template: The MDIO template, or the name of a registered template, that defines the dataset structure.
        dimensions: The dimensions of the MDIO dataset.
        output_path: The universal path for the output MDIO v1 file. If None, nothing is written to disk.
        headers: SEG-Y trace header specification. When provided, its dtype is used for the trace headers and a
            placeholder `segy_file_header` variable is added so the dataset can later be exported to SEG-Y.
            Defaults to None.
        overwrite: Whether to overwrite the output file if it already exists. Defaults to False.

    Returns:
        An xarray dataset holding the dimension coordinates and the trace mask.

    Raises:
        FileExistsError: If the output location already exists and overwrite is False.
    """
    # output_path may be None (in-memory only), so normalize it only when given
    output_path = _normalize_path(output_path) if output_path is not None else None

    if not overwrite and output_path is not None and output_path.exists():
        err = f"Output location '{output_path.as_posix()}' exists. Set `overwrite=True` if intended."
        raise FileExistsError(err)

    header_dtype = to_structured_type(headers.dtype) if headers else None
    grid = Grid(dims=dimensions)
    if isinstance(mdio_template, str):
        # A template name was passed in; get a unit-unaware template from the registry
        mdio_template = TemplateRegistry().get(mdio_template)
    # Build the dataset using the template
    mdio_ds: Dataset = mdio_template.build_dataset(
        name=mdio_template.name, sizes=grid.shape, header_dtype=header_dtype
    )

    # Convert to an xarray dataset
    xr_dataset: xr_Dataset = to_xarray_dataset(mdio_ds=mdio_ds)

    # Populate coordinates using the grid.
    # For empty datasets, only the dimension coordinates are populated.
    drop_vars_delayed = []
    xr_dataset, drop_vars_delayed = populate_dim_coordinates(xr_dataset, grid, drop_vars_delayed=drop_vars_delayed)

    if headers:
        # Headers were provided, so the user intends to export to SEG-Y.
        # Add a placeholder segy_file_header variable required by the SEG-Y export.
        xr_dataset["segy_file_header"] = ((), "")

    # Create the Zarr store with the correct structure but with empty arrays
    if output_path is not None:
        to_mdio(xr_dataset, output_path=output_path, mode="w", compute=False)

    # Keep only the dimension coordinates and the trace mask, then write them
    xr_dataset = xr_dataset[drop_vars_delayed + ["trace_mask"]]

    if output_path is not None:
        to_mdio(xr_dataset, output_path=output_path, mode="r+", compute=True)

    return xr_dataset


def create_empty_like(  # noqa: PLR0913
    input_path: UPath | Path | str,
    output_path: UPath | Path | str | None,
    keep_coordinates: bool = False,
    overwrite: bool = False,
) -> xr_Dataset:
    """Create an empty MDIO v1 dataset with the same structure as an existing one.

    Args:
        input_path: The path of the input MDIO file.
        output_path: The path of the output MDIO file. If None, the output is not written to disk.
        keep_coordinates: Whether to keep the non-dimension coordinates and units in the output. Defaults to False.
        overwrite: Whether to overwrite the output file if it already exists. Defaults to False.

    Returns:
        The output MDIO dataset.

    Raises:
        FileExistsError: If the output location already exists and overwrite is False.
    """
    input_path = _normalize_path(input_path)
    output_path = _normalize_path(output_path) if output_path is not None else None

    if not overwrite and output_path is not None and output_path.exists():
        err = f"Output location '{output_path.as_posix()}' exists. Set `overwrite=True` if intended."
        raise FileExistsError(err)

    ds = open_mdio(input_path)

    # Shallow copy: `data=None` keeps references to the input arrays, so array values are NOT
    # dropped here. The output stays empty because only the structure is written below.
    ds_output = ds.copy(data=None)
    if not keep_coordinates:
        # Drop the non-dimension coordinates. reset_coords(drop=False) would instead demote
        # them to data variables, so it is only called when the coordinates are discarded.
        ds_output = ds_output.reset_coords(drop=True)

    # Dataset
    # Keep the name (which is the same as the template name) and the original API version:
    # ds_output.attrs["name"]
    # ds_output.attrs["apiVersion"]
    ds_output.attrs["createdOn"] = str(datetime.now(UTC))

    # Coordinates
    if not keep_coordinates:
        for coord_name in ds_output.coords:
            ds_output[coord_name].attrs.pop("unitsV1", None)

    # MDIO attributes (may be absent, so fall back to an empty dict)
    attr = ds_output.attrs.get("attributes") or {}
    attr.pop("gridOverrides", None)  # An empty dataset should not have gridOverrides
    # Keep the original values for the following attributes:
    # attr["defaultVariableName"]
    # attr["surveyType"]
    # attr["gatherType"]

    # All traces are marked as dead in an empty dataset
    if "trace_mask" in ds_output.variables:
        ds_output["trace_mask"][:] = False

    # Data variable: its statistics (and, optionally, units) no longer apply
    var_name = attr.get("defaultVariableName")
    if var_name in ds_output.data_vars:
        var = ds_output[var_name]
        var.attrs.pop("statsV1", None)
        if not keep_coordinates:
            var.attrs.pop("unitsV1", None)

    # SEG-Y file header: the textual and binary headers belong to the source file
    if "segy_file_header" in ds_output.variables:
        segy_file_header = ds_output["segy_file_header"]
        segy_file_header.attrs.pop("textHeader", None)
        segy_file_header.attrs.pop("binaryHeader", None)
        segy_file_header.attrs.pop("rawBinaryHeader", None)

    if output_path is not None:
        # Create the Zarr store with the full structure without computing any array data
        to_mdio(ds_output, output_path=output_path, mode="w", compute=False)
        # Write only the coordinates and the (all-dead) trace mask, as in create_empty
        eager_vars = list(ds_output.coords)
        if "trace_mask" in ds_output.variables:
            eager_vars.append("trace_mask")
        to_mdio(ds_output[eager_vars], output_path=output_path, mode="r+", compute=True)

    return ds_output
```
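The metadata handling in `create_empty_like` boils down to a few rules: refresh `createdOn`, drop `gridOverrides`, drop `statsV1` from the default data variable, drop units unless coordinates are kept, and strip the SEG-Y textual and binary headers. A minimal sketch of those rules on plain dictionaries follows, so they can be checked without an MDIO store. The helper `scrub_empty_like_attrs`, the dictionary layout (dataset attrs plus a name-to-attrs mapping for variables and coordinates), and all sample values are hypothetical and not part of MDIO's API; the data-level steps (dropping coordinates, clearing the trace mask) are left out.

```python
import copy
from datetime import datetime, timezone
from typing import Any

# Hypothetical helper, not part of MDIO: models the attribute rules of
# create_empty_like with nested dicts in place of xarray objects.
SEGY_FILE_HEADER_KEYS = ("textHeader", "binaryHeader", "rawBinaryHeader")


def scrub_empty_like_attrs(
    dataset_attrs: dict[str, Any],
    variable_attrs: dict[str, dict[str, Any]],
    coord_names: list[str],
    keep_coordinates: bool = False,
) -> tuple[dict[str, Any], dict[str, dict[str, Any]]]:
    """Return scrubbed deep copies of dataset-level and per-variable attributes."""
    ds_attrs = copy.deepcopy(dataset_attrs)
    var_attrs = copy.deepcopy(variable_attrs)

    # Dataset level: keep "name" and "apiVersion", stamp a new creation time
    # (timezone.utc is the pre-3.11 spelling of datetime.UTC)
    ds_attrs["createdOn"] = str(datetime.now(timezone.utc))

    # MDIO attributes: an empty dataset has no grid overrides
    mdio_attrs = ds_attrs.get("attributes") or {}
    mdio_attrs.pop("gridOverrides", None)

    if not keep_coordinates:
        # Coordinate units are dropped together with the coordinates
        for name in coord_names:
            var_attrs.get(name, {}).pop("unitsV1", None)

    # Default data variable: statistics no longer describe any data
    default_name = mdio_attrs.get("defaultVariableName")
    if default_name in var_attrs:
        var_attrs[default_name].pop("statsV1", None)
        if not keep_coordinates:
            var_attrs[default_name].pop("unitsV1", None)

    # SEG-Y file header: textual and binary headers describe the source file only
    for key in SEGY_FILE_HEADER_KEYS:
        var_attrs.get("segy_file_header", {}).pop(key, None)

    return ds_attrs, var_attrs


# Usage with made-up metadata
source_attrs = {
    "name": "example_template",
    "apiVersion": "1.0.0",
    "createdOn": "2024-01-01 00:00:00+00:00",
    "attributes": {"defaultVariableName": "amplitude", "gridOverrides": {"example_override": True}},
}
source_vars = {
    "inline": {"unitsV1": {"length": "m"}},
    "amplitude": {"statsV1": {"count": 100}, "unitsV1": {"length": "m"}, "long_name": "Amplitude"},
    "segy_file_header": {"textHeader": "C01 example", "binaryHeader": {"sample_interval": 4000}},
}
new_attrs, new_vars = scrub_empty_like_attrs(source_attrs, source_vars, coord_names=["inline"])
print(new_vars["amplitude"])  # {'long_name': 'Amplitude'}
```

Working on deep copies keeps the source metadata untouched, which the xarray version gets only partially from `copy(data=None)` (variable attrs are copied shallowly, while the nested `attributes` dict is shared).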
```diff
@@ -1,6 +1,30 @@
 """MDIO Data conversion API."""
 
-from mdio.converters.mdio import mdio_to_segy
-from mdio.converters.segy import segy_to_mdio
+from typing import TYPE_CHECKING
+from typing import Any
+
+if TYPE_CHECKING:
+    from mdio.converters.mdio import mdio_to_segy
+    from mdio.converters.segy import segy_to_mdio
 
 __all__ = ["mdio_to_segy", "segy_to_mdio"]
+
+
+def __getattr__(name: str) -> Any:  # noqa: ANN401 - required for dynamic attribute access
+    """Lazy import for converters to avoid circular imports."""
+    if name == "mdio_to_segy":
+        from mdio.converters.mdio import (  # noqa: PLC0415 - intentionally inside the function to avoid circular imports
+            mdio_to_segy,
+        )
+
+        return mdio_to_segy
+
+    if name == "segy_to_mdio":
+        from mdio.converters.segy import (  # noqa: PLC0415 - intentionally inside the function to avoid circular imports
+            segy_to_mdio,
+        )
+
+        return segy_to_mdio
+
+    err = f"module {__name__!r} has no attribute {name!r}"
+    raise AttributeError(err)
```
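The converters module relies on a module-level `__getattr__` (PEP 562). The names stay in `__all__` and are visible to type checkers through the `TYPE_CHECKING` block, but the real import only happens on first attribute access, which breaks the import cycle at module load time. The stdlib-only sketch below shows the same mechanism on a module built at runtime. `make_lazy_module` and the `lazy_demo` module name are hypothetical, and unlike the MDIO code this version caches each resolved object on the module so later lookups bypass `__getattr__`.

```python
import importlib
import sys
import types
from typing import Any


def make_lazy_module(name: str, targets: dict[str, tuple[str, str]]) -> types.ModuleType:
    """Create and register a module whose attributes are imported on first access.

    `targets` maps an exported name to (module to import, attribute in that module).
    """
    module = types.ModuleType(name)
    module.__all__ = list(targets)

    def __getattr__(attr: str) -> Any:
        # Called only when normal lookup in the module's __dict__ fails (PEP 562)
        if attr in targets:
            module_name, object_name = targets[attr]
            value = getattr(importlib.import_module(module_name), object_name)
            # Cache the resolved object so later lookups skip __getattr__
            setattr(module, attr, value)
            return value
        err = f"module {name!r} has no attribute {attr!r}"
        raise AttributeError(err)

    # A module-level __getattr__ must live in the module's own namespace
    module.__getattr__ = __getattr__
    sys.modules[name] = module
    return module


lazy = make_lazy_module("lazy_demo", {"sqrt": ("math", "sqrt"), "dumps": ("json", "dumps")})
print("sqrt" in vars(lazy))  # False: nothing resolved yet
print(lazy.sqrt(16.0))       # 4.0: resolved via __getattr__
print("sqrt" in vars(lazy))  # True: cached after the first access
```

Skipping the cache, as the MDIO module does, is also reasonable: the `from ... import` inside `__getattr__` is cheap after the first call because the target module is already in `sys.modules`.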