create empty and create_empty_like #687

dmitriyrepin · 2025-09-29T16:29:43Z

Add new API

def create_empty(  # noqa: PLR0913
    mdio_template: AbstractDatasetTemplate | str,
    dimensions: list[Dimension],
    output_path: UPath | Path | str | None,
    headers: HeaderSpec | None = None,
    overwrite: bool = False,
) -> xr_Dataset:
    """A function that creates an empty MDIO v1 file with known dimensions.

    Args:
        mdio_template: The MDIO template or template name to use to define the dataset structure.
            NOTE: To get a unit-aware MDIO model, add the units to the template before
            calling this function. For example:

                unit_aware_template = TemplateRegistry().get("PostStack3DTime")
                unit_aware_template.add_units({"time": UNITS_SECOND})
                unit_aware_template.add_units({"cdp_x": UNITS_METER})
                unit_aware_template.add_units({"cdp_y": UNITS_METER})
                create_empty(unit_aware_template, dimensions, output_path, headers, overwrite)
        dimensions: The dimensions of the MDIO file.
        output_path: The universal path for the output MDIO v1 file.
        headers: SEG-Y v1.0 trace headers. Defaults to None.
        overwrite: Whether to overwrite the output file if it already exists. Defaults to False.

    Returns:
        The output MDIO dataset.

    Raises:
        FileExistsError: If the output location already exists and overwrite is False.
    """

Please review tests/integration/test_create_empty.py::test_populate_empty_dataset() to see if it could be used to address #690 (Update jupyter notebook describing how to create and populate empty MDIO v1 files).

def create_empty_like(  # noqa: PLR0913
    input_path: UPath | Path | str,
    output_path: UPath | Path | str | None,
    keep_coordinates: bool = False,
    overwrite: bool = False,
) -> xr_Dataset:
    """A function that creates an empty MDIO v1 file with the same structure as an existing one.

    Args:
        input_path: The path of the input MDIO file.
        output_path: The path of the output MDIO file.
            If None, the output will not be written to disk.
        keep_coordinates: Whether to keep the coordinates in the output file. Defaults to False.
        overwrite: Whether to overwrite the output file if it exists. Defaults to False.

    Returns:
        The output MDIO dataset.

    Raises:
        FileExistsError: If the output location already exists and overwrite is False.
    """

codecov · 2025-09-29T16:49:25Z

Codecov Report

❌ Patch coverage is 91.48936% with 28 lines in your changes missing coverage. Please review.
✅ Project coverage is 87.46%. Comparing base (c3ba558) to head (de35aeb).
⚠️ Report is 207 commits behind head on main.

Files with missing lines	Patch %	Lines
src/mdio/api/create.py	73.43%	6 Missing and 11 partials ⚠️
tests/integration/test_z_create_empty.py	96.62%	2 Missing and 4 partials ⚠️
src/mdio/converters/__init__.py	72.72%	2 Missing and 1 partial ⚠️
tests/unit/v1/test_dataset_serializer.py	33.33%	1 Missing and 1 partial ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #687      +/-   ##
==========================================
+ Coverage   85.30%   87.46%   +2.15%     
==========================================
  Files          46       91      +45     
  Lines        2219     5115    +2896     
  Branches      306      346      +40     
==========================================
+ Hits         1893     4474    +2581     
- Misses        281      553     +272     
- Partials       45       88      +43


tasansal · 2025-09-30T13:54:27Z

Thanks Dmitry. We should make it less specific to SEGY. I propose the following design:

- Users need to use or register a new template.
- create_empty takes just the template and grid, no segy_spec.
- Move this out of converters and give it its own creation module.

Additionally, the tutorial notebook needs to be updated.
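The proposed design (users register a template, and create_empty takes only a template and a grid) can be sketched like this. SketchTemplate, SketchTemplateRegistry, and sketch_create_empty are hypothetical names, not MDIO's TemplateRegistry or create_empty; the point is that nothing SEG-Y-specific appears in the signature, and a template can be passed either as an object or by registered name.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SketchTemplate:
    """Hypothetical template: a name plus the dimension names it expects."""

    name: str
    dimension_names: tuple[str, ...]


class SketchTemplateRegistry:
    """Hypothetical name -> template registry (not MDIO's TemplateRegistry)."""

    def __init__(self) -> None:
        self._templates: dict[str, SketchTemplate] = {}

    def register(self, template: SketchTemplate) -> None:
        if template.name in self._templates:
            raise ValueError(f"Template {template.name!r} is already registered")
        self._templates[template.name] = template

    def get(self, name: str) -> SketchTemplate:
        if name not in self._templates:
            raise KeyError(f"Unknown template {name!r}; register it first")
        return self._templates[name]


def sketch_create_empty(
    template: SketchTemplate | str,
    grid: dict[str, int],
    registry: SketchTemplateRegistry,
) -> dict[str, int]:
    """Resolve the template, check the grid against it, and return the dataset shape."""
    resolved = registry.get(template) if isinstance(template, str) else template
    missing = [dim for dim in resolved.dimension_names if dim not in grid]
    if missing:
        raise ValueError(f"Grid is missing dimensions required by {resolved.name!r}: {missing}")
    return {dim: grid[dim] for dim in resolved.dimension_names}


registry = SketchTemplateRegistry()
registry.register(SketchTemplate("PostStack3DTime", ("inline", "crossline", "time")))
shape = sketch_create_empty("PostStack3DTime", {"inline": 3, "crossline": 4, "time": 5}, registry)
```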

BrianMichell · 2025-09-30T15:20:10Z

To add some further context:

Headers and the live mask are not mandatory for MDIO v1. This change should be noted in the documentation. Additionally, since these variables are not required, we should note this in the Write to SEG-Y section.
The copy_empty_like feature has not been discussed in depth. Please create an issue for it and link it in the documentation. We will discuss the appropriate implementation, but feel free to propose the behavior you think is appropriate.
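If headers and the live mask are optional, a dataset can be valid MDIO v1 yet not exportable to SEG-Y. A minimal sketch of an early, explicit check, assuming (not confirmed in this thread) that the SEG-Y writer needs both; the variable names "headers" and "trace_mask" are hypothetical placeholders.

```python
from __future__ import annotations

# Hypothetical variable names; the actual names in an MDIO v1 dataset may differ.
SEGY_EXPORT_PREREQUISITES = ("headers", "trace_mask")


def missing_for_segy_export(variable_names: set[str]) -> list[str]:
    """List the optional variables a SEG-Y export would still need, in a stable order."""
    return [name for name in SEGY_EXPORT_PREREQUISITES if name not in variable_names]


def check_segy_exportable(variable_names: set[str]) -> None:
    """Fail early with an explicit message instead of deep inside the export."""
    missing = missing_for_segy_export(variable_names)
    if missing:
        raise ValueError(
            "Dataset is valid MDIO v1 but cannot be written to SEG-Y; "
            f"missing optional variable(s): {', '.join(missing)}"
        )


# In this sketch both datasets are valid; only the first can be exported to SEG-Y.
full = {"amplitude", "headers", "trace_mask"}
minimal = {"amplitude"}
```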

dmitriyrepin · 2025-09-30T15:50:48Z

@tasansal, @BrianMichell
May I suggest the following updated API:

def create_empty_mdio(  # noqa: PLR0913
    mdio_template: AbstractDatasetTemplate,
    dimensions: list[Dimension],
    output_path: UPath | Path | str,
    create_headers: bool = False,
    overwrite: bool = False,
) -> None:
    """A function that creates an empty MDIO v1 file with known dimensions.

    Args:
        mdio_template: The MDIO template to use to define the dataset structure.
        dimensions: The dimensions of the MDIO file.
        output_path: The universal path for the output MDIO v1 file.
        create_headers: Whether to create a full set of SEG-Y v1.0 trace headers. Defaults to False.
        overwrite: Whether to overwrite the output file if it already exists. Defaults to False.

    Raises:
        FileExistsError: If the output location already exists and overwrite is False.
    """

src/mdio/converters/segy.py

src/mdio/creators/__init__.py

src/mdio/creators/mdio.py

src/mdio/api/create.py

BrianMichell

I think that the current implementation strikes a happy medium between our discussions.

BrianMichell

teapod should be fixed to teapot. I think I caught all instances, but please double-check.

It looks like there was some scope creep in this PR which started updating tests like the test_segy_roundtrip_teapot source file. If you could please remove these and open a separate PR to keep changes limited in scope and reviews manageable.

src/mdio/api/create.py

tests/integration/test_segy_roundtrip_teapot.py

tests/integration/test_z_create_empty.py

BrianMichell · 2025-10-29T15:08:12Z

tests/integration/test_z_create_empty.py

+        assert not garbage_file.exists(), "Garbage file should have been overwritten"
+        assert not garbage_dir.exists(), "Garbage directory should have been overwritten"
+
+    def test_populate_empty_dataset(self, mdio_with_headers: Path) -> None:


I don't see the purpose of this as a test. If anything, a more detailed docs page in the form of a Jupyter notebook would be more appropriate.

dmitriyrepin

The intended purpose of the test is to ensure that the structure of the created empty dataset is sufficient to allow the user to perform all the dataset population steps he/she might fancy. If and as the API evolves, the test lets us immediately update the example code to reflect the API changes.
Since Jupyter notebooks are not executed with every PR submission, they can drift out of sync with the API (as any documentation tends to do). A purely hypothetical example: if we decide to rename FEET_PER_SECOND to FOOT_PER_SECOND, the test will be updated as part of the corresponding PR, while the Jupyter notebook will be updated only after a user complains that it no longer runs.

Also, please note that Jupyter is not installed as part of the standard/extended TGSAI/mdio-python environment. So, formally and purely hypothetically, we should not assume that our users use Jupyter notebooks. After all, we do not test whether TGSAI/mdio-python is compatible with Jupyter.

Please advise whether you agree with the argument above or still prefer the test to be removed.
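The FEET_PER_SECOND argument can be made concrete: example code that CI executes fails at the first renamed name, which is exactly when the example needs updating, whereas the same code in an unexecuted notebook cell fails only when a user runs it. Everything below (units_v1, units_v2, populate_example) is hypothetical; the real MDIO unit constants are not shown in this thread.

```python
from __future__ import annotations

from types import SimpleNamespace

# Hypothetical units namespaces standing in for a units module before and after a rename.
units_v1 = SimpleNamespace(FEET_PER_SECOND="ft/s", METER="m")
units_v2 = SimpleNamespace(FOOT_PER_SECOND="ft/s", METER="m")


def populate_example(units) -> dict[str, str]:
    """Tutorial steps written as plain code, so a test can execute them."""
    return {"velocity": units.FEET_PER_SECOND, "cdp_x": units.METER}


def test_populate_example() -> None:
    """Runs on every PR; a notebook cell with the same code would not."""
    assert populate_example(units_v1) == {"velocity": "ft/s", "cdp_x": "m"}


def example_still_runs(units) -> bool:
    """Report whether the example survives an API change."""
    try:
        populate_example(units)
    except AttributeError:
        return False
    return True
```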

dmitriyrepin · 2025-10-29T16:42:53Z

> It looks like there was some scope creep in this PR which started updating tests like the test_segy_roundtrip_teapot source file. If you could please remove these and open a separate PR to keep changes limited in scope and reviews manageable.

In order to avoid code duplication, I opted to:

- expose get_teapot_segy_spec to other tests (create_empty and create_empty_like);
- use the Teapot MDIO file as input to the create_empty_like tests.
  - This revealed that multiple tests were writing to zarr_tmp, overwriting Teapot.mdio. I tried to give each collection of tests its own temporary MDIO file (zarr_tmp, mdio_4d_tmp, and teapot_mdio_tmp).

Please advise whether I should make the create_empty test suite independent of test_segy_roundtrip_teapot.py by copying some of the roundtrip_teapot setup code, to avoid the problems above.
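A minimal sketch of the per-group isolation described above, using only the standard library. make_isolated_outputs is a hypothetical helper, not code from this PR; in a pytest suite, calling tmp_path_factory.mktemp() from a module- or session-scoped fixture gives the same kind of isolation.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def make_isolated_outputs(groups: list[str], base: Path | None = None) -> dict[str, Path]:
    """Give each test group its own temporary directory and MDIO output path.

    Groups never share a path, so one group cannot overwrite another group's
    file (the zarr_tmp / Teapot.mdio collision described above).
    """
    root = Path(tempfile.mkdtemp(prefix="mdio_tests_")) if base is None else base
    outputs: dict[str, Path] = {}
    for group in groups:
        group_dir = Path(tempfile.mkdtemp(prefix=f"{group}_", dir=root))
        outputs[group] = group_dir / f"{group}.mdio"
    return outputs


outputs = make_isolated_outputs(["zarr_tmp", "mdio_4d_tmp", "teapot_mdio_tmp"])
```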

dmitriyrepin added 3 commits September 29, 2025 16:27

create empty

46cfc1d

Add test_overwrite_behavior

8fbc127

Fix pre-commit

6d1fcea

dmitriyrepin and others added 9 commits September 30, 2025 16:38

Merge remote-tracking branch 'upstream' into create_empty

687dcc8

Update API

40bd1da

move create_empty_mdio

c73a7ed

Revert TMP -> tmp change

5e36333

Pre-commit formatting

7d0b562

_create_empty_mdio with template_name

56aecf6

Merge 'upstream/main'

eab14a7

pre-commit

b7f3c40

Merge branch 'main' into create_empty

6c3a03b

BrianMichell requested changes Oct 1, 2025


dmitriyrepin added 4 commits October 2, 2025 16:40

PR review and test_populate_empty_dataset

94fd3fe

Merge 'upstream/main' into create_empty

67adce2

Merge 'origin/create_empty' into create_empty

67a03ec

USe headers: HeaderSpec

bf2e41f

BrianMichell approved these changes Oct 3, 2025


dmitriyrepin and others added 8 commits October 3, 2025 22:30

Add export to segy to test_populate_empty_dataset

e1e3ce2

Merge remote-tracking branch 'upstream/main' into create_empty

b8d12cf

Update for upstream chnages

903d78a

Pre-commit added empty line

344cf65

Use Teapod dimensions

83b7717

Merge branch 'main' into create_empty

1072b98

Merge remote-tracking branch 'upstream/main' into create_empty

7a65c07

Merge upstream/main' into create_empty

251e2f6

dmitriyrepin added 3 commits October 27, 2025 19:40

Merge branch 'oriugin/create_empty'

c7091a8

move creators/mdio.py : create_empty() to api/create.py: create_empty(

325466f

create_empty_like

1a95f82

dmitriyrepin changed the title ~~create empty~~ create empty and create_empty_like Oct 27, 2025

dmitriyrepin added 4 commits October 27, 2025 22:43

Add stats to validate_xr_variable

26d2c26

fix white space change failure of pre-commit

bade123

remove tmp_path_factory

baa3da3

Ensure test order: create_empty after teapod_roundtrip

d974125

tasansal mentioned this pull request Oct 28, 2025

create_empty_like #721

Closed

tasansal added the enhancement New feature or request label Oct 28, 2025

tasansal assigned dmitriyrepin Oct 28, 2025

tasansal requested a review from BrianMichell October 28, 2025 19:07

dmitriyrepin and others added 7 commits October 28, 2025 19:12

Fir createdOn in create_empty_like

2d6e250

Return xr_dataset from create_empty

1637396

Fix pre-commit

cc1f6d7

Merge branch 'main' into create_empty

5d2ef41

Merge upstream/main'

a9128cb

Adjust to the breaking changes in upstream/main

1da04f9

Merge branch 'origin/create_empty'

de35aeb

BrianMichell requested changes Oct 29, 2025


Address some of PR review comments

62e9be7

dmitriyrepin mentioned this pull request Oct 29, 2025

DRAFT: create empty and create_empty_like isolated #736

Draft

3 participants
