Skip to content

Conversation

@dmitriyrepin
Copy link
Contributor

@dmitriyrepin dmitriyrepin commented Sep 29, 2025

Add new API

def create_empty(  # noqa PLR0913
    mdio_template: AbstractDatasetTemplate | str,
    dimensions: list[Dimension],
    output_path: UPath | Path | str | None,
    headers: HeaderSpec | None = None,
    overwrite: bool = False,
) -> xr_Dataset:
    """A function that creates an empty MDIO v1 file with known dimensions.

    Args:
        mdio_template: The MDIO template or template name to use to define the dataset structure.
            NOTE: If you want to have a unit-aware MDIO model, you need to add the units
            to the template before calling this function. For example:
            'unit_aware_template = TemplateRegistry().get("PostStack3DTime")'
            'unit_aware_template.add_units({"time": UNITS_SECOND})'
            'unit_aware_template.add_units({"cdp_x": UNITS_METER})'
            'unit_aware_template.add_units({"cdp_y": UNITS_METER})'
            'create_empty(unit_aware_template, dimensions, output_path, headers, overwrite)'
        dimensions: The dimensions of the MDIO file.
        output_path: The universal path for the output MDIO v1 file.
        headers: SEG-Y v1.0 trace headers. Defaults to None.
        overwrite: Whether to overwrite the output file if it already exists. Defaults to False.

    Returns:
        The output MDIO dataset.

    Raises:
        FileExistsError: If the output location already exists and overwrite is False.
    """

Please review tests/integration/test_create_empty.py : test_populate_empty_dataset() to see if it could be used to address

def create_empty_like(  # noqa PLR0913
    input_path: UPath | Path | str,
    output_path: UPath | Path | str,
    keep_coordinates: bool = False,
    overwrite: bool = False,
) -> xr_Dataset:
    """A function that creates an empty MDIO v1 file with the same structure as an existing one.

    Args:
        input_path: The path of the input MDIO file.
        output_path: The path of the output MDIO file.
                     If None, the output will not be written to disk.
        keep_coordinates: Whether to keep the coordinates in the output file.
        overwrite: Whether to overwrite the output file if it exists.

    Returns:
        The output MDIO dataset.

    Raises:
        FileExistsError: If the output location already exists and overwrite is False.
    """
    ```

@codecov
Copy link

codecov bot commented Sep 29, 2025

Codecov Report

❌ Patch coverage is 91.48936% with 28 lines in your changes missing coverage. Please review.
✅ Project coverage is 87.46%. Comparing base (c3ba558) to head (de35aeb).
⚠️ Report is 207 commits behind head on main.

Files with missing lines Patch % Lines
src/mdio/api/create.py 73.43% 6 Missing and 11 partials ⚠️
tests/integration/test_z_create_empty.py 96.62% 2 Missing and 4 partials ⚠️
src/mdio/converters/__init__.py 72.72% 2 Missing and 1 partial ⚠️
tests/unit/v1/test_dataset_serializer.py 33.33% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #687      +/-   ##
==========================================
+ Coverage   85.30%   87.46%   +2.15%     
==========================================
  Files          46       91      +45     
  Lines        2219     5115    +2896     
  Branches      306      346      +40     
==========================================
+ Hits         1893     4474    +2581     
- Misses        281      553     +272     
- Partials       45       88      +43     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@tasansal
Copy link
Collaborator

Thanks Dmitry. We should make it less specific to SEGY. I propose the following design:

  1. Users need to use or register a new template
  2. create_empty takes just the template and grid, no segy_spec.
  3. move this out of converters, have its own creation module

Additionally, the tutorial notebook needs to be updated

@BrianMichell
Copy link
Collaborator

To add some further context

  1. Headers and live mask are not mandatory for MDIO V1. This is a change that should be noted in the documentation. Additionally, since these Variables are not required we should notate this in the Write to SEG-Y section.
  2. The copy_empty_like has not been deeply discussed. Please create an issue for this feature and link it in the documentation. We will discuss the appropriate implementation but feel free to propose the appropriate behavior.

@dmitriyrepin
Copy link
Contributor Author

@tasansal, @BrianMichell
May I suggest the following updated API:

def create_empty_mdio(  # noqa PLR0913
    mdio_template: AbstractDatasetTemplate,
    dimensions: list[Dimension],
    output_path: UPath | Path | str,
    create_headers: bool = False,
    overwrite: bool = False,
) -> None:
    """A function that creates an empty MDIO v1 file with known dimensions.

    Args:
        mdio_template: The MDIO template to use to define the dataset structure.
        dimensions: The dimensions of the MDIO file.
        output_path: The universal path for the output MDIO v1 file.
        create_headers: Whether to create a full set of SEG-Y v1.0 trace headers. Defaults to False.
        overwrite: Whether to overwrite the output file if it already exists. Defaults to False.

    Raises:
        FileExistsError: If the output location already exists and overwrite is False.
    """

Copy link
Collaborator

@BrianMichell BrianMichell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that the current implementation strikes a happy medium between our discussions

@dmitriyrepin dmitriyrepin changed the title create empty create empty and create_empty_like Oct 27, 2025
@tasansal tasansal mentioned this pull request Oct 28, 2025
@tasansal tasansal added the enhancement New feature or request label Oct 28, 2025
@tasansal tasansal requested a review from BrianMichell October 28, 2025 19:07
Copy link
Collaborator

@BrianMichell BrianMichell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

teapod should be fixed to teapot. I think I caught all instance but please double check.

It looks like there was some scope creep in this PR which started updating tests like the test_segy_roundtrip_teapot source file. If you could please remove these and open a separate PR to keep changes limited in scope and reviews managable.

assert not garbage_file.exists(), "Garbage file should have been overwritten"
assert not garbage_dir.exists(), "Garbage directory should have been overwritten"

def test_populate_empty_dataset(self, mdio_with_headers: Path) -> None:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see the purpose of this as a test. If anything a more detailed docs page in the form of a jupyter notebook would be more appropriate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intended purpose of the test is ensure that the structure of the created empty dataset is sufficient to allow the user to perform all the dataset population steps he/she might fancy. If/as the API evolves, the test allows to immediately update the example code to reflect the API changes.
Since jupyter notebook are not executed with every PR submission, they might become out of sync with the API (as any documentation tends to do). A purely hypothetical example: if we decide to change FEET_PER_SECOND to FOOT_PER_SECOND, the test will be properly updated as a part of the corresponding PR, while the jupyter notebook will be updated only after the user complains that it can't run any longer.

Also, please note that jupyter is not installed as a part of the standard/extended TGSAI/mdio-python environment. So, formally and pure hypothetically, we should not expect that our users use jupyter notebooks. After all, we do not test if TGSAI/mdio-python is compatible with jupyter.

Please advise, if you agree with the argument above or if you still prefer the test to be removed.

@dmitriyrepin
Copy link
Contributor Author

It looks like there was some scope creep in this PR which started updating tests like the
test_segy_roundtrip_teapot source file. If you could please remove these and open a separate
PR to keep changes limited in scope and reviews manageable.

In order to avoid code duplication, I oppted

  • To expose get_teapot_segy_spec to other tests (create_empty and create_empty_like).
  • Used teapot MDIO as input to the create_empty_like tests.
    • This revealed that multiple tests were writing to zarr_tmp overriding the Teapot.mdio. I tried to give every collection of tests its own temporary mdio file ( zarr_tmp, mdio_4d_tmp, and teapot_mdio_tmp)

Please advise, if I should make create_empty tests suite independent of test_segy_roundtrip_teapod.py by copying some of the roundtrip_teapod setup code to avoid the problems with the above.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants