

@nameexhaustion (Collaborator) commented Aug 5, 2025

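This change is needed for compliance with the Iceberg spec: columns that are missing from a data file must be filled with their identity-transformed partition values. However, some parts of the implementation are less than ideal because of the following constraints: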

  • Because we rely on PyIceberg to load the manifest files, the partition values arrive as Python objects created by PyIceberg, which we then convert with pl.Series(values_list).
    • Ideally, we would load the partition values stored in the manifest file directly into Arrow arrays.
  • We also have to build the full list[<python object>] for each projected partition field. This is unfortunate, but necessary because:
    • We cannot know ahead of time whether a partition field is missing from any of the files.
    • The IR requires the entire list to be resolved upfront.
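To make the second constraint concrete, here is a minimal pure-Python sketch of what building the full list for each projected partition field amounts to: every data file contributes one entry per field, whether or not that file turns out to be missing the column. It assumes a simplified, hypothetical representation of manifest entries as plain dicts with a `partition` mapping, and the function name `build_partition_value_lists` is illustrative only; it is not the PyIceberg or Polars API. In the PR, each resulting list is what ends up in `pl.Series(values_list)`.

```python
from typing import Any

# Hypothetical, simplified stand-in for a manifest entry: a data file plus the
# identity-transformed partition values it was written with.
DataFile = dict[str, Any]


def build_partition_value_lists(
    data_files: list[DataFile],
    projected_fields: list[str],
) -> dict[str, list[Any]]:
    """Build one full list of partition values per projected identity field.

    Every file gets an entry, because we cannot know ahead of time which files
    are missing the column. A file with no value for the field (for example,
    one written under an older partition spec) gets None.
    """
    out: dict[str, list[Any]] = {name: [] for name in projected_fields}
    for data_file in data_files:
        partition = data_file.get("partition", {})
        for name in projected_fields:
            out[name].append(partition.get(name))
    return out


files = [
    {"path": "a.parquet", "partition": {"region": "eu"}},
    {"path": "b.parquet", "partition": {"region": "us"}},
    {"path": "c.parquet", "partition": {}},
]
print(build_partition_value_lists(files, ["region"]))
```

The lists grow with the number of data files rather than the number of rows, which is why materializing them upfront is tolerable even though it is not ideal.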

Implementation

  • [Python] Partition values are loaded from PyIceberg via IdentityTransformedPartitionValuesBuilder
  • [Rust] Default values are inserted via ColumnSelectors
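The per-file fill step can be pictured with another hypothetical sketch. It models one file's columns as a dict of lists and does not reflect the actual Rust `ColumnSelector` types. A projected column that is present in the file is passed through unchanged, and a missing one is filled by broadcasting that file's identity partition value to the file's row count. When no partition value exists, the sketch falls back to nulls; the Iceberg spec defines further fallbacks (such as a field's initial-default) that are omitted here.

```python
from typing import Any


def select_columns(
    file_columns: dict[str, list[Any]],
    num_rows: int,
    projected: list[str],
    default_values: dict[str, Any],
) -> dict[str, list[Any]]:
    """Resolve every projected column for a single data file (hypothetical helper).

    - Columns present in the file are taken as-is.
    - Missing columns are filled with this file's identity partition value,
      repeated num_rows times, or with None if there is no such value.
    """
    out: dict[str, list[Any]] = {}
    for name in projected:
        if name in file_columns:
            out[name] = file_columns[name]
        else:
            # Broadcast one scalar default to the file's height.
            out[name] = [default_values.get(name)] * num_rows
    return out


print(select_columns({"id": [1, 2, 3]}, 3, ["id", "region"], {"region": "eu"}))
```

Keeping the default as a single value per file and only broadcasting it at the end means the per-file information needed is exactly one entry from each list built in the previous step.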

@github-actions bot added the fix (Bug fix), python (Related to Python Polars), and rust (Related to Rust Polars) labels on Aug 5, 2025

codecov bot commented Aug 5, 2025

Codecov Report

❌ Patch coverage is 75.31646% with 78 lines in your changes missing coverage. Please review.
✅ Project coverage is 81.41%. Comparing base (c95d0db) to head (c22c6a1).
⚠️ Report is 8 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...o_sources/multi_scan/components/column_selector.rs | 66.07% | 19 Missing |
| py-polars/polars/io/iceberg/_utils.py | 77.04% | 10 Missing and 4 partials |
| crates/polars-core/src/frame/column/scalar.rs | 0.00% | 9 Missing |
| ...es/polars-plan/src/dsl/file_scan/default_values.rs | 27.27% | 8 Missing |
| crates/polars-python/src/conversion/mod.rs | 76.00% | 6 Missing |
| ...io_sources/multi_scan/components/projection/mod.rs | 82.35% | 6 Missing |
| ...rces/multi_scan/components/default_field_values.rs | 73.68% | 5 Missing |
| ...ources/multi_scan/pipeline/tasks/reader_starter.rs | 76.19% | 5 Missing |
| py-polars/polars/io/iceberg/dataset.py | 81.25% | 2 Missing and 1 partial |
| ...ources/multi_scan/components/projection/builder.rs | 92.85% | 2 Missing |
| ... and 1 more | | |
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #23900      +/-   ##
==========================================
- Coverage   81.41%   81.41%   -0.01%     
==========================================
  Files        1665     1668       +3     
  Lines      224714   225037     +323     
  Branches     2883     2896      +13     
==========================================
+ Hits       182949   183203     +254     
- Misses      41047    41111      +64     
- Partials      718      723       +5     


@nameexhaustion force-pushed the iceberg-missing-column-identity-partition branch 3 times, most recently from bcc6e84 to e3f95df (August 5, 2025, 09:26)
@nameexhaustion force-pushed the iceberg-missing-column-identity-partition branch from e3f95df to da5b29c (August 5, 2025, 09:55)
@nameexhaustion force-pushed the iceberg-missing-column-identity-partition branch from da5b29c to c22c6a1 (August 5, 2025, 10:03)
@nameexhaustion marked this pull request as ready for review (August 5, 2025, 10:46)
@coastalwhite merged commit 1bc7ace into pola-rs:main on Aug 5, 2025 (29 of 31 checks passed)
Successfully merging this pull request may close this issue:

Iceberg missing columns are not filled with identity transformed partition values
2 participants