
Conversation

nameexhaustion
Collaborator

@nameexhaustion nameexhaustion commented Jun 4, 2025

We currently evaluate hive predicates using a skip_batch_predicate that was designed for Parquet statistics. It is essentially an expression that operates on a DataFrame containing {column}_min and {column}_max columns.

The hive partitions were converted to such a DataFrame. However, the two are not compatible: skip_batch_predicate interprets NULL as "statistics are missing", which means the data must be read, whereas in hive a NULL indicates that the actual row values are all NULL.
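To make the mismatch concrete, here is a minimal sketch in plain Python. It is a toy model, not the actual Polars code: the function `can_skip_eq` and its parameters are hypothetical, and it only covers a predicate of the form `col == target`.

```python
from typing import Optional


def can_skip_eq(stat_min: Optional[int], stat_max: Optional[int], target: int) -> bool:
    """Statistics-style check for `col == target` (hypothetical helper).

    Returns True when the batch can be skipped without reading it.
    """
    if stat_min is None or stat_max is None:
        # Missing statistics: nothing is known about the batch,
        # so the conservative answer is "must read".
        return False
    # The target lies outside [min, max], so no row can match.
    return target < stat_min or target > stat_max


# Parquet row group with known statistics: 1 is outside [5, 9].
skip_known = can_skip_eq(5, 9, 1)

# Hive partition whose key is NULL, encoded as min=None, max=None.
# The NULL is read as "statistics missing", so the file is not pruned,
# even though `NULL == 1` can never be true for any row in it.
skip_null_partition = can_skip_eq(None, None, 1)

print(skip_known, skip_null_partition)
```

Under these assumptions the NULL partition is always read, which matches the behaviour this PR fixes.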

This PR changes hive partition pruning to evaluate the original predicate expression directly. It is more of a quick fix, as I expect to do larger refactors around predicate handling in the future for Iceberg.
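As a rough illustration of the idea (again a plain-Python toy model, not the code in this PR), the original predicate can be evaluated against each partition value with NULL propagation, keeping a path only when the result is true. The names `eq` and `prune` and the example paths are hypothetical.

```python
from typing import Callable, Dict, List, Optional


def eq(value: Optional[int], target: int) -> Optional[bool]:
    """`value == target` with NULL propagation: a NULL input yields NULL (None)."""
    if value is None:
        return None
    return value == target


def prune(
    partitions: Dict[str, Optional[int]],
    predicate: Callable[[Optional[int]], Optional[bool]],
) -> List[str]:
    """Keep paths whose partition value makes the predicate true.

    A false or NULL result drops the path, mirroring how a filter
    drops rows for which the predicate is not true.
    """
    return [path for path, value in partitions.items() if predicate(value) is True]


partitions = {
    "a=1/data.parquet": 1,
    "a=2/data.parquet": 2,
    # The hive default partition stands for a NULL key.
    "a=__HIVE_DEFAULT_PARTITION__/data.parquet": None,
}

kept = prune(partitions, lambda v: eq(v, 1))
print(kept)
```

Here the NULL partition is pruned because the predicate evaluates to NULL for it, rather than being kept as "unknown".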

@github-actions github-actions bot added the fix (Bug fix), python (Related to Python Polars), and rust (Related to Rust Polars) labels Jun 4, 2025
@nameexhaustion nameexhaustion changed the title fix: Fix hive partition pruning not filtering out __HIVE_DEFAULT_PARTITION__ fix: Fix hive partition pruning not filtering out __HIVE_DEFAULT_PARTITION__ (NULL) Jun 4, 2025
@nameexhaustion nameexhaustion changed the title fix: Fix hive partition pruning not filtering out __HIVE_DEFAULT_PARTITION__ (NULL) fix: Fix hive partition pruning not filtering out __HIVE_DEFAULT_PARTITION__ Jun 4, 2025

codecov bot commented Jun 4, 2025

Codecov Report

Attention: Patch coverage is 80.72289% with 16 lines in your changes missing coverage. Please review.

Project coverage is 80.22%. Comparing base (7768a67) to head (4007be8).
Report is 8 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| crates/polars-mem-engine/src/planner/lp.rs | 75.92% | 13 Missing ⚠️ |
| crates/polars-mem-engine/src/predicate.rs | 40.00% | 3 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #23074      +/-   ##
==========================================
- Coverage   80.25%   80.22%   -0.04%     
==========================================
  Files        1684     1684              
  Lines      223961   223945      -16     
  Branches     2808     2808              
==========================================
- Hits       179743   179662      -81     
- Misses      43558    43623      +65     
  Partials      660      660              


@nameexhaustion nameexhaustion marked this pull request as ready for review June 4, 2025 12:00
@ritchie46
Member

Nice, can we test it somehow?

@ritchie46 ritchie46 merged commit 665a202 into pola-rs:main Jun 5, 2025
28 checks passed
Development

Successfully merging this pull request may close these issues.

Polars 1.30.0 does not filter out nulls in hive-partitioned datasets