fix: Fix hive partition pruning not filtering out `__HIVE_DEFAULT_PARTITION__` #23074

nameexhaustion · 2025-06-04T11:13:56Z

Fixes #23005 (Polars 1.30.0 does not filter out nulls in hive-partitioned datasets)

We currently evaluate hive predicates using a `skip_batch_predicate`, which was designed for Parquet statistics. This is essentially an expression that operates on a DataFrame containing `{column}_min` and `{column}_max` columns.

The hive partitions were converted into such a DataFrame. However, this is not compatible: `skip_batch_predicate` interprets NULL as "statistics are missing" and therefore indicates that the data must be read, whereas in hive a NULL partition value means that the actual row values are all NULL.

This PR changes hive partition pruning to evaluate the original predicate expression directly. This is more of a quick fix, as I expect to do larger refactors of predicate handling for Iceberg in the future.
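To make the NULL mismatch concrete, here is a minimal pure-Python sketch. It is not Polars code: the partition layout, the function names (`stats_may_match_gt`, `predicate_gt`, `prune_with_stats`, `prune_with_predicate`) and the single `col > bound` filter are hypothetical simplifications, and the real skip-batch predicate rewriting is more involved. The partition value is reused as both `{column}_min` and `{column}_max`, mirroring the conversion described above.

```python
from typing import Optional

# Hypothetical hive layout: one key value per partition directory.
# The __HIVE_DEFAULT_PARTITION__ directory holds rows whose key is NULL (None).
partitions: dict[str, Optional[int]] = {
    "year=2023": 2023,
    "year=2024": 2024,
    "year=__HIVE_DEFAULT_PARTITION__": None,
}


def stats_may_match_gt(col_min: Optional[int], col_max: Optional[int], bound: int) -> bool:
    """Statistics-style check for `col > bound` (hypothetical helper).

    Returns True if the batch *may* contain matching rows and must be read.
    A None max is treated as "statistics missing", so the batch cannot be skipped.
    col_min is unused for `>`; it is kept only to mirror the {column}_min/_max layout.
    """
    if col_max is None:
        return True
    return col_max > bound


def predicate_gt(value: Optional[int], bound: int) -> Optional[bool]:
    """Direct evaluation of `col > bound` with SQL-style NULL semantics:
    comparing against NULL yields NULL (None), which a filter treats as False."""
    if value is None:
        return None
    return value > bound


def prune_with_stats(parts: dict[str, Optional[int]], bound: int) -> list[str]:
    # Old approach (simplified): partition value used as both min and max statistics.
    return [p for p, v in parts.items() if stats_may_match_gt(v, v, bound)]


def prune_with_predicate(parts: dict[str, Optional[int]], bound: int) -> list[str]:
    # New approach (simplified): keep a partition only if the predicate is True.
    return [p for p, v in parts.items() if predicate_gt(v, bound) is True]


print(prune_with_stats(partitions, 2023))
print(prune_with_predicate(partitions, 2023))
```

With `bound = 2023`, the statistics-style check keeps the `__HIVE_DEFAULT_PARTITION__` directory because a NULL maximum looks like missing statistics, while direct evaluation yields NULL for that partition and the filter drops it.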

codecov · 2025-06-04T11:32:26Z

Codecov Report

Attention: Patch coverage is 80.72289% with 16 lines in your changes missing coverage. Please review.

Project coverage is 80.22%. Comparing base (7768a67) to head (4007be8).
Report is 8 commits behind head on main.

Files with missing lines	Patch %	Lines
crates/polars-mem-engine/src/planner/lp.rs	75.92%	13 Missing ⚠️
crates/polars-mem-engine/src/predicate.rs	40.00%	3 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main   #23074      +/-   ##
==========================================
- Coverage   80.25%   80.22%   -0.04%     
==========================================
  Files        1684     1684              
  Lines      223961   223945      -16     
  Branches     2808     2808              
==========================================
- Hits       179743   179662      -81     
- Misses      43558    43623      +65     
  Partials      660      660

ritchie46 · 2025-06-04T12:39:30Z

Nice, can we test it somehow?

c

c7d6891

github-actions bot added the fix (Bug fix), python (Related to Python Polars) and rust (Related to Rust Polars) labels Jun 4, 2025

nameexhaustion changed the title from "fix: Fix hive partition pruning not filtering out __HIVE_DEFAULT_PARTITION__" to "fix: Fix hive partition pruning not filtering out __HIVE_DEFAULT_PARTITION__ (NULL)" Jun 4, 2025

nameexhaustion changed the title from "fix: Fix hive partition pruning not filtering out __HIVE_DEFAULT_PARTITION__ (NULL)" back to "fix: Fix hive partition pruning not filtering out __HIVE_DEFAULT_PARTITION__" Jun 4, 2025

nameexhaustion marked this pull request as ready for review June 4, 2025 12:00

nameexhaustion requested review from ritchie46, c-peters, alexander-beedie, MarcoGorelli, reswqa and orlp as code owners June 4, 2025 12:00

add test

4007be8

ritchie46 merged commit 665a202 into pola-rs:main Jun 5, 2025
28 checks passed

owenam mentioned this pull request Jun 30, 2025

scan_parquet().filter() on partitioned categorical column returns data without filter applied #23113 (Closed)