fix: Fix hive partition pruning not filtering out `__HIVE_DEFAULT_PARTITION__`
#23074
We currently evaluate hive predicates using a `skip_batch_predicate` that was designed for parquet statistics. This is essentially an expression that operates on a `DataFrame` containing `{column}_min` and `{column}_max` columns.

The hive partitions were converted to such a `DataFrame`. However, this is not compatible: `skip_batch_predicate` interprets NULL as "statistics are missing" and indicates that the data must be read, whereas in hive a NULL partition value (stored in the path as `__HIVE_DEFAULT_PARTITION__`) indicates that the actual row values are all NULL. As a result, the `__HIVE_DEFAULT_PARTITION__` partition was never pruned.

This PR changes it to directly use the original predicate expression. It is more of a quick fix, as I am expecting to do larger refactors around predicate handling in the future for iceberg.
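The mismatch can be illustrated with a small, self-contained sketch in plain Python. It does not use polars, and the helper names (`parse_partition_value`, `stats_skip_eq`, `direct_keep_eq`) are hypothetical; they only model an equality predicate `a == 1` over three hive partitions. The stats-style check mirrors the described `skip_batch_predicate` behaviour (a NULL min/max means "must read"), while the direct check evaluates the predicate on the partition value with SQL NULL semantics.

```python
from typing import Optional

# Hive writes NULL partition keys under this directory name.
HIVE_DEFAULT_PARTITION = "__HIVE_DEFAULT_PARTITION__"


def parse_partition_value(raw: str) -> Optional[int]:
    """Hypothetical parser: the default partition maps to NULL (None)."""
    return None if raw == HIVE_DEFAULT_PARTITION else int(raw)


def stats_skip_eq(value: int, col_min: Optional[int], col_max: Optional[int]) -> bool:
    """Statistics-style check for `col == value`; True means the batch can be skipped.

    A NULL min/max is read as "statistics are missing", so the batch must be read.
    """
    if col_min is None or col_max is None:
        return False
    return value < col_min or value > col_max


def direct_keep_eq(value: int, partition_value: Optional[int]) -> bool:
    """Evaluate `col == value` directly on the partition value.

    NULL == value yields NULL, which a filter treats as false, so a NULL
    partition is pruned.
    """
    return partition_value is not None and partition_value == value


partitions = ["a=1", "a=2", f"a={HIVE_DEFAULT_PARTITION}"]
values = [parse_partition_value(p.split("=", 1)[1]) for p in partitions]

# Old approach: the partition value is used as both min and max statistics.
old_kept = [p for p, v in zip(partitions, values) if not stats_skip_eq(1, v, v)]
# New approach: evaluate the original predicate on the partition value.
new_kept = [p for p, v in zip(partitions, values) if direct_keep_eq(1, v)]

print(old_kept)  # ['a=1', 'a=__HIVE_DEFAULT_PARTITION__']
print(new_kept)  # ['a=1']
```

With statistics semantics the NULL partition is kept because its "statistics are missing"; evaluating the predicate directly prunes it, which is the behaviour this PR restores.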