coastalwhite (Collaborator) commented May 30, 2025

Fixes #17418.

This adds the `required: bool` field to `ParquetFieldOverwrites`, which allows exported fields to be marked as Parquet `required`. This also marks the corresponding Arrow field as not `nullable`.

This works for all nesting levels.

import io

import polars as pl
import pyarrow.parquet as pq
from polars.io.parquet import ParquetFieldOverwrites

# Write a list column whose outer level stays optional while its items are required.
f = io.BytesIO()
pl.Series("a", [[1], [2, 3], [3, 4, 5], None]).to_frame().lazy().sink_parquet(
    f,
    field_overwrites=ParquetFieldOverwrites(
        name="a",
        required=False,
        children=ParquetFieldOverwrites(required=True),
    ),
)

# Read the schema back: the list is nullable, its values are not.
f.seek(0)
schema = pq.read_schema(f)
assert schema.field(0).nullable
assert not schema.field(0).type.value_field.nullable
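
The relationship between `required` and Arrow nullability at each nesting level can be modelled without Polars. The following is only an illustrative sketch: `FieldOverwrite` and `nullable_by_level` are hypothetical names, not part of the Polars API, and the fallback to optional when `required` is unset is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FieldOverwrite:
    """Hypothetical stand-in for one nesting level of an overwrite spec."""

    name: Optional[str] = None
    required: Optional[bool] = None  # None: fall back to the default
    children: Optional["FieldOverwrite"] = None  # single child, as for a list


def nullable_by_level(
    overwrite: Optional[FieldOverwrite], default_required: bool = False
) -> List[bool]:
    """Return the nullable flag of each level, outermost first.

    A level marked required is not nullable; everything else is nullable.
    `default_required=False` is an assumption for this sketch.
    """
    flags = []
    node = overwrite
    while node is not None:
        required = default_required if node.required is None else node.required
        flags.append(not required)
        node = node.children
    return flags


# Same shape as the example above: optional list, required items.
spec = FieldOverwrite(
    name="a",
    required=False,
    children=FieldOverwrite(required=True),
)
print(nullable_by_level(spec))  # [True, False]
```

The two flags correspond to the two assertions in the example: the outer list field is nullable, while its value field is not.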

github-actions bot added the labels enhancement (New feature or an improvement of an existing feature), python (Related to Python Polars), and rust (Related to Rust Polars) on May 30, 2025
codecov bot commented May 30, 2025

Codecov Report

Attention: Patch coverage is 94.28571% with 2 lines in your changes missing coverage. Please review.

Project coverage is 80.36%. Comparing base (ba1293c) to head (e701b90).
Report is 2 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| crates/polars-parquet/src/arrow/write/mod.rs | 75.00% | 2 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #23013      +/-   ##
==========================================
- Coverage   80.38%   80.36%   -0.03%     
==========================================
  Files        1682     1682              
  Lines      223374   223403      +29     
  Branches     2803     2804       +1     
==========================================
- Hits       179562   179529      -33     
- Misses      43145    43212      +67     
+ Partials      667      662       -5     


coastalwhite merged commit 78b0cdc into pola-rs:main May 30, 2025
29 checks passed
coastalwhite deleted the feat/parquet-overwrites-required branch May 30, 2025 14:58
Development

Successfully merging this pull request may close these issues.

Provide more control over the Parquet schema when writing to Parquet