
Releases: pola-rs/polars

Python Polars 1.32.3

14 Aug 17:28
2468e6f

🚀 Performance improvements

  • Lower .sort(maintain_order=True).head() to streaming top_k (#24014)
  • Lower top-k to streaming engine (#23979)
  • Allow order to pass through filters and relax to row-separable instead of elementwise (#23969)

✨ Enhancements

  • Add native streaming for peaks_{min,max} (#24039)
  • IR graph arrows, monospace font, box nodes (#24021)
  • Add DataTypeExpr.default_value (#23973)
  • Lower rle to a native streaming engine node (#23929)
  • Add support for Int128 to pyo3-polars (#23959)

🐞 Bug fixes

  • Scan of multiple sources with null datatype (#24065)
  • Categorical in nested data in row encoding (#24051)
  • Missing length update in builder for pl.Array repetition (#24055)
  • Race condition in global categories init (#24045)
  • Revert "fix: Don't encode entire CategoricalMapping when going to Arrow (#24036)" (#24044)
  • Error when using named functions (#24041)
  • Don't encode entire CategoricalMapping when going to Arrow (#24036)
  • Fix cast on arithmetic with lit (#23941)
  • Incorrect slice-slice pushdown (#24032)
  • Dedup common cache subplan in IR graph (#24028)
  • Allow join on Decimal in in-memory engine (#24026)
  • Fix datatypes for eval.list in aggregation context (#23911)
  • Allocator capsule fallback panic (#24022)
  • Accept another zlib "magic header" file signature (#24013)
  • Fix truediv dtypes so cast in list.eval is not dropped (#23936)
  • Don't reuse cached return_dtype for expanded map expressions (#24010)
  • Cache id is not a valid dot node id (#24005)
  • Align map_elements with and without return_dtype (#24007)
  • Fix column dtype lifetime for csv_write segfault on Categorical (#23986)
  • Allow serializing LazyGroupBy.map_groups (#23964)
  • Correct allocator name in PyCapsule (#23968)
  • Mismatched types for write function for windows (#23915)
  • Fix unpivot panic when index= column not found (#23958)

📖 Documentation

  • Fix a typo in "lazy/execution" user-guide page (#23983)

🛠️ Other improvements

  • Update pyo3-polars versions (#24031)
  • Remove insert_error_function (#24023)
  • Remove cache hits, clean up in-mem prefill (#24019)
  • Use .venv instead of venv in pyo3-polars examples (#24024)
  • Fix test failing mypy (#24017)
  • Remove outdated comment (#23998)
  • Add a _plr.pyi to remove mypy issues (#23970)
  • Don't define CountStar as dyn OptimizationRule (#23976)
  • Rename atol and rtol to abs_tol and rel_tol (#23961)
  • Introduce Row{Encode,Decode} as FunctionExpr (#23933)
  • Dispatch through pl.map_batches and AnonymousColumnsUdf (#23867)

Thank you to all our contributors for making this release possible!
@JakubValtar, @Kevin-Patyk, @borchero, @cmdlineluser, @coastalwhite, @iishutov, @jarondl, @kdn36, @orlp, @rawhuul, @ritchie46 and @stijnherfst

Python Polars 1.32.2

07 Aug 10:51
34595af

🐞 Bug fixes

  • Return correct python package version (#23951)

📖 Documentation

  • Add arr.len() on the website (#23944)

Thank you to all our contributors for making this release possible!
@coastalwhite and @etiennebacher

Python Polars 1.32.1

06 Aug 16:50
e5e3450

🚀 Performance improvements

  • Optimise BytecodeParser usage from warn_on_inefficient_map (#23809)
  • Lower extend_constant to the streaming engine (#23824)
  • Lower pl.repeat to streaming engine (#23804)
  • Remove redundant clone (#23771)

✨ Enhancements

  • Lower rle_id to a native streaming node (#23894)
  • Pass endpoint_url loaded from CredentialProviderAWS to scan/write_delta (#23812)
  • Dispatch scan_iceberg to native by default (#23912)
  • Lower unique_counts and value_counts to streaming engine (#23890)
  • Support initializing from __arrow_c_schema__ protocol in pl.Schema (#23879)
  • Better handle broken local package environment in show_versions (#23885)
  • Implement dt.days_in_month function (#23119)
  • Make Expr.rolling_*_by methods available to pl.Series (#23742)
  • Fix errors on native scan_iceberg (#23811)
  • Reinterpret binary data to fixed size numerical array (#22840)
  • Make rolling_map serializable (#23848)
  • Ensure CachingCredentialProvider returns copied credentials dict (#23817)
  • Change typing for .remote() from LazyFrameExt to LazyFrameRemote (#23825)
  • Implement repeat_by for Array and Null (#23794)
  • Add DeprecationWarning on passing physical ordering to Categorical (#23779)
  • Pre-filtered decode and row group skipping with Iceberg / Delta / scans with cast options (#23792)
  • Update BytecodeParser opcode awareness for upcoming Python 3.14 (#23782)

🐞 Bug fixes

  • Categorical namespace functions fail on Enum columns (#23925)
  • Properly set sumwise complete on filter for missing columns (#23877)
  • Restore Arrow-FFI-based Python<->Rust conversion in pyo3-polars (#23881)
  • Group By with filters (#23917)
  • Fix read_csv ignoring Decimal schema for header-only data (#23886)
  • Ensure collect() native Iceberg always scans latest when no snapshot_id is given (#23907)
  • Writing List(Array) columns to JSON without panic (#23875)
  • Fill Iceberg missing fields with partition values if present in metadata (#23900)
  • Create file for streaming sink even if unspawned (#23672)
  • Update cloud testing environment (#23908)
  • Parquet filtering on multiple RGs with literal predicate (#23903)
  • Incorrect datatype passed to libc::write (#23904)
  • Properly feature gate TZ_AWARE_RE usage (#23888)
  • Improve identification of "non group-key" aggregates in SQL GROUP BY queries (#23191)
  • Spawning tokio task outside reactor (#23884)
  • Correctly raise DuplicateError on asof_join with suffix="" (#23864)
  • Fix errors on native scan_iceberg (#23811)
  • Fix index out of bounds panic filtering parquet (#23850)
  • Fix error on empty range requests (#23844)
  • Fix handling of hive partitioning hive_start_idx parameter (#23843)
  • Allow encoding of pl.Enum with smaller physicals (#23829)
  • Filter sorted flag from physical in CategoricalChunked (#23827)
  • Remove accidental todo! in repeat node (#23822)
  • Make meta.pop operate on Expr only (#23808)
  • Stack overflow in DslPlan serde (#23801)
  • Clear credentials cached in Python when rebuilding object store (#23756)
  • Datetime selectors with mixed timezone info (#23774)
  • Support i128 in asof join (#23770)
  • Remove sleep for credential refresh (#23768)

📖 Documentation

  • Improve StackOverflow links in contributing guide (#23895)
  • Fix pyo3 documentation page link (#23839)
  • Document the pureness requirements of udfs (#23787)
  • Correct the name.* methods on their removal of aliases (#23773)

📦 Build system

  • Workaround for pyiceberg make requirements on Python 3.13 (#23810)
  • Add pyiceberg to dev dependencies (#23791)

🛠️ Other improvements

  • Ensure clippy and rustfmt run in CI when changing pyo3-polars (#23930)
  • Fix pyo3-polars proc-macro re-exports (#23918)
  • Rewrite evaluate_on_groups for .gather / .get (#23700)
  • Move Python C API to python-polars (#23876)
  • Improve/fix internal LRUCache implementation and move into "_utils" module (#23813)
  • Relax constraint on maximum Python version for numba (#23838)
  • Automatically tag PRs mentioning "SQL" with the appropriate label (#23816)
  • Update typos package (#23818)
  • Fix typos path (#23803)
  • Remove deserialize_with_unknown_fields (#23802)
  • Add pyiceberg to dev dependencies (#23791)
  • Remove old schema file (#23798)
  • Mark more tests as ready for cloud (#23743)
  • Reduce required deps for pyo3-polars (#23761)

Thank you to all our contributors for making this release possible!
@JakubValtar, @Kevin-Patyk, @Liyixin95, @alexander-beedie, @cgevans, @cmdlineluser, @coastalwhite, @eitsupi, @gfvioli, @itamarst, @jimmmmmmmmmmmy, @kdn36, @math-hiyoko, @mcrumiller, @mpasa, @mrkn, @nameexhaustion, @orlp, @pka, @pomo-mondreganto, @ritchie46 and @stijnherfst

Rust Polars 0.50.0

01 Aug 12:19
0478b35

🏆 Highlights

  • Make Selector a concrete part of the DSL (#23351)
  • Rework Categorical/Enum to use (Frozen)Categories (#23016)

🚀 Performance improvements

  • Lower Expr.slice to streaming engine (#23683)
  • Elide bound check (#23653)
  • Preserve Column repr in ColumnTransform operations (#23648)
  • Lower any() and all() to streaming engine (#23640)
  • Lower row-separable functions in streaming engine (#23633)
  • Lower int_range(len()) to with_row_index (#23576)
  • Avoid double field resolution in with_columns (#23530)
  • Rolling quantile lower time complexity (#23443)
  • Use single-key optimization with Categorical (#23436)
  • Improve null-preserving identification for boolean functions (#23317)
  • Improve boolean bitwise aggregate performance (#23325)
  • Enable Parquet expressions and dedup is_in values in Parquet predicates (#23293)
  • Re-write join types during filter pushdown (#23275)
  • Generate PQ ZSTD decompression context once (#23200)
  • Trigger cache/cse optimizations when multiplexing (#23274)
  • Cache FileInfo upon DSL -> IR conversion (#23263)
  • Push more filters past joins (#23240)

✨ Enhancements

  • Expand on DataTypeExpr (#23249)
  • Lower row-separable functions in streaming engine (#23633)
  • Add scalar checks to range expressions (#23632)
  • Expose POLARS_DOT_SVG_VIEWER to automatically dispatch to SVG viewer (#23592)
  • Implement mean function in arr namespace (#23486)
  • Implement vec_hash for List and Array (#23578)
  • Add unstable pl.row_index() expression (#23556)
  • Add Categories on the Python side (#23543)
  • Implement partitioned sinks for the in-memory engine (#23522)
  • Expose IRFunctionExpr::Rank in the python visitor (#23512)
  • Raise and warn on UDFs without return_dtype set (#23353)
  • IR pruning (#23499)
  • Expose IRFunctionExpr::FillNullWithStrategy in the python visitor (#23479)
  • Support min/max reducer for null dtype in streaming engine (#23465)
  • Implement streaming Categorical/Enum min/max (#23440)
  • Allow cast to Categorical inside list.eval (#23432)
  • Support pathlib.Path as source for read/scan_delta() (#23411)
  • Enable default set of ScanCastOptions for native scan_iceberg() (#23416)
  • Pass payload in ExprRegistry (#23412)
  • Support reading nanosecond/Int96 timestamps and schema evolved datasets in scan_delta() (#23398)
  • Support row group skipping with filters when cast_options is given (#23356)
  • Execute bitwise reductions in streaming engine (#23321)
  • Use scan_parquet().collect_schema() for read_parquet_schema (#23359)
  • Add dtype to str.to_integer() (#22239)
  • Add arr.slice, arr.head and arr.tail methods to arr namespace (#23150)
  • Add is_close method (#23273)
  • Drop superfluous casts from optimized plan (#23269)
  • Added drop_nulls option to to_dummies (#23215)
  • Support comma as decimal separator for CSV write (#23238)
  • Don't format keys if they're empty in dot (#23247)
  • Improve arity simplification (#23242)

🐞 Bug fixes

  • Fix credential refresh logic (#23730)
  • Fix to_datetime() fallible identification (#23735)
  • Correct output datatype for dt.with_time_unit (#23734)
  • Fix incorrect native Iceberg scan from tables with renamed/dropped columns/fields (#23713)
  • Allow DataType expressions with selectors (#23720)
  • Match output type to engine for interpolate on Decimal (#23706)
  • Remaining bugs in with_exprs_and_input and pruning (#23710)
  • Match output dtype to engine for cum_sum_horizontal (#23686)
  • Field names for pl.struct in group-by (#23703)
  • Fix output for str.extract_groups with empty string pattern (#23698)
  • Match output type to engine for rolling_map (#23702)
  • Fix incorrect join on single Int128 column for in-memory engine (#23694)
  • Match output field name to lhs for BusinessDaycount (#23679)
  • Correct the planner output datatype for strptime (#23676)
  • Sort and Scan with_exprs_and_input (#23675)
  • Revert to old behavior with name.keep (#23670)
  • Fix panic loading from arrow Map containing timestamps (#23662)
  • Selectors in self part of list.eval (#23668)
  • Fix output field dtype for ToInteger (#23664)
  • Allow decimal_comma with , separator in read_csv (#23657)
  • Fix handling of UTF-8 in write_csv to IO[str] (#23647)
  • Selectors in {Lazy,Data}Frame.filter (#23631)
  • Stop splitfields iterator at eol in simd branch (#23652)
  • Correct output datatype of dt.year and dt.mil (#23646)
  • Logic of broadcast_rhs in binary functions to correct list.set_intersection for list[str] columns (#23584)
  • Order-preserving equi-join didn't always flush final matches (#23639)
  • Fix ColumnNotFound error when joining on col().cast() (#23622)
  • Fix agg groups on when/then in group_by context (#23628)
  • Output type for sign (#23572)
  • Apply agg_fn on null values in pivot (#23586)
  • Remove nonsensical duration variance (#23621)
  • Don't panic when sinking nested categorical to Parquet (#23610)
  • Correctly set value count output field name (#23611)
  • Casting unused columns in to_torch (#23606)
  • Allow inferring of hours-only timezone offset (#23605)
  • Bug in Categorical <-> str compare with nulls (#23609)
  • Honor n=0 in all cases of str.replace (#23598)
  • Remove arbitrary 25 item limit from implicit Python list -> Series infer (#23603)
  • Relabel duplicate sequence IDs in distributor (#23593)
  • Round-trip Enum and Categorical metadata in plugins (#23588)
  • Fix incorrect join_asof with by followed by head/slice (#23585)
  • Allow writing nested Int128 data to Parquet (#23580)
  • Enum serialization assert (#23574)
  • Output type for peak_min / peak_max (#23573)
  • Make Scalar Categorical, Enum and Struct values serializable (#23565)
  • Preserve row order within partition when sinking parquet (#23462)
  • Panic in create_multiple_physical_plans when branching from a single cache node (#23561)
  • Prevent in-mem partition sink deadlock (#23562)
  • Update AWS cloud documentation (#23563)
  • Correctly handle null values when comparing structs (#23560)
  • Make fold/reduce/cum_reduce/cum_fold serializable (#23524)
  • Make Expr.append serializable (#23515)
  • Float by float division dtype (#23529)
  • Division on empty DataFrame generating null row (#23516)
  • Partition sink copy_exprs and with_exprs_and_input (#23511)
  • Unreachable with pl.self_dtype (#23507)
  • Rolling median incorrect min_samples with nulls (#23481)
  • Make Int128 roundtrippable via Parquet (#23494)
  • Fix panic when common subplans contain IEJoins (#23487)
  • Properly handle non-finite floats in rolling_sum/mean (#23482)
  • Make read_csv_batched respect skip_rows and skip_lines (#23484)
  • Always use cloudpickle for the python objects in cloud plans (#23474)
  • Support string literals in index_of() on categoricals (#23458)
  • Don't panic for finish_callback with nested datatypes (#23464)
  • Support min/max aggregation for DataFrame/LazyFrame Categoricals (#23455)
  • Fix var/moment dtypes (#23453)
  • Fix agg_groups dtype (#23450)
  • Clear cached_schema when apply changes dtype (#23439)
  • Allow structured conversion to/from numpy with Array types, preserving shape (#23438)
  • Null handling in full-null group_by_dynamic mean/sum (#23435)
  • Enable default set of ScanCastOptions for native scan_iceberg() (#23416)
  • Fix index calculation for nearest interpolation (#23418)
  • Fix compilation failure with --no-default-features and --features lazy,strings (#23384)
  • Parse parquet footer length into unsigned integer (#23357)
  • Fix incorrect results with group_by aggregation on empty groups (#23358)
  • Fix boolean min() in group_by aggregation (streaming) (#23344)
  • Respect data-model in map_elements (#23340)
  • Properly join URI paths in PlPath (#23350)
  • Ignore null values in bitwise aggregation on bools (#23324)
  • Fix panic filtering after left join (#23310)
  • Out-of-bounds index in hot hash table (#23311)
  • Fix scanning '?' from cloud with glob=False (#23304)
  • Fix filters on inserted columns did not remove rows (#23303)
  • Don't ignore return_dtype (#23309)
  • Use safe parsing for get_normal_components (#23284)
  • Fix output column names/order of streaming coalesced right-join (#23278)
  • Restore concat_arr inputs expansion (#23271)

📖 Documentation

  • Point the R Polars version on R-multiverse (#23660)
  • Update GPU docs for RAPIDS CUDA 11 deprecation (#23620)
  • Add page about billing to Polars Cloud user guide (#23564)
  • Small user-guide improvement and fixes (#23549)
  • Correct note in from_pandas about data being cloned (#23552)
  • Fix a few typos in the "Streaming" section (#23536)
  • Update streaming page (#23535)
  • Update structure of Polars Cloud documentation (#23496)
  • Update when_then in user guide (#23245)

📦 Build system

  • Update all rand code (#23387)
  • Bump up rand & rand_distr (#22619)

🛠️ Other improvements

  • Remove incorrect DeletionFilesList::slice (#23796)
  • Remove old schema file (#23798)
  • Remove Default for StreamingExecutionState (#23729)
  • Explicit match to smaller dtypes before cast to Int32 in asof join (#23776)
  • Expose PlPathRef via polars::prelude (#23754)
  • Add hashes json (#23758)
  • Add AExpr::is_expr_equal_to (#23740)
  • Fix rank test to respect maintain order (#23723)
  • IR inputs and exprs iterators (#23722)
  • Store more granular schema hashes to reduce merge conflicts (#23709)
  • Add assertions for unique ID (#23711)
  • Use RelaxedCell in multiscan (#23712)
  • Debug assert ColumnTransform cast is non-strict (#23717)
  • Use UUID for UniqueID (#23704)
  • Remove scan id (#23697)
  • Propagate Iceberg physical ID schema to IR (#23671)
  • Remove unused and confusing match arm (#23691)
  • Remove unused ALLOW_GROUP_AWARE flag (#23690)
  • Remove unused evaluate_inline (#23687)
  • Remove unused field from AggregationContext (#23685)
  • Remove `nod...

Python Polars 1.32.0

01 Aug 01:43
c57de4b

🏆 Highlights

  • Make Selector a concrete part of the DSL (#23351)
  • Rework Categorical/Enum to use (Frozen)Categories (#23016)

🚀 Performance improvements

  • Lower Expr.slice to streaming engine (#23683)
  • Elide bound check (#23653)
  • Preserve Column repr in ColumnTransform operations (#23648)
  • Lower any() and all() to streaming engine (#23640)
  • Lower row-separable functions in streaming engine (#23633)
  • Lower int_range(len()) to with_row_index (#23576)
  • Avoid double field resolution in with_columns (#23530)
  • Rolling quantile lower time complexity (#23443)
  • Use single-key optimization with Categorical (#23436)
  • Improve null-preserving identification for boolean functions (#23317)
  • Improve boolean bitwise aggregate performance (#23325)
  • Enable Parquet expressions and dedup is_in values in Parquet predicates (#23293)
  • Re-write join types during filter pushdown (#23275)
  • Generate PQ ZSTD decompression context once (#23200)
  • Trigger cache/cse optimizations when multiplexing (#23274)
  • Cache FileInfo upon DSL -> IR conversion (#23263)
  • Push more filters past joins (#23240)
  • Optimize Bitmap::make_mut (#23138)

✨ Enhancements

  • Add Python-side caching for credentials and provider auto-initialization (#23736)
  • Expand on DataTypeExpr (#23249)
  • Lower row-separable functions in streaming engine (#23633)
  • Add scalar checks to range expressions (#23632)
  • Expose POLARS_DOT_SVG_VIEWER to automatically dispatch to SVG viewer (#23592)
  • Implement mean function in arr namespace (#23486)
  • Implement vec_hash for List and Array (#23578)
  • Add unstable pl.row_index() expression (#23556)
  • Add Categories on the Python side (#23543)
  • Implement partitioned sinks for the in-memory engine (#23522)
  • Raise and warn on UDFs without return_dtype set (#23353)
  • IR pruning (#23499)
  • Support min/max reducer for null dtype in streaming engine (#23465)
  • Implement streaming Categorical/Enum min/max (#23440)
  • Allow cast to Categorical inside list.eval (#23432)
  • Support pathlib.Path as source for read/scan_delta() (#23411)
  • Enable default set of ScanCastOptions for native scan_iceberg() (#23416)
  • Pass payload in ExprRegistry (#23412)
  • Support reading nanosecond/Int96 timestamps and schema evolved datasets in scan_delta() (#23398)
  • Support row group skipping with filters when cast_options is given (#23356)
  • Execute bitwise reductions in streaming engine (#23321)
  • Use scan_parquet().collect_schema() for read_parquet_schema (#23359)
  • Add dtype to str.to_integer() (#22239)
  • Add arr.slice, arr.head and arr.tail methods to arr namespace (#23150)
  • Add is_close method (#23273)
  • Drop superfluous casts from optimized plan (#23269)
  • Added drop_nulls option to to_dummies (#23215)
  • Support comma as decimal separator for CSV write (#23238)
  • Don't format keys if they're empty in dot (#23247)
  • Improve arity simplification (#23242)
  • Allow expression input for length parameter in pad_start, pad_end, and zfill (#23182)

🐞 Bug fixes

  • Load _expiry_time from botocore Credentials in CredentialProviderAWS (#23753)
  • Fix credential refresh logic (#23730)
  • Fix to_datetime() fallible identification (#23735)
  • Correct output datatype for dt.with_time_unit (#23734)
  • Fix incorrect native Iceberg scan from tables with renamed/dropped columns/fields (#23713)
  • Allow DataType expressions with selectors (#23720)
  • Match output type to engine for interpolate on Decimal (#23706)
  • Remaining bugs in with_exprs_and_input and pruning (#23710)
  • Match output dtype to engine for cum_sum_horizontal (#23686)
  • Field names for pl.struct in group-by (#23703)
  • Fix output for str.extract_groups with empty string pattern (#23698)
  • Match output type to engine for rolling_map (#23702)
  • Moved passing DeltaTable._storage_options (#23673)
  • Fix incorrect join on single Int128 column for in-memory engine (#23694)
  • Match output field name to lhs for BusinessDaycount (#23679)
  • Correct the planner output datatype for strptime (#23676)
  • Sort and Scan with_exprs_and_input (#23675)
  • Revert to old behavior with name.keep (#23670)
  • Fix panic loading from arrow Map containing timestamps (#23662)
  • Selectors in self part of list.eval (#23668)
  • Fix output field dtype for ToInteger (#23664)
  • Allow decimal_comma with , separator in read_csv (#23657)
  • Fix handling of UTF-8 in write_csv to IO[str] (#23647)
  • Selectors in {Lazy,Data}Frame.filter (#23631)
  • Stop splitfields iterator at eol in simd branch (#23652)
  • Correct output datatype of dt.year and dt.mil (#23646)
  • Logic of broadcast_rhs in binary functions to correct list.set_intersection for list[str] columns (#23584)
  • Order-preserving equi-join didn't always flush final matches (#23639)
  • Fix ColumnNotFound error when joining on col().cast() (#23622)
  • Fix agg groups on when/then in group_by context (#23628)
  • Output type for sign (#23572)
  • Apply agg_fn on null values in pivot (#23586)
  • Remove nonsensical duration variance (#23621)
  • Don't panic when sinking nested categorical to Parquet (#23610)
  • Correctly set value count output field name (#23611)
  • Casting unused columns in to_torch (#23606)
  • Allow inferring of hours-only timezone offset (#23605)
  • Bug in Categorical <-> str compare with nulls (#23609)
  • Honor n=0 in all cases of str.replace (#23598)
  • Remove arbitrary 25 item limit from implicit Python list -> Series infer (#23603)
  • Relabel duplicate sequence IDs in distributor (#23593)
  • Round-trip Enum and Categorical metadata in plugins (#23588)
  • Fix incorrect join_asof with by followed by head/slice (#23585)
  • Change return typing of get_index_type() from DataType to PolarsIntegerType (#23558)
  • Allow writing nested Int128 data to Parquet (#23580)
  • Enum serialization assert (#23574)
  • Output type for peak_min / peak_max (#23573)
  • Make Scalar Categorical, Enum and Struct values serializable (#23565)
  • Preserve row order within partition when sinking parquet (#23462)
  • Prevent in-mem partition sink deadlock (#23562)
  • Update AWS cloud documentation (#23563)
  • Correctly handle null values when comparing structs (#23560)
  • Make fold/reduce/cum_reduce/cum_fold serializable (#23524)
  • Make Expr.append serializable (#23515)
  • Float by float division dtype (#23529)
  • Division on empty DataFrame generating null row (#23516)
  • Partition sink copy_exprs and with_exprs_and_input (#23511)
  • Unreachable with pl.self_dtype (#23507)
  • Rolling median incorrect min_samples with nulls (#23481)
  • Make Int128 roundtrippable via Parquet (#23494)
  • Fix panic when common subplans contain IEJoins (#23487)
  • Properly handle non-finite floats in rolling_sum/mean (#23482)
  • Make read_csv_batched respect skip_rows and skip_lines (#23484)
  • Always use cloudpickle for the python objects in cloud plans (#23474)
  • Support string literals in index_of() on categoricals (#23458)
  • Don't panic for finish_callback with nested datatypes (#23464)
  • Pass DeltaTable._storage_options if no storage_options are provided (#23456)
  • Support min/max aggregation for DataFrame/LazyFrame Categoricals (#23455)
  • Fix var/moment dtypes (#23453)
  • Fix agg_groups dtype (#23450)
  • Fix incorrect _get_path_scheme (#23444)
  • Fix missing overload defaults in read_ods and tree_format (#23442)
  • Clear cached_schema when apply changes dtype (#23439)
  • Allow structured conversion to/from numpy with Array types, preserving shape (#23438)
  • Null handling in full-null group_by_dynamic mean/sum (#23435)
  • Enable default set of ScanCastOptions for native scan_iceberg() (#23416)
  • Fix index calculation for nearest interpolation (#23418)
  • Overload for eager default in Schema.to_frame was False instead of True (#23413)
  • Fix read_excel overloads so that passing list[str] to sheet_name does not raise (#23388)
  • Removed special handling for bytes like objects in read_ndjson (#23361)
  • Parse parquet footer length into unsigned integer (#23357)
  • Fix incorrect results with group_by aggregation on empty groups (#23358)
  • Fix boolean min() in group_by aggregation (streaming) (#23344)
  • Respect data-model in map_elements (#23340)
  • Properly join URI paths in PlPath (#23350)
  • Ignore null values in bitwise aggregation on bools (#23324)
  • Fix panic filtering after left join (#23310)
  • Out-of-bounds index in hot hash table (#23311)
  • Fix scanning '?' from cloud with glob=False (#23304)
  • Fix filters on inserted columns did not remove rows (#23303)
  • Don't ignore return_dtype (#23309)
  • Raise error instead of return in Series class (#23301)
  • Use safe parsing for get_normal_components (#23284)
  • Fix output column names/order of streaming coalesced right-join (#23278)
  • Restore concat_arr inputs expansion (#23271)
  • Expose FieldsMapper (#23232)
  • Fix time zone handling in dt.iso_year and dt.is_leap_year (#23125)

📖 Documentation

  • Fix str.replace_many examples trigger deprecation warning (#23695)
  • Point the R Polars version on R-multiverse (#23660)
  • Update example for writing to cloud storage (#20265)
  • Update GPU docs for RAPIDS CUDA 11 deprecation (#23620)
  • Add docs of Expr.list.filter and Series.list.filter (#23589)
  • Add page about billing to Polars Cloud user guide (#23564)
  • Small user-guide improvement and fixes (#23549)
  • Correct note in from_pandas about data being cloned (#23552)
  • Fix a few typos in the "Streaming" section (#23536)
  • Update streaming page (#23535)
  • Update structure of Polars Cloud documentation (#23496)
  • Update example code in pandas migration guide (#23403)
  • Correct plugins user guide to reflect that teaching Expr.language is in a different section (#23377)
  • Add example of using OR in join_where (#23375)
  • Update when_then in user guide (#23245)

📦 Build system

  • Update all rand code (#23387)

🛠️ Other improvements

  • Remove unused functions from the rust side (#2...

Python Polars 1.32.0-beta.1

26 Jul 19:44
a7081b6
Pre-release

🏆 Highlights

  • Make Selector a concrete part of the DSL (#23351)
  • Rework Categorical/Enum to use (Frozen)Categories (#23016)

🚀 Performance improvements

  • Lower Expr.slice to streaming engine (#23683)
  • Elide bound check (#23653)
  • Preserve Column repr in ColumnTransform operations (#23648)
  • Lower any() and all() to streaming engine (#23640)
  • Lower row-separable functions in streaming engine (#23633)
  • Lower int_range(len()) to with_row_index (#23576)
  • Avoid double field resolution in with_columns (#23530)
  • Rolling quantile lower time complexity (#23443)
  • Use single-key optimization with Categorical (#23436)
  • Improve null-preserving identification for boolean functions (#23317)
  • Improve boolean bitwise aggregate performance (#23325)
  • Enable Parquet expressions and dedup is_in values in Parquet predicates (#23293)
  • Re-write join types during filter pushdown (#23275)
  • Generate PQ ZSTD decompression context once (#23200)
  • Trigger cache/cse optimizations when multiplexing (#23274)
  • Cache FileInfo upon DSL -> IR conversion (#23263)
  • Push more filters past joins (#23240)
  • Optimize Bitmap::make_mut (#23138)

✨ Enhancements

  • Add Python-side caching for credentials and provider auto-initialization (#23736)
  • Expand on DataTypeExpr (#23249)
  • Lower row-separable functions in streaming engine (#23633)
  • Add scalar checks to range expressions (#23632)
  • Expose POLARS_DOT_SVG_VIEWER to automatically dispatch to SVG viewer (#23592)
  • Implement mean function in arr namespace (#23486)
  • Implement vec_hash for List and Array (#23578)
  • Add unstable pl.row_index() expression (#23556)
  • Add Categories on the Python side (#23543)
  • Implement partitioned sinks for the in-memory engine (#23522)
  • Raise and warn on UDFs without return_dtype set (#23353)
  • IR pruning (#23499)
  • Support min/max reducer for null dtype in streaming engine (#23465)
  • Implement streaming Categorical/Enum min/max (#23440)
  • Allow cast to Categorical inside list.eval (#23432)
  • Support pathlib.Path as source for read/scan_delta() (#23411)
  • Enable default set of ScanCastOptions for native scan_iceberg() (#23416)
  • Pass payload in ExprRegistry (#23412)
  • Support reading nanosecond/Int96 timestamps and schema evolved datasets in scan_delta() (#23398)
  • Support row group skipping with filters when cast_options is given (#23356)
  • Execute bitwise reductions in streaming engine (#23321)
  • Use scan_parquet().collect_schema() for read_parquet_schema (#23359)
  • Add dtype to str.to_integer() (#22239)
  • Add arr.slice, arr.head and arr.tail methods to arr namespace (#23150)
  • Add is_close method (#23273)
  • Drop superfluous casts from optimized plan (#23269)
  • Added drop_nulls option to to_dummies (#23215)
  • Support comma as decimal separator for CSV write (#23238)
  • Don't format keys if they're empty in dot (#23247)
  • Improve arity simplification (#23242)
  • Allow expression input for length parameter in pad_start, pad_end, and zfill (#23182)

🐞 Bug fixes

  • Load _expiry_time from botocore Credentials in CredentialProviderAWS (#23753)
  • Fix credential refresh logic (#23730)
  • Fix to_datetime() fallible identification (#23735)
  • Correct output datatype for dt.with_time_unit (#23734)
  • Fix incorrect native Iceberg scan from tables with renamed/dropped columns/fields (#23713)
  • Allow DataType expressions with selectors (#23720)
  • Match output type to engine for interpolate on Decimal (#23706)
  • Remaining bugs in with_exprs_and_input and pruning (#23710)
  • Match output dtype to engine for cum_sum_horizontal (#23686)
  • Field names for pl.struct in group-by (#23703)
  • Fix output for str.extract_groups with empty string pattern (#23698)
  • Match output type to engine for rolling_map (#23702)
  • Moved passing DeltaTable._storage_options (#23673)
  • Fix incorrect join on single Int128 column for in-memory engine (#23694)
  • Match output field name to lhs for BusinessDayCount (#23679)
  • Correct the planner output datatype for strptime (#23676)
  • Sort and Scan with_exprs_and_input (#23675)
  • Revert to old behavior with name.keep (#23670)
  • Fix panic loading from arrow Map containing timestamps (#23662)
  • Selectors in self part of list.eval (#23668)
  • Fix output field dtype for ToInteger (#23664)
  • Allow decimal_comma with ',' separator in read_csv (#23657)
  • Fix handling of UTF-8 in write_csv to IO[str] (#23647)
  • Selectors in {Lazy,Data}Frame.filter (#23631)
  • Stop splitfields iterator at eol in simd branch (#23652)
  • Correct output datatype of dt.year and dt.mil (#23646)
  • Fix broadcast_rhs logic in binary functions to correct list.set_intersection for list[str] columns (#23584)
  • Order-preserving equi-join didn't always flush final matches (#23639)
  • Fix ColumnNotFound error when joining on col().cast() (#23622)
  • Fix agg groups on when/then in group_by context (#23628)
  • Output type for sign (#23572)
  • Apply agg_fn on null values in pivot (#23586)
  • Remove nonsensical duration variance (#23621)
  • Don't panic when sinking nested categorical to Parquet (#23610)
  • Correctly set value count output field name (#23611)
  • Casting unused columns in to_torch (#23606)
  • Allow inferring of hours-only timezone offset (#23605)
  • Bug in Categorical <-> str compare with nulls (#23609)
  • Honor n=0 in all cases of str.replace (#23598)
  • Remove arbitrary 25 item limit from implicit Python list -> Series infer (#23603)
  • Relabel duplicate sequence IDs in distributor (#23593)
  • Round-trip Enum and Categorical metadata in plugins (#23588)
  • Fix incorrect join_asof with by followed by head/slice (#23585)
  • Change return typing of get_index_type() from DataType to PolarsIntegerType (#23558)
  • Allow writing nested Int128 data to Parquet (#23580)
  • Enum serialization assert (#23574)
  • Output type for peak_min / peak_max (#23573)
  • Make Scalar Categorical, Enum and Struct values serializable (#23565)
  • Preserve row order within partition when sinking parquet (#23462)
  • Prevent in-mem partition sink deadlock (#23562)
  • Update AWS cloud documentation (#23563)
  • Correctly handle null values when comparing structs (#23560)
  • Make fold/reduce/cum_reduce/cum_fold serializable (#23524)
  • Make Expr.append serializable (#23515)
  • Float by float division dtype (#23529)
  • Division on empty DataFrame generating null row (#23516)
  • Partition sink copy_exprs and with_exprs_and_input (#23511)
  • Unreachable with pl.self_dtype (#23507)
  • Rolling median incorrect min_samples with nulls (#23481)
  • Make Int128 roundtrippable via Parquet (#23494)
  • Fix panic when common subplans contain IEJoins (#23487)
  • Properly handle non-finite floats in rolling_sum/mean (#23482)
  • Make read_csv_batched respect skip_rows and skip_lines (#23484)
  • Always use cloudpickle for the python objects in cloud plans (#23474)
  • Support string literals in index_of() on categoricals (#23458)
  • Don't panic for finish_callback with nested datatypes (#23464)
  • Pass DeltaTable._storage_options if no storage_options are provided (#23456)
  • Support min/max aggregation for DataFrame/LazyFrame Categoricals (#23455)
  • Fix var/moment dtypes (#23453)
  • Fix agg_groups dtype (#23450)
  • Fix incorrect _get_path_scheme (#23444)
  • Fix missing overload defaults in read_ods and tree_format (#23442)
  • Clear cached_schema when apply changes dtype (#23439)
  • Allow structured conversion to/from numpy with Array types, preserving shape (#23438)
  • Null handling in full-null group_by_dynamic mean/sum (#23435)
  • Enable default set of ScanCastOptions for native scan_iceberg() (#23416)
  • Fix index calculation for nearest interpolation (#23418)
  • Overload for eager default in Schema.to_frame was False instead of True (#23413)
  • Fix read_excel overloads so that passing list[str] to sheet_name does not raise (#23388)
  • Removed special handling for bytes like objects in read_ndjson (#23361)
  • Parse parquet footer length into unsigned integer (#23357)
  • Fix incorrect results with group_by aggregation on empty groups (#23358)
  • Fix boolean min() in group_by aggregation (streaming) (#23344)
  • Respect data-model in map_elements (#23340)
  • Properly join URI paths in PlPath (#23350)
  • Ignore null values in bitwise aggregation on bools (#23324)
  • Fix panic filtering after left join (#23310)
  • Out-of-bounds index in hot hash table (#23311)
  • Fix scanning '?' from cloud with glob=False (#23304)
  • Fix filters on inserted columns did not remove rows (#23303)
  • Don't ignore return_dtype (#23309)
  • Raise error instead of return in Series class (#23301)
  • Use safe parsing for get_normal_components (#23284)
  • Fix output column names/order of streaming coalesced right-join (#23278)
  • Restore concat_arr inputs expansion (#23271)
  • Expose FieldsMapper (#23232)
  • Fix time zone handling in dt.iso_year and dt.is_leap_year (#23125)

📖 Documentation

  • Fix str.replace_many examples triggering a deprecation warning (#23695)
  • Point to the R Polars version on R-multiverse (#23660)
  • Update example for writing to cloud storage (#20265)
  • Update GPU docs for RAPIDS CUDA 11 deprecation (#23620)
  • Add docs of Expr.list.filter and Series.list.filter (#23589)
  • Add page about billing to Polars Cloud user guide (#23564)
  • Small user-guide improvement and fixes (#23549)
  • Correct note in from_pandas about data being cloned (#23552)
  • Fix a few typos in the "Streaming" section (#23536)
  • Update streaming page (#23535)
  • Update structure of Polars Cloud documentation (#23496)
  • Update example code in pandas migration guide (#23403)
  • Correct plugins user guide to reflect that teaching Expr.language is in a different section (#23377)
  • Add example of using OR in join_where (#23375)
  • Update when_then in user guide (#23245)

📦 Build system

  • Update all rand code (#23387)

🛠️ Other improvements

  • Add hashes json (#23758)
  • Add `AExpr::is_expr...

Rust Polars 0.49.1

30 Jun 14:42
99e94c9

🚀 Performance improvements

  • Optimize Bitmap::make_mut (#23138)

✨ Enhancements

  • Allow expression input for length parameter in pad_start, pad_end, and zfill (#23182)

🐞 Bug fixes

  • Expose FieldsMapper (#23232)
  • Fix time zone handling in dt.iso_year and dt.is_leap_year (#23125)

📖 Documentation

  • Update when_then in user guide (#23245)

🛠️ Other improvements

  • Connect Python assert_dataframe_equal() to Rust back-end (#23207)
  • Fix time zone handling in dt.iso_year and dt.is_leap_year (#23125)
  • Update Rust Polars versions (#23229)

Thank you to all our contributors for making this release possible!
@Kevin-Patyk, @mcrumiller, @mrkn, @stijnherfst and @zyctree

Rust Polars 0.49.0

30 Jun 14:22
3e35098

💥 Breaking changes

  • Remove old streaming engine (#23103)

🚀 Performance improvements

  • Improve streaming groupby CSE (#23092)
  • Move row index materialization in post-apply to occur after slicing (#22995)
  • Add first_(true|false)_idx to BooleanChunked and use in bool arg_(min|max) (#22907)
  • Don't go through row encoding for most types on index_of (#22903)
  • Optimise low-level null scans and arg_max for bools (when chunked) (#22897)
  • Optimize multiscan performance (#22886)

✨ Enhancements

  • Native implementation for Iceberg positional deletes (#23091)
  • Remove old streaming engine (#23103)
  • Make match_chunks public (#23101)
  • Implement StructFunction expressions in into_py (#23022)
  • Basic implementation of DataTypeExpr in Rust DSL (#23049)
  • Add required: bool to ParquetFieldOverwrites (#23013)
  • Support serializing name.map_fields (#22997)
  • Support serializing Expr::RenameAlias (#22988)
  • Remove duplicate verbose logging from FetchedCredentialsCache (#22973)
  • Add keys column in finish_callback (#22968)
  • Add extra_columns parameter to scan_parquet (#22699)
  • Add CORR function to polars SQL (#22690)
  • Add per partition sort and finish callback to sinks (#22789)
  • Add and test DataFrame equality functionality (#22865)
  • Support descendingly-sorted values in search_sorted() (#22825)
  • Derive DSL schema (#22866)

🐞 Bug fixes

  • Restrict custom aggregate_function in pivot to pl.element() (#23155)
  • Don't leak SourceToken in in-memory sink linearize (#23201)
  • Fix panic reading empty parquet with multiple boolean columns (#23159)
  • Raise ComputeError instead of panicking in truncate when mixing month/week/day/sub-daily units (#23176)
  • Materialize list.eval with unknown type (#23186)
  • Only set sorting flag for 1st column with PQ SortingColumns (#23184)
  • Typo in AExprBuilder (#23171)
  • Null return from var/std on scalar column (#23158)
  • Support Datetime broadcast in list.concat (#23137)
  • Ensure projection pushdown maintains right table schema (#22603)
  • Don't create i128 scalars if dtype-128 is not set (#23118)
  • Add Null dtype support to arg_sort_by (#23107)
  • Raise error by default on invalid CSV quotes (#22876)
  • Fix group_by mean and median returning all nulls for Decimal dtype (#23093)
  • Fix hive partition pruning not filtering out __HIVE_DEFAULT_PARTITION__ (#23074)
  • Fix AssertionError when using scan_delta() on AWS with storage_options (#23076)
  • Fix deadlock on collect(background=True) / collect_concurrently() (#23075)
  • Incorrect null count in rolling_min/max (#23073)
  • Preserve file:// in LazyFrame node traverser (#23072)
  • Respect column order in register_io_source schema (#23057)
  • Incorrect output when using sort with group_by and cum_sum (#23001)
  • Implement owned arithmetic for Int128 (#23055)
  • Do not schema-match structs with different field counts (#23018)
  • Fix confusing error message on duplicate row_index (#23043)
  • Add include_nulls to Agg::Count CSE check (#23032)
  • View buffer exceeding 2^32 - 1 bytes in concatenate_view (#23017)
  • Fix incorrect size_hint() for FlatIter (#23010)
  • Fix incorrect result selecting pl.len() from scan_csv with skip_lines (#22949)
  • Allow for IO plugins with reordered columns in streaming (#22987)
  • Method str.zfill was inconsistent with Python and pandas when string contained leading '+' (#22985)
  • Integer underflow in propagate_nulls (#22986)
  • Fix cum_min and cum_max does not preserve inf or -inf values at series start (#22896)
  • Setting compat_level=0 for sink_ipc (#22960)
  • Support arrow Decimal32 and Decimal64 types (#22954)
  • Update arrow format (#22941)
  • Fix filter pushdown to IO plugins (#22910)
  • Improve numeric stability rolling_mean<f32> (#22944)
  • Allow subclasses in type equality checking (#22915)
  • Return early in pl.Expr.__array_ufunc__ when only single input (#22913)
  • Add inline implodes in type coercion (#22885)
  • Correct int_ranges to raise error on invalid inputs (#22894)
  • Set the sorted flag on Array after it is sorted (#22822)
  • Don't silently overflow for temporal casts (#22901)
  • Fix error using write_csv with storage_options (#22881)
  • Schema resolution .over(mapping_strategy="join") with non-aggregations (#22875)
  • Ensure rename behaves the same as select (#22852)

📖 Documentation

  • Update when_then in user guide (#23245)
  • Minor improvement to cum_count docstring example (#23099)
  • Add missing entry for LazyFrame __getitem__ (#22924)

📦 Build system

  • Actually disable ir_serde by default (#23046)
  • Add a feature flag for serde_ignored (#22957)
  • Fix warnings, update DSL version and schema hash (#22953)

🛠️ Other improvements

  • Update Rust Polars versions (#23229)
  • Change flake to use venv (#23219)
  • Add default_alloc feature to py-polars (#23202)
  • Added more descriptive error message by replacing FixedSizeList with Array (#23168)
  • Connect Python assert_series_equal() to Rust back-end (#23141)
  • Refactor skip_batches to use AExprBuilder (#23147)
  • Use ir_serde instead of serde for IRFunctionExpr (#23148)
  • Separate FunctionExpr and IRFunctionExpr (#23140)
  • Improve Series equality functionality and prepare for Python integration (#23136)
  • Add PolarsPhysicalType and use it to dispatch into_series (#23080)
  • Remove AExpr::Alias (#23070)
  • Add components for Iceberg deletion file support (#23059)
  • Feature gate StructFunction::JsonEncode (#23060)
  • Propagate iceberg position delete information to IR (#23045)
  • Add environment variable to get Parquet decoding metrics (#23052)
  • Turn pl.cumulative_eval into its own AExpr (#22994)
  • Add make test-streaming (#23044)
  • Move scan parameter parsing for parquet to reusable function (#23019)
  • Use a ref-counted UniqueId instead of usize for cache_id (#22984)
  • Implement Hash and use SpecialEq for RenameAliasFn (#22989)
  • Turn list.eval into an AExpr (#22911)
  • Only check for unknown DSL fields if minor is higher (#22970)
  • Don't enable ir_serde together with serde (#22969)
  • Make dtype field on Logical non-optional (#22966)
  • Add new (Frozen)Categories and CategoricalMapping (#22956)
  • Add a CI check for DSL schema changes (#22898)
  • Add schema parameters to expr.meta (#22906)
  • Update rust toolchain in nix flake (#22905)
  • Update toolchain (#22859)

Thank you to all our contributors for making this release possible!
@Athsus, @DahaoALG, @FabianWolff, @JakubValtar, @Kevin-Patyk, @MarcoGorelli, @SanjitBasker, @alexander-beedie, @borchero, @bschoenmaeckers, @cmdlineluser, @coastalwhite, @deanm0000, @dsprenkels, @eitsupi, @florian-klein, @i1oveMyse1f, @ion-elgreco, @itamarst, @kdn36, @kutal10, @math-hiyoko, @mcrumiller, @mrkn, @mroeschke, @nameexhaustion, @nikaltipar, @orlp, @paskhaver, @ritchie46, @stijnherfst, @thomasfrederikhoeck and @zyctree

Python Polars 1.31.0

18 Jun 12:01
6e02c20

💥 Breaking changes

  • Remove old streaming engine (#23103)

⚠️ Deprecations

  • Deprecate allow_missing_columns in scan_parquet in favor of missing_columns (#22784)

🚀 Performance improvements

  • Improve streaming groupby CSE (#23092)
  • Move row index materialization in post-apply to occur after slicing (#22995)
  • Add first_(true|false)_idx to BooleanChunked and use in bool arg_(min|max) (#22907)
  • Don't go through row encoding for most types on index_of (#22903)
  • Optimise low-level null scans and arg_max for bools (when chunked) (#22897)
  • Optimize multiscan performance (#22886)

✨ Enhancements

  • DataType expressions in Python (#23167)
  • Native implementation for Iceberg positional deletes (#23091)
  • Remove old streaming engine (#23103)
  • Basic implementation of DataTypeExpr in Rust DSL (#23049)
  • Add required: bool to ParquetFieldOverwrites (#23013)
  • Support serializing name.map_fields (#22997)
  • Support serializing Expr::RenameAlias (#22988)
  • Remove duplicate verbose logging from FetchedCredentialsCache (#22973)
  • Add keys column in finish_callback (#22968)
  • Add extra_columns parameter to scan_parquet (#22699)
  • Add CORR function to polars SQL (#22690)
  • Add per partition sort and finish callback to sinks (#22789)
  • Support descendingly-sorted values in search_sorted() (#22825)
  • Derive DSL schema (#22866)

🐞 Bug fixes

  • Remove axis in show_graph (#23218)
  • Remove axis ticks in show_graph (#23210)
  • Restrict custom aggregate_function in pivot to pl.element() (#23155)
  • Don't leak SourceToken in in-memory sink linearize (#23201)
  • Fix panic reading empty parquet with multiple boolean columns (#23159)
  • Raise ComputeError instead of panicking in truncate when mixing month/week/day/sub-daily units (#23176)
  • Materialize list.eval with unknown type (#23186)
  • Only set sorting flag for 1st column with PQ SortingColumns (#23184)
  • Typo in AExprBuilder (#23171)
  • Null return from var/std on scalar column (#23158)
  • Support Datetime broadcast in list.concat (#23137)
  • Ensure projection pushdown maintains right table schema (#22603)
  • Add Null dtype support to arg_sort_by (#23107)
  • Raise error by default on invalid CSV quotes (#22876)
  • Fix group_by mean and median returning all nulls for Decimal dtype (#23093)
  • Fix hive partition pruning not filtering out __HIVE_DEFAULT_PARTITION__ (#23074)
  • Fix AssertionError when using scan_delta() on AWS with storage_options (#23076)
  • Fix deadlock on collect(background=True) / collect_concurrently() (#23075)
  • Incorrect null count in rolling_min/max (#23073)
  • Preserve file:// in LazyFrame node traverser (#23072)
  • Respect column order in register_io_source schema (#23057)
  • Don't call unnest for objects implementing __arrow_c_array__ (#23069)
  • Incorrect output when using sort with group_by and cum_sum (#23001)
  • Implement owned arithmetic for Int128 (#23055)
  • Do not schema-match structs with different field counts (#23018)
  • Fix confusing error message on duplicate row_index (#23043)
  • Add include_nulls to Agg::Count CSE check (#23032)
  • View buffer exceeding 2^32 - 1 bytes in concatenate_view (#23017)
  • Fix incorrect result selecting pl.len() from scan_csv with skip_lines (#22949)
  • Allow for IO plugins with reordered columns in streaming (#22987)
  • Method str.zfill was inconsistent with Python and pandas when string contained leading '+' (#22985)
  • Integer underflow in propagate_nulls (#22986)
  • Setting compat_level=0 for sink_ipc (#22960)
  • Narrow return type for DataType.is_, improve Pyright's type completeness from 69% to 95% (#22962)
  • Support arrow Decimal32 and Decimal64 types (#22954)
  • Guard against dictionaries being passed to projection keywords (#22928)
  • Update arrow format (#22941)
  • Fix filter pushdown to IO plugins (#22910)
  • Improve numeric stability rolling_mean<f32> (#22944)
  • Guard against invalid nested objects in 'map_elements' (#22932)
  • Allow subclasses in type equality checking (#22915)
  • Return early in pl.Expr.__array_ufunc__ when only single input (#22913)
  • Add inline implodes in type coercion (#22885)
  • Add {top, bottom}_k_by to Series (#22902)
  • Correct int_ranges to raise error on invalid inputs (#22894)
  • Don't silently overflow for temporal casts (#22901)
  • Fix error using write_csv with storage_options (#22881)
  • Schema resolution .over(mapping_strategy="join") with non-aggregations (#22875)
  • Ensure rename behaves the same as select (#22852)

📖 Documentation

  • Document aggregations that return identity when there are no non-null values, suggest a workaround for those who want SQL-standard behaviour (#23143)
  • Fix reference to non-existent Expr.replace_all in replace_strict docs (#23144)
  • Fix typo on pandas comparison page (#23123)
  • Minor improvement to cum_count docstring example (#23099)
  • Add missing DataFrame.__setitem__ to API reference (#22938)
  • Add missing entry for LazyFrame __getitem__ (#22924)
  • Add missing top_k_by and bottom_k_by to Series reference (#22917)

📦 Build system

  • Update pyo3 and numpy crates to version 0.25 (#22763)
  • Actually disable ir_serde by default (#23046)
  • Add a feature flag for serde_ignored (#22957)
  • Fix warnings, update DSL version and schema hash (#22953)

🛠️ Other improvements

  • Change flake to use venv (#23219)
  • Add default_alloc feature to py-polars (#23202)
  • Added more descriptive error message by replacing FixedSizeList with Array (#23168)
  • Connect Python assert_series_equal() to Rust back-end (#23141)
  • Refactor skip_batches to use AExprBuilder (#23147)
  • Use ir_serde instead of serde for IRFunctionExpr (#23148)
  • Separate FunctionExpr and IRFunctionExpr (#23140)
  • Remove AExpr::Alias (#23070)
  • Add components for Iceberg deletion file support (#23059)
  • Feature gate StructFunction::JsonEncode (#23060)
  • Propagate iceberg position delete information to IR (#23045)
  • Add environment variable to get Parquet decoding metrics (#23052)
  • Turn pl.cumulative_eval into its own AExpr (#22994)
  • Add make test-streaming (#23044)
  • Move scan parameter parsing for parquet to reusable function (#23019)
  • Prepare deltalake 1.0 (#22931)
  • Implement Hash and use SpecialEq for RenameAliasFn (#22989)
  • Turn list.eval into an AExpr (#22911)
  • Fix CI for latest pandas-stubs release (#22971)
  • Add a CI check for DSL schema changes (#22898)
  • Add schema parameters to expr.meta (#22906)
  • Update rust toolchain in nix flake (#22905)
  • Update toolchain (#22859)

Thank you to all our contributors for making this release possible!
@Athsus, @DahaoALG, @FabianWolff, @JakubValtar, @Kevin-Patyk, @MarcoGorelli, @SanjitBasker, @alexander-beedie, @bschoenmaeckers, @coastalwhite, @deanm0000, @dsprenkels, @eitsupi, @florian-klein, @i1oveMyse1f, @ion-elgreco, @itamarst, @kdn36, @kutal10, @mcrumiller, @mroeschke, @nameexhaustion, @nikaltipar, @orlp, @paskhaver, @ritchie46, @stijnherfst and @thomasfrederikhoeck

Python Polars 1.31.0-beta.1

14 Jun 09:17
383f1b3
Pre-release

💥 Breaking changes

  • Remove old streaming engine (#23103)

⚠️ Deprecations

  • Deprecate allow_missing_columns in scan_parquet in favor of missing_columns (#22784)

🚀 Performance improvements

  • Improve streaming groupby CSE (#23092)
  • Move row index materialization in post-apply to occur after slicing (#22995)
  • Add first_(true|false)_idx to BooleanChunked and use in bool arg_(min|max) (#22907)
  • Don't go through row encoding for most types on index_of (#22903)
  • Optimise low-level null scans and arg_max for bools (when chunked) (#22897)
  • Optimize multiscan performance (#22886)

✨ Enhancements

  • DataType expressions in Python (#23167)
  • Native implementation for Iceberg positional deletes (#23091)
  • Remove old streaming engine (#23103)
  • Basic implementation of DataTypeExpr in Rust DSL (#23049)
  • Add required: bool to ParquetFieldOverwrites (#23013)
  • Support serializing name.map_fields (#22997)
  • Support serializing Expr::RenameAlias (#22988)
  • Remove duplicate verbose logging from FetchedCredentialsCache (#22973)
  • Add keys column in finish_callback (#22968)
  • Add extra_columns parameter to scan_parquet (#22699)
  • Add CORR function to polars SQL (#22690)
  • Add per partition sort and finish callback to sinks (#22789)
  • Support descendingly-sorted values in search_sorted() (#22825)
  • Derive DSL schema (#22866)

🐞 Bug fixes

  • Fix panic reading empty parquet with multiple boolean columns (#23159)
  • Raise ComputeError instead of panicking in truncate when mixing month/week/day/sub-daily units (#23176)
  • Materialize list.eval with unknown type (#23186)
  • Only set sorting flag for 1st column with PQ SortingColumns (#23184)
  • Typo in AExprBuilder (#23171)
  • Null return from var/std on scalar column (#23158)
  • Support Datetime broadcast in list.concat (#23137)
  • Ensure projection pushdown maintains right table schema (#22603)
  • Add Null dtype support to arg_sort_by (#23107)
  • Raise error by default on invalid CSV quotes (#22876)
  • Fix group_by mean and median returning all nulls for Decimal dtype (#23093)
  • Fix hive partition pruning not filtering out __HIVE_DEFAULT_PARTITION__ (#23074)
  • Fix AssertionError when using scan_delta() on AWS with storage_options (#23076)
  • Fix deadlock on collect(background=True) / collect_concurrently() (#23075)
  • Incorrect null count in rolling_min/max (#23073)
  • Preserve file:// in LazyFrame node traverser (#23072)
  • Respect column order in register_io_source schema (#23057)
  • Don't call unnest for objects implementing __arrow_c_array__ (#23069)
  • Incorrect output when using sort with group_by and cum_sum (#23001)
  • Implement owned arithmetic for Int128 (#23055)
  • Do not schema-match structs with different field counts (#23018)
  • Fix confusing error message on duplicate row_index (#23043)
  • Add include_nulls to Agg::Count CSE check (#23032)
  • View buffer exceeding 2^32 - 1 bytes in concatenate_view (#23017)
  • Fix incorrect result selecting pl.len() from scan_csv with skip_lines (#22949)
  • Allow for IO plugins with reordered columns in streaming (#22987)
  • Method str.zfill was inconsistent with Python and pandas when string contained leading '+' (#22985)
  • Integer underflow in propagate_nulls (#22986)
  • Setting compat_level=0 for sink_ipc (#22960)
  • Narrow return type for DataType.is_, improve Pyright's type completeness from 69% to 95% (#22962)
  • Support arrow Decimal32 and Decimal64 types (#22954)
  • Guard against dictionaries being passed to projection keywords (#22928)
  • Update arrow format (#22941)
  • Fix filter pushdown to IO plugins (#22910)
  • Improve numeric stability rolling_mean<f32> (#22944)
  • Guard against invalid nested objects in 'map_elements' (#22932)
  • Allow subclasses in type equality checking (#22915)
  • Return early in pl.Expr.__array_ufunc__ when only single input (#22913)
  • Add inline implodes in type coercion (#22885)
  • Add {top, bottom}_k_by to Series (#22902)
  • Correct int_ranges to raise error on invalid inputs (#22894)
  • Don't silently overflow for temporal casts (#22901)
  • Fix error using write_csv with storage_options (#22881)
  • Schema resolution .over(mapping_strategy="join") with non-aggregations (#22875)
  • Ensure rename behaves the same as select (#22852)

📖 Documentation

  • Document aggregations that return identity when there are no non-null values, suggest a workaround for those who want SQL-standard behaviour (#23143)
  • Fix reference to non-existent Expr.replace_all in replace_strict docs (#23144)
  • Fix typo on pandas comparison page (#23123)
  • Minor improvement to cum_count docstring example (#23099)
  • Add missing DataFrame.__setitem__ to API reference (#22938)
  • Add missing entry for LazyFrame __getitem__ (#22924)
  • Add missing top_k_by and bottom_k_by to Series reference (#22917)

📦 Build system

  • Update pyo3 and numpy crates to version 0.25 (#22763)
  • Actually disable ir_serde by default (#23046)
  • Add a feature flag for serde_ignored (#22957)
  • Fix warnings, update DSL version and schema hash (#22953)

🛠️ Other improvements

  • Added more descriptive error message by replacing FixedSizeList with Array (#23168)
  • Connect Python assert_series_equal() to Rust back-end (#23141)
  • Refactor skip_batches to use AExprBuilder (#23147)
  • Use ir_serde instead of serde for IRFunctionExpr (#23148)
  • Separate FunctionExpr and IRFunctionExpr (#23140)
  • Remove AExpr::Alias (#23070)
  • Add components for Iceberg deletion file support (#23059)
  • Feature gate StructFunction::JsonEncode (#23060)
  • Propagate iceberg position delete information to IR (#23045)
  • Add environment variable to get Parquet decoding metrics (#23052)
  • Turn pl.cumulative_eval into its own AExpr (#22994)
  • Add make test-streaming (#23044)
  • Move scan parameter parsing for parquet to reusable function (#23019)
  • Prepare deltalake 1.0 (#22931)
  • Implement Hash and use SpecialEq for RenameAliasFn (#22989)
  • Turn list.eval into an AExpr (#22911)
  • Fix CI for latest pandas-stubs release (#22971)
  • Add a CI check for DSL schema changes (#22898)
  • Add schema parameters to expr.meta (#22906)
  • Update rust toolchain in nix flake (#22905)
  • Update toolchain (#22859)

Thank you to all our contributors for making this release possible!
@Athsus, @DahaoALG, @FabianWolff, @JakubValtar, @Kevin-Patyk, @MarcoGorelli, @SanjitBasker, @alexander-beedie, @bschoenmaeckers, @coastalwhite, @deanm0000, @dsprenkels, @eitsupi, @florian-klein, @i1oveMyse1f, @ion-elgreco, @itamarst, @kdn36, @kutal10, @mroeschke, @nameexhaustion, @nikaltipar, @orlp, @paskhaver, @ritchie46, @stijnherfst and @thomasfrederikhoeck