Closed
Labels
P-medium (Priority: medium), accepted (Ready for implementation), bug (Something isn't working), python (Related to Python Polars)
Description
Checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of Polars.
Reproducible example
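import polars as pl

lhs = pl.LazyFrame(
    {"a": [1, 2, 3, 4, 5], "b": [1, 2, 3, 4, None], "c": ["a", "b", "c", "d", "e"]}
)
rhs = pl.LazyFrame(
    {"a": [1, 2, 3, 4, 5], "b": [1, 2, 3, None, 5], "c": ["A", "B", "C", "D", "E"]}
)
q = lhs.join(rhs, left_on="a", right_on="b", how="right", maintain_order="left_right")
# Collect the same query on each engine to compare the outputs below.
print(q.collect(engine="in-memory"))
print(q.collect(engine="streaming"))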
engine=in-memory:
shape: (5, 5)
┌──────┬──────┬─────┬─────────┬─────────┐
│ b ┆ c ┆ a ┆ b_right ┆ c_right │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ i64 ┆ i64 ┆ str │
╞══════╪══════╪═════╪═════════╪═════════╡
│ 1 ┆ a ┆ 1 ┆ 1 ┆ A │
│ 2 ┆ b ┆ 2 ┆ 2 ┆ B │
│ 3 ┆ c ┆ 3 ┆ 3 ┆ C │
│ null ┆ e ┆ 5 ┆ 5 ┆ E │
│ null ┆ null ┆ 4 ┆ null ┆ D │
└──────┴──────┴─────┴─────────┴─────────┘
engine=streaming:
shape: (5, 5)
┌──────┬──────┬─────────┬──────┬─────────┐
│ b ┆ c ┆ a_right ┆ b ┆ c_right │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ i64 ┆ i64 ┆ str │
╞══════╪══════╪═════════╪══════╪═════════╡
│ 1 ┆ a ┆ 1 ┆ 1 ┆ A │
│ 2 ┆ b ┆ 2 ┆ 2 ┆ B │
│ 3 ┆ c ┆ 3 ┆ 3 ┆ C │
│ null ┆ e ┆ 5 ┆ 5 ┆ E │
│ null ┆ null ┆ 4 ┆ null ┆ D │
└──────┴──────┴─────────┴──────┴─────────┘
Note that with debug_assertions enabled, we get a panic instead:
polars-stream: updating graph state
polars-stream: running equi-join in subgraph
polars-stream: running in-memory-source in subgraph
async thread count: 1
polars-stream: done running graph phase
polars-stream: updating graph state
polars-stream: running in-memory-sink in subgraph
polars-stream: running equi-join in subgraph
polars-stream: running in-memory-source in subgraph
thread 'async-executor-0' panicked at crates/polars-core/src/frame/horizontal.rs:27:21:
called `Result::unwrap()` on an `Err` value: Duplicate(ErrString("column with name 'b' has more than one occurrence"))
stack backtrace:
0: __rustc::rust_begin_unwind
at /rustc/bc821528634632b4ff8dee5ac1ea4ad90d1b3eb5/library/std/src/panicking.rs:697:5
1: core::panicking::panic_fmt
at /rustc/bc821528634632b4ff8dee5ac1ea4ad90d1b3eb5/library/core/src/panicking.rs:75:14
2: core::result::unwrap_failed
at /rustc/bc821528634632b4ff8dee5ac1ea4ad90d1b3eb5/library/core/src/result.rs:1732:5
3: core::result::Result<T,E>::unwrap
at /Users/nxs/.rustup/toolchains/nightly-2025-05-21-aarch64-apple-darwin/lib/rustlib/src/rust/library/core/src/result.rs:1137:23
4: polars_core::frame::horizontal::<impl polars_core::frame::DataFrame>::hstack_mut_unchecked
at ./crates/polars-core/src/frame/horizontal.rs:27:17
5: polars_stream::nodes::joins::equi_join::ProbeState::partition_and_probe::{{closure}}::{{closure}}
at ./crates/polars-stream/src/nodes/joins/equi_join.rs:795:29
6: polars_stream::nodes::joins::equi_join::ProbeState::partition_and_probe::{{closure}}
at ./crates/polars-stream/src/nodes/joins/equi_join.rs:871:50
7: <polars_stream::async_executor::task::Task<F,S,M> as polars_stream::async_executor::task::DynTask<M>>::run::{{closure}}
at ./crates/polars-stream/src/async_executor/task.rs:173:21
8: <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
at /Users/nxs/.rustup/toolchains/nightly-2025-05-21-aarch64-apple-darwin/lib/rustlib/src/rust/library/core/src/panic/unwind_safe.rs:272:9
9: std::panicking::try::do_call
at /Users/nxs/.rustup/toolchains/nightly-2025-05-21-aarch64-apple-darwin/lib/rustlib/src/rust/library/std/src/panicking.rs:589:40
10: ___rust_try
11: std::panicking::try
at /Users/nxs/.rustup/toolchains/nightly-2025-05-21-aarch64-apple-darwin/lib/rustlib/src/rust/library/std/src/panicking.rs:552:19
12: std::panic::catch_unwind
at /Users/nxs/.rustup/toolchains/nightly-2025-05-21-aarch64-apple-darwin/lib/rustlib/src/rust/library/std/src/panic.rs:359:14
13: <polars_stream::async_executor::task::Task<F,S,M> as polars_stream::async_executor::task::DynTask<M>>::run
at ./crates/polars-stream/src/async_executor/task.rs:171:17
14: polars_stream::async_executor::task::Runnable<M>::run
at ./crates/polars-stream/src/async_executor/task.rs:278:9
15: polars_stream::async_executor::Executor::runner
at ./crates/polars-stream/src/async_executor/mod.rs:259:17
16: polars_stream::async_executor::Executor::global::{{closure}}::{{closure}}::{{closure}}
at ./crates/polars-stream/src/async_executor/mod.rs:277:40
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
Traceback (most recent call last):
File "/Users/nxs/git/polars/.env/x.py", line 113, in <module>
print(q.collect())
^^^^^^^^^^^
File "/Users/nxs/git/polars/py-polars/polars/_utils/deprecation.py", line 97, in wrapper
return function(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/nxs/git/polars/py-polars/polars/lazyframe/opt_flags.py", line 330, in wrapper
return function(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/nxs/git/polars/py-polars/polars/lazyframe/frame.py", line 2332, in collect
return wrap_df(ldf.collect(engine, callback))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
pyo3_runtime.PanicException: called `Result::unwrap()` on an `Err` value: Duplicate(ErrString("column with name 'b' has more than one occurrence"))
Issue description
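The streaming engine outputs col(a) from the right frame as a_right instead of a. It also leaves col(b) from the right frame named b rather than b_right, so the result contains two columns named b, which matches the duplicate-column error in the panic above.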
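To make the naming mismatch concrete, here is a minimal pure-Python sketch of the column naming that the in-memory output above appears to follow for this right join. The helper `expected_join_columns` is hypothetical and is only a simplified model inferred from the tables in this report, not Polars' actual implementation: it assumes the left key is dropped (coalesced into the right key) and that a right-hand column gets the `_right` suffix only when its name clashes with a column already in the output.

```python
def expected_join_columns(left, right, left_on, suffix="_right"):
    """Model the output names of a coalescing right join (hypothetical helper)."""
    # Assumption: the left key is coalesced into the right key, so it is dropped.
    out = [name for name in left if name != left_on]
    taken = set(out)
    for name in right:
        # Only names that clash with an already-emitted column get the suffix.
        new_name = name + suffix if name in taken else name
        out.append(new_name)
        taken.add(new_name)
    return out


print(expected_join_columns(["a", "b", "c"], ["a", "b", "c"], left_on="a"))
# ['b', 'c', 'a', 'b_right', 'c_right']
```

Under this model the right frame's `a` keeps its name because the left `a` is no longer in the output, while `b` and `c` are suffixed. The streaming output instead shows `a_right` and an unsuffixed `b`.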
Expected behavior
Streaming engine should output the same table as the in-memory engine.
Installed versions
1.31.0