Incorrect coalesced key column output names for new-streaming right-join #23246

@nameexhaustion

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

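import polars as pl

lhs = pl.LazyFrame(
    {"a": [1, 2, 3, 4, 5], "b": [1, 2, 3, 4, None], "c": ["a", "b", "c", "d", "e"]}
)
rhs = pl.LazyFrame(
    {"a": [1, 2, 3, 4, 5], "b": [1, 2, 3, None, 5], "c": ["A", "B", "C", "D", "E"]}
)

# Right join on lhs.a == rhs.b; the key columns are coalesced.
q = lhs.join(rhs, left_on="a", right_on="b", how="right", maintain_order="left_right")

# Collect with both engines to compare the output column names.
for engine in ("in-memory", "streaming"):
    print(f"engine={engine}:")
    print(q.collect(engine=engine))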

engine=in-memory:
shape: (5, 5)
┌──────┬──────┬─────┬─────────┬─────────┐
│ b    ┆ c    ┆ a   ┆ b_right ┆ c_right │
│ ---  ┆ ---  ┆ --- ┆ ---     ┆ ---     │
│ i64  ┆ str  ┆ i64 ┆ i64     ┆ str     │
╞══════╪══════╪═════╪═════════╪═════════╡
│ 1    ┆ a    ┆ 1   ┆ 1       ┆ A       │
│ 2    ┆ b    ┆ 2   ┆ 2       ┆ B       │
│ 3    ┆ c    ┆ 3   ┆ 3       ┆ C       │
│ null ┆ e    ┆ 5   ┆ 5       ┆ E       │
│ null ┆ null ┆ 4   ┆ null    ┆ D       │
└──────┴──────┴─────┴─────────┴─────────┘
engine=streaming:
shape: (5, 5)
┌──────┬──────┬─────────┬──────┬─────────┐
│ b    ┆ c    ┆ a_right ┆ b    ┆ c_right │
│ ---  ┆ ---  ┆ ---     ┆ ---  ┆ ---     │
│ i64  ┆ str  ┆ i64     ┆ i64  ┆ str     │
╞══════╪══════╪═════════╪══════╪═════════╡
│ 1    ┆ a    ┆ 1       ┆ 1    ┆ A       │
│ 2    ┆ b    ┆ 2       ┆ 2    ┆ B       │
│ 3    ┆ c    ┆ 3       ┆ 3    ┆ C       │
│ null ┆ e    ┆ 5       ┆ 5    ┆ E       │
│ null ┆ null ┆ 4       ┆ null ┆ D       │
└──────┴──────┴─────────┴──────┴─────────┘

Note that with debug_assertions enabled, we get a panic instead:

polars-stream: updating graph state
polars-stream: running equi-join in subgraph
polars-stream: running in-memory-source in subgraph
async thread count: 1
polars-stream: done running graph phase
polars-stream: updating graph state
polars-stream: running in-memory-sink in subgraph
polars-stream: running equi-join in subgraph
polars-stream: running in-memory-source in subgraph

thread 'async-executor-0' panicked at crates/polars-core/src/frame/horizontal.rs:27:21:
called `Result::unwrap()` on an `Err` value: Duplicate(ErrString("column with name 'b' has more than one occurrence"))
stack backtrace:
   0: __rustc::rust_begin_unwind
             at /rustc/bc821528634632b4ff8dee5ac1ea4ad90d1b3eb5/library/std/src/panicking.rs:697:5
   1: core::panicking::panic_fmt
             at /rustc/bc821528634632b4ff8dee5ac1ea4ad90d1b3eb5/library/core/src/panicking.rs:75:14
   2: core::result::unwrap_failed
             at /rustc/bc821528634632b4ff8dee5ac1ea4ad90d1b3eb5/library/core/src/result.rs:1732:5
   3: core::result::Result<T,E>::unwrap
             at /Users/nxs/.rustup/toolchains/nightly-2025-05-21-aarch64-apple-darwin/lib/rustlib/src/rust/library/core/src/result.rs:1137:23
   4: polars_core::frame::horizontal::<impl polars_core::frame::DataFrame>::hstack_mut_unchecked
             at ./crates/polars-core/src/frame/horizontal.rs:27:17
   5: polars_stream::nodes::joins::equi_join::ProbeState::partition_and_probe::{{closure}}::{{closure}}
             at ./crates/polars-stream/src/nodes/joins/equi_join.rs:795:29
   6: polars_stream::nodes::joins::equi_join::ProbeState::partition_and_probe::{{closure}}
             at ./crates/polars-stream/src/nodes/joins/equi_join.rs:871:50
   7: <polars_stream::async_executor::task::Task<F,S,M> as polars_stream::async_executor::task::DynTask<M>>::run::{{closure}}
             at ./crates/polars-stream/src/async_executor/task.rs:173:21
   8: <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
             at /Users/nxs/.rustup/toolchains/nightly-2025-05-21-aarch64-apple-darwin/lib/rustlib/src/rust/library/core/src/panic/unwind_safe.rs:272:9
   9: std::panicking::try::do_call
             at /Users/nxs/.rustup/toolchains/nightly-2025-05-21-aarch64-apple-darwin/lib/rustlib/src/rust/library/std/src/panicking.rs:589:40
  10: ___rust_try
  11: std::panicking::try
             at /Users/nxs/.rustup/toolchains/nightly-2025-05-21-aarch64-apple-darwin/lib/rustlib/src/rust/library/std/src/panicking.rs:552:19
  12: std::panic::catch_unwind
             at /Users/nxs/.rustup/toolchains/nightly-2025-05-21-aarch64-apple-darwin/lib/rustlib/src/rust/library/std/src/panic.rs:359:14
  13: <polars_stream::async_executor::task::Task<F,S,M> as polars_stream::async_executor::task::DynTask<M>>::run
             at ./crates/polars-stream/src/async_executor/task.rs:171:17
  14: polars_stream::async_executor::task::Runnable<M>::run
             at ./crates/polars-stream/src/async_executor/task.rs:278:9
  15: polars_stream::async_executor::Executor::runner
             at ./crates/polars-stream/src/async_executor/mod.rs:259:17
  16: polars_stream::async_executor::Executor::global::{{closure}}::{{closure}}::{{closure}}
             at ./crates/polars-stream/src/async_executor/mod.rs:277:40
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
Traceback (most recent call last):
  File "/Users/nxs/git/polars/.env/x.py", line 113, in <module>
    print(q.collect())
          ^^^^^^^^^^^
  File "/Users/nxs/git/polars/py-polars/polars/_utils/deprecation.py", line 97, in wrapper
    return function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/nxs/git/polars/py-polars/polars/lazyframe/opt_flags.py", line 330, in wrapper
    return function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/nxs/git/polars/py-polars/polars/lazyframe/frame.py", line 2332, in collect
    return wrap_df(ldf.collect(engine, callback))
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
pyo3_runtime.PanicException: called `Result::unwrap()` on an `Err` value: Duplicate(ErrString("column with name 'b' has more than one occurrence"))

Issue description

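The streaming engine outputs col(a) from the right as a_right instead of a, and the right key col(b) as b instead of b_right, so the result has two columns named b.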

Expected behavior

The streaming engine should output the same table as the in-memory engine.
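
To make the expected column names concrete, here is a minimal pure-Python sketch of the naming rule as it can be inferred from the in-memory table above. It is an assumption drawn from that output, not code taken from Polars: the left key is dropped because it is coalesced into the right key, and each right column gets the suffix only when its name clashes with a remaining left column. The helper names `expected_right_join_columns` and `duplicated` are hypothetical.

```python
from collections import Counter


def expected_right_join_columns(left_cols, right_cols, left_on, suffix="_right"):
    """Output names inferred from the in-memory result (an assumption, not the
    Polars implementation): drop the left key, keep the other left columns,
    then append the right columns, suffixing any name that clashes."""
    kept_left = [c for c in left_cols if c != left_on]
    taken = set(kept_left)
    out = list(kept_left)
    for c in right_cols:
        out.append(c + suffix if c in taken else c)
    return out


def duplicated(names):
    """Return the names that occur more than once, in first-seen order."""
    counts = Counter(names)
    return [n for n in counts if counts[n] > 1]


expected = expected_right_join_columns(["a", "b", "c"], ["a", "b", "c"], left_on="a")
# Column names reported above for engine=streaming.
streaming = ["b", "c", "a_right", "b", "c_right"]

print(expected)               # names the in-memory engine produced
print(duplicated(streaming))  # the name the debug-assertion panic complains about
```

Under this inferred rule the expected names match the in-memory output, while the streaming names contain `b` twice, which is consistent with the `Duplicate` error in the panic message.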

Installed versions

1.31.0

Metadata

Labels

  • P-medium (Priority: medium)
  • accepted (Ready for implementation)
  • bug (Something isn't working)
  • python (Related to Python Polars)
