Skip to content

Conversation

@goldmedal
Copy link
Contributor

@goldmedal goldmedal commented Oct 23, 2025

EliminateCrossJoin rule will replace the cross-join plan with the filter to the inner-join plan if possible. However, the generated plan can't be handled by the unparser correctly.

Consider the following SQL:

select customer_id, o.order_id 
from datafusion.public.orders o, datafusion.public.customers as c, datafusion.public.order_items oi
where o.customer_id = c.id and o.order_id = oi.order_id

If the number of the joined tables is greater than 2, they will be converted to an inner join chain

Projection: o.customer_id, o.order_id
  Inner Join: o.order_id = oi.order_id
    Projection: o.order_id, o.customer_id
      Inner Join: o.customer_id = c.id
        SubqueryAlias: o
          TableScan: datafusion.public.orders projection=[order_id, customer_id]
        SubqueryAlias: c
          TableScan: datafusion.public.customers projection=[id]
    SubqueryAlias: oi
      TableScan: datafusion.public.order_items projection=[order_id]

The unparsed result will be

SELECT
    o.customer_id,
    o.order_id
FROM
    (
        SELECT
            o.order_id,
            o.customer_id
        FROM
            datafusion."public".orders AS o
            INNER JOIN datafusion."public".customers AS c ON o.customer_id = c.id
    )
    INNER JOIN datafusion."public".order_items AS oi ON o.order_id = oi.order_id

It's not a valid SQL for most databases because the base table of the select item is placed in an unnamed subquery. We need to rebase the expression manually.

Summary by CodeRabbit

  • Bug Fixes

    • Preserve explicit CROSS JOINs during SQL generation to avoid producing invalid SQL and ensure stable query output.
  • Tests

    • Added a test validating that CROSS JOIN structures and projected columns are retained as expected during transformation.

@coderabbitai
Copy link

coderabbitai bot commented Oct 23, 2025

Walkthrough

EliminateCrossJoin was removed from the unparsing optimization rule set (commented/disabled) to avoid generating invalid SQL; a unit test was added to assert CROSS JOINs remain explicit and projected columns appear from both tables.

Changes

Cohort / File(s) Change Summary
Optimizer rule modification
wren-core/core/src/mdl/context.rs
Disabled removal of cross-joins by commenting out the Arc::new(EliminateCrossJoin::new()) entry in the unparsing optimizer rules and left an explanatory comment.
Unit test added
wren-core/core/src/mdl/mod.rs
Added test_disable_eliminate_cross_join to build a manifest, run analysis on a query with an explicit CROSS JOIN, and assert the transformed SQL retains the CROSS JOIN and projected columns from both tables.

Sequence Diagram(s)

(Skipped — change is a rule toggle and an added test; no control-flow or new runtime interactions to visualize.)

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

Suggested reviewers

  • douenergy

Poem

🐰
I hop where joins once hid from sight,
A CROSS stays true in morning light.
No silent merge, just columns shown,
The rabbit cheers — the fix is known! 🥕

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The pull request title "fix(core): disable EliminateCrossJoin rule" directly and fully aligns with the main changeset. The raw summary confirms that the core change is disabling the EliminateCrossJoin optimizer rule in the context.rs file, with the PR objectives clarifying that this addresses the issue of invalid SQL generation when cross-joins are eliminated. The title is concise, specific, and clearly communicates the primary change without unnecessary details, making it easy for teammates to understand the purpose when scanning pull request history.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a156886 and c3caa4c.

📒 Files selected for processing (1)
  • wren-core/core/src/mdl/context.rs (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • wren-core/core/src/mdl/context.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: cargo check
  • GitHub Check: test
  • GitHub Check: ci

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions github-actions bot added core rust Pull requests that update Rust code labels Oct 23, 2025
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
wren-core/core/src/mdl/context.rs (1)

250-251: Appropriate fix to prevent invalid SQL generation.

Disabling the EliminateCrossJoin rule correctly addresses the issue described in the PR where the optimizer was generating invalid SQL with unnamed subqueries containing references to base-table aliases that were out of scope.

The comment clearly documents the rationale. However, since the comment mentions "expression should be rebased manually," consider adding a TODO comment or issue reference to track future work on implementing proper expression rebasing to re-enable this optimization safely.

Consider adding a reference to track future work:

-        // Disable EliminateCrossJoin to avoid generate invalid sql (expression should be rebased manually)
+        // Disable EliminateCrossJoin to avoid generate invalid sql (expression should be rebased manually)
+        // TODO: Implement expression rebasing to safely re-enable this optimization
         // Arc::new(EliminateCrossJoin::new()),
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3c14e99 and a156886.

📒 Files selected for processing (2)
  • wren-core/core/src/mdl/context.rs (1 hunks)
  • wren-core/core/src/mdl/mod.rs (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
wren-core/core/src/mdl/mod.rs (1)
wren-core-base/src/mdl/builder.rs (5)
  • new (46-58)
  • new (105-119)
  • new (190-205)
  • new (289-298)
  • new (325-337)
🔇 Additional comments (1)
wren-core/core/src/mdl/mod.rs (1)

3777-3812: LGTM! Test correctly validates cross-join preservation.

The test appropriately verifies that when the EliminateCrossJoin rule is disabled, cross joins are preserved in the transformed SQL output. The expected snapshot confirms that:

  • The comma-separated table syntax is converted to an explicit CROSS JOIN
  • The join condition remains in the WHERE clause rather than being converted to an INNER JOIN ON clause
  • The resulting SQL structure is valid

@goldmedal goldmedal requested a review from douenergy October 23, 2025 07:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core rust Pull requests that update Rust code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant