2010YOUY01
Contributor

Which issue does this PR close?

  • Closes #.

Rationale for this change

The following join query currently uses a nested loop join (NLJ). Equi-join operators such as Hash Join (HJ) and SortMergeJoin support a special null comparison behavior, NULL=NULL -> true instead of the default NULL, so this query can be safely rewritten to an equi-join with the operator's null_equals_null option enabled.
Since HJ is generally faster than NLJ, this optimization can make similar queries more efficient.

SELECT *
FROM t1
JOIN t2 ON t1.val IS NOT DISTINCT FROM t2.val;

+---------------+------------------------------------------------------------+
| plan_type     | plan                                                       |
+---------------+------------------------------------------------------------+
| physical_plan | ┌───────────────────────────┐                              |
|               | │       ProjectionExec      │                              |
|               | │    --------------------   │                              |
|               | │         t1_id: id         │                              |
|               | │         t2_id: id         │                              |
|               | │          val: val         │                              |
|               | └─────────────┬─────────────┘                              |
|               | ┌─────────────┴─────────────┐                              |
|               | │     NestedLoopJoinExec    ├──────────────┐               |
|               | └─────────────┬─────────────┘              │               |
|               | ┌─────────────┴─────────────┐┌─────────────┴─────────────┐ |
|               | │       DataSourceExec      ││       DataSourceExec      │ |
|               | │    --------------------   ││    --------------------   │ |
|               | │         bytes: 288        ││         bytes: 288        │ |
|               | │       format: memory      ││       format: memory      │ |
|               | │          rows: 1          ││          rows: 1          │ |
|               | └───────────────────────────┘└───────────────────────────┘ |
|               |                                                            |
+---------------+------------------------------------------------------------+

I've verified that DuckDB already plans this query as a hash join:

D explain SELECT *
  FROM t1
  JOIN t2 ON t1.val IS NOT DISTINCT FROM t2.val;

┌─────────────────────────────┐
│┌───────────────────────────┐│
││       Physical Plan       ││
│└───────────────────────────┘│
└─────────────────────────────┘
┌───────────────────────────┐
│         PROJECTION        │
│    ────────────────────   │
│             id            │
│            val            │
│             id            │
│            val            │
│                           │
│          ~18 Rows         │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│         HASH_JOIN         │
│    ────────────────────   │
│      Join Type: INNER     │
│                           │
│        Conditions:        ├──────────────┐
│  val IS NOT DISTINCT FROM │              │
│             val           │              │
│                           │              │
│          ~18 Rows         │              │
└─────────────┬─────────────┘              │
┌─────────────┴─────────────┐┌─────────────┴─────────────┐
│         SEQ_SCAN          ││         SEQ_SCAN          │
│    ────────────────────   ││    ────────────────────   │
│         Table: t1         ││         Table: t2         │
│   Type: Sequential Scan   ││   Type: Sequential Scan   │
│                           ││                           │
│        Projections:       ││        Projections:       │
│            val            ││            val            │
│             id            ││             id            │
│                           ││                           │
│          ~6 Rows          ││          ~6 Rows          │
└───────────────────────────┘└───────────────────────────┘
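
To illustrate why the rewrite is safe, here is a minimal sketch in Python (chosen for brevity; it is not DataFusion code). It is a toy model only: `None` stands in for NULL, and the names `is_not_distinct_from`, `nested_loop_join`, `hash_join`, and `pair_ids` are hypothetical helpers made up for this example. It shows a hash join with a null-equals-null flag producing the same pairs as a nested loop join that evaluates IS NOT DISTINCT FROM on every row pair.

```python
# Illustrative toy model only; this is not DataFusion's HashJoinExec.
from collections import defaultdict


def is_not_distinct_from(a, b):
    """SQL `a IS NOT DISTINCT FROM b`, with None standing in for NULL."""
    if a is None or b is None:
        return a is None and b is None
    return a == b


def nested_loop_join(left, right):
    """Evaluate the null-safe predicate on every pair of rows: O(n * m)."""
    return [(l, r) for l in left for r in right
            if is_not_distinct_from(l["val"], r["val"])]


def hash_join(left, right, null_equals_null):
    """Build a hash table on the left input, then probe it with the right input."""
    table = defaultdict(list)
    for l in left:
        # Under plain `=` semantics a NULL key never matches, so it is skipped.
        if l["val"] is None and not null_equals_null:
            continue
        table[l["val"]].append(l)
    out = []
    for r in right:
        if r["val"] is None and not null_equals_null:
            continue
        out.extend((l, r) for l in table.get(r["val"], []))
    return out


def pair_ids(rows):
    """Order-independent view of a join result, for comparing the two strategies."""
    return sorted((l["id"], r["id"]) for l, r in rows)


t1 = [{"id": 1, "val": 10}, {"id": 2, "val": None}]
t2 = [{"id": 7, "val": None}, {"id": 8, "val": 10}, {"id": 9, "val": 20}]

# Same result as the nested loop join once NULL keys are allowed to match.
assert pair_ids(hash_join(t1, t2, null_equals_null=True)) == pair_ids(nested_loop_join(t1, t2))
```

With `null_equals_null=False` the same function behaves like an ordinary `=` join and drops the NULL/NULL pair, which is why the flag has to be enabled for this rewrite.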

What changes are included in this PR?

Modified the logical optimizer pass ExtractEquijoinPredicate to also handle the IS NOT DISTINCT FROM case.

IS NOT DISTINCT FROM (INDF) expressions are only extracted as equi-join conditions if the join predicate has no equality expressions and several INDF comparisons. See the implementation for more details.
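
As a rough illustration of the rule above, here is a hypothetical sketch in Python. It is not the actual ExtractEquijoinPredicate code: the `Cmp` type, the `extract_join_keys` function, and the representation of a predicate as a flat list of AND-ed comparisons over column names are all assumptions made for this example.

```python
# Hypothetical sketch; names and data shapes are illustrative, not DataFusion's API.
from typing import NamedTuple

INDF = "IS NOT DISTINCT FROM"


class Cmp(NamedTuple):
    op: str     # "=", INDF, or any other comparison operator
    left: str   # column from the left input
    right: str  # column from the right input


def extract_join_keys(conjuncts):
    """Split an AND-ed join predicate into (join keys, null_equals_null, residual filter)."""
    eq = [c for c in conjuncts if c.op == "="]
    indf = [c for c in conjuncts if c.op == INDF]
    other = [c for c in conjuncts if c.op not in ("=", INDF)]

    if eq:
        # Ordinary equalities become the keys. INDF terms stay in the filter,
        # on the assumption that one null_equals_null option covers every key.
        return [(c.left, c.right) for c in eq], False, indf + other
    if indf:
        # No equality expression: INDF terms become keys with null-safe matching.
        return [(c.left, c.right) for c in indf], True, other
    # Nothing to extract, so the join would remain a nested loop join.
    return [], False, other


keys, null_equals_null, residual = extract_join_keys([Cmp(INDF, "t1.val", "t2.val")])
assert keys == [("t1.val", "t2.val")] and null_equals_null and residual == []
```

In this sketch a predicate that mixes `=` and INDF keeps the INDF comparison as a residual filter, so it is still evaluated, just not as a hash key.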

Are these changes tested?

sqllogictests (mostly checking that the plans include HashJoin when the conversion is possible)

Are there any user-facing changes?

No

@github-actions github-actions bot added optimizer Optimizer rules sqllogictest SQL Logic Tests (.slt) labels Aug 25, 2025
@2010YOUY01 2010YOUY01 marked this pull request as draft August 25, 2025 15:37
@2010YOUY01 2010YOUY01 marked this pull request as ready for review August 25, 2025 15:39
Contributor

@jonathanc-n jonathanc-n left a comment

This LGTM, thanks @2010YOUY01
