Skip to content

Conversation

pow2clk
Copy link
Contributor

@pow2clk pow2clk commented May 6, 2024

When flattening the global for a groupshared matrix, the alignment information was getting lost. As a result, the alignments of the loads and stores were calculating their own alignment based on preferred alignment and trailing zeros of the index. The preferred alignment switched to 16 when the type size was over 128 bits due to a heuristic whose rationale is lost to time. When the global has its own alignment, that gets used, so by retaining it through lowering, the alignments are consistent and reliable.

Includes testing for a few matrix variants

fixes #6416

When flattening the global for a groupshared matrix, the alignment
information was getting lost. As a result, the alignments of the loads
and stores were calculating their own alignment based on preferred
alignment and trailing zeros of the index. The preferred alignment
switched to 16 when the type size was over 128 bits due to a heuristic
whose rationale is lost to time. When the global has its own alignment,
that gets used, so by retaining it through lowering, the alignments are
consistent and reliable.

Includes testing for a few matrix variants

fixes microsoft#6416
@pow2clk pow2clk requested a review from dmpots May 6, 2024 17:19
@pow2clk pow2clk requested a review from a team as a code owner May 6, 2024 17:19
Copy link
Contributor

@dmpots dmpots left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a possibility of a perf regression with this change as it may prevent generating larger loads in some backends. Should we set the alignment to what it was assumed to be previously?

Added verification that dimensions and variables are consistent
@pow2clk
Copy link
Contributor Author

pow2clk commented May 6, 2024

There is a possibility of a perf regression with this change as it may prevent generating larger loads in some backends. Should we set the alignment to what it was assumed to be previously?

It's certainly an option. What I have here preserves the alignment that is present from the start. I'm a bit more nervous about interjecting the alignment that the zero index happened to get when loads and stores were generated from the beginning. I fear that might have more unforeseen consequences than this does, but I can experiment with it.

Greg Roth added 3 commits May 6, 2024 13:42
To keep consistent with previous behavior, use the preferred alignment
calculation when assigning alignment to the matrix groupshared instead
of preserving the original value.
Add half and double variants for test cases

Add pass test for hlmatrixlower
@dmpots
Copy link
Contributor

dmpots commented May 21, 2024

There is a possibility of a perf regression with this change as it may prevent generating larger loads in some backends. Should we set the alignment to what it was assumed to be previously?

It's certainly an option. What I have here preserves the alignment that is present from the start. I'm a bit more nervous about interjecting the alignment that the zero index happened to get when loads and stores were generated from the beginning. I fear that might have more unforeseen consequences than this does, but I can experiment with it.

In the updated PR we are using the datalayout to compute the preferred alignment. This looks like the correct way to do it to me.

@pow2clk pow2clk merged commit a6f4025 into microsoft:main May 22, 2024
@pow2clk pow2clk deleted the gs_mat_ldst branch May 22, 2024 20:38
pow2clk pushed a commit to pow2clk/DirectXShaderCompiler that referenced this pull request May 22, 2024
…rosoft#6589)

When flattening the global for a groupshared matrix, the alignment
information was getting lost. As a result, the alignments of the loads
and stores were calculating their own alignment based on preferred
alignment and trailing zeros of the index. The preferred alignment
switched to 16 when the type size was over 128 bits due to a heuristic
whose rationale is lost to time. When the global has its own alignment,
that gets used, so by calculating it at lowering, the alignments are
consistent and reliable.

Includes testing for a few matrix variants and a pass test.

fixes microsoft#6416

(cherry picked from commit a6f4025)
// RUN: %dxc -DTYPE=float1x4 /Tcs_6_0 %s | FileCheck %s -check-prefixes=CHECK,CHECK4
// RUN: %dxc -DTYPE=float2x2 /Tcs_6_0 %s | FileCheck %s -check-prefixes=CHECK,CHECK4
// RUN: %dxc -DTYPE=double4x4 /Tcs_6_0 %s | FileCheck %s -check-prefixes=CHECK,LCHECK,CHECK8,LCHECKF
// RUN: %dxc -DTYPE=float16_t4x4 /Tcs_6_2 %s -enable-16bit-types | FileCheck %s -check-prefixes=CHECK,LCHECK,CHECK2,LCHECKF
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might be important to also test with a matrix array, non-zero index, and odd matrix sizes, like (3x3, 1x3, 3x1). Since the alignment of the matrix is considered to be the alignment of the element, different array elements for a matrix whose size isn't an even multiple of 16 bytes won't be aligned by that amount.

SjMxr233 pushed a commit to ShaderHelper/DirectXShaderCompiler that referenced this pull request Jul 24, 2025
…rosoft#6589)

When flattening the global for a groupshared matrix, the alignment
information was getting lost. As a result, the alignments of the loads
and stores were calculating their own alignment based on preferred
alignment and trailing zeros of the index. The preferred alignment
switched to 16 when the type size was over 128 bits due to a heuristic
whose rationale is lost to time. When the global has its own alignment,
that gets used, so by calculating it at lowering, the alignments are
consistent and reliable.

Includes testing for a few matrix variants and a pass test.

fixes microsoft#6416
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

dxc generates invalid alignment on groupshared matrix load/store instructions
4 participants