@SS-JIA SS-JIA commented Aug 29, 2025

Stack from ghstack (oldest at bottom):

Motivation

Provide an easy way to test and benchmark custom operators when developing them.

Changes

Introduces a custom op test suite under backends/vulkan/test/custom_ops. Each operator will have its own test file, as seen in the next diff. utils.[h|cpp] define common utilities that can be used across test files.

To facilitate prototyping, prototype shaders can be placed under the glsl/ folder and prototype C++ host code under the impl/ folder.

Output of the test binary looks like:

=== Compute Shader Performance Benchmark ===
Add Operation Prototyping Framework
----------------------------------------------------------------------
Executing 32 test cases for Add
----------------------------------------------------------------------
Add_1x64x64_Texture3D_Float                                                                 [1x64x64]               3.094 μs                1.324 GFLOP/s     PASSED
Add_1x64x64_Texture3D_Half                                                                  [1x64x64]               2.574 μs                1.591 GFLOP/s    SKIPPED
Add_1x64x64_Buffer_Float                                                                    [1x64x64]               3.084 μs                1.328 GFLOP/s     PASSED
Add_1x64x64_Buffer_Half                                                                     [1x64x64]               2.668 μs                1.535 GFLOP/s    SKIPPED
Add_1x128x128_Texture3D_Float                                                             [1x128x128]               6.001 μs                2.730 GFLOP/s     PASSED
Add_1x128x128_Texture3D_Half                                                              [1x128x128]               4.004 μs                4.092 GFLOP/s    SKIPPED
Add_1x128x128_Buffer_Float                                                                [1x128x128]               6.074 μs                2.698 GFLOP/s     PASSED
Add_1x128x128_Buffer_Half                                                                 [1x128x128]               5.112 μs                3.205 GFLOP/s    SKIPPED
Add_1x256x256_Texture3D_Float                                                             [1x256x256]              17.852 μs                3.671 GFLOP/s     PASSED
Add_1x256x256_Texture3D_Half                                                              [1x256x256]              10.057 μs                6.517 GFLOP/s    SKIPPED
Add_1x256x256_Buffer_Float                                                                [1x256x256]              19.027 μs                3.444 GFLOP/s     PASSED
Add_1x256x256_Buffer_Half                                                                 [1x256x256]              15.330 μs                4.275 GFLOP/s    SKIPPED
Add_1x512x512_Texture3D_Float                                                             [1x512x512]              48.292 μs                5.428 GFLOP/s     PASSED
Add_1x512x512_Texture3D_Half                                                              [1x512x512]              26.832 μs                9.770 GFLOP/s    SKIPPED
Add_1x512x512_Buffer_Float                                                                [1x512x512]              48.828 μs                5.369 GFLOP/s     PASSED
Add_1x512x512_Buffer_Half                                                                 [1x512x512]              48.308 μs                5.427 GFLOP/s    SKIPPED
Add_1x1x1024_Texture3D_Float                                                               [1x1x1024]               2.376 μs                0.431 GFLOP/s     PASSED
Add_1x1x1024_Texture3D_Half                                                                [1x1x1024]               2.215 μs                0.462 GFLOP/s    SKIPPED
Add_1x1x1024_Buffer_Float                                                                  [1x1x1024]               2.402 μs                0.426 GFLOP/s     PASSED
Add_1x1x1024_Buffer_Half                                                                   [1x1x1024]               2.304 μs                0.445 GFLOP/s    SKIPPED
Add_1x1024x1_Texture3D_Float                                                               [1x1024x1]               6.120 μs                0.167 GFLOP/s     PASSED
Add_1x1024x1_Texture3D_Half                                                                [1x1024x1]               6.245 μs                0.164 GFLOP/s    SKIPPED
Add_1x1024x1_Buffer_Float                                                                  [1x1024x1]               2.392 μs                0.428 GFLOP/s     PASSED
Add_1x1024x1_Buffer_Half                                                                   [1x1024x1]               2.304 μs                0.445 GFLOP/s    SKIPPED
Add_32x32x32_Texture3D_Float                                                               [32x32x32]              10.249 μs                3.197 GFLOP/s     PASSED
Add_32x32x32_Texture3D_Half                                                                [32x32x32]               6.583 μs                4.978 GFLOP/s    SKIPPED
Add_32x32x32_Buffer_Float                                                                  [32x32x32]              10.468 μs                3.130 GFLOP/s     PASSED
Add_32x32x32_Buffer_Half                                                                   [32x32x32]               8.481 μs                3.864 GFLOP/s    SKIPPED
Add_16x128x64_Texture3D_Float                                                             [16x128x64]              26.000 μs                5.041 GFLOP/s     PASSED
Add_16x128x64_Texture3D_Half                                                              [16x128x64]              17.841 μs                7.347 GFLOP/s    SKIPPED
Add_16x128x64_Buffer_Float                                                                [16x128x64]              28.917 μs                4.533 GFLOP/s     PASSED
Add_16x128x64_Buffer_Half                                                                 [16x128x64]              28.792 μs                4.552 GFLOP/s    SKIPPED
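
The throughput column can be cross-checked against the latency column. The sketch below is not code from this PR; it assumes an elementwise add is counted as one floating-point operation per output element, which is consistent with the rows above (for example, 4096 elements in 3.094 μs gives about 1.324 GFLOP/s). The helper names `numel` and `add_gflops` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Number of elements in a tensor with the given sizes, e.g. {1, 64, 64} -> 4096.
inline int64_t numel(const std::vector<int64_t>& sizes) {
  int64_t n = 1;
  for (int64_t s : sizes) {
    n *= s;
  }
  return n;
}

// Assumption (not confirmed by the PR): an elementwise add performs one
// floating-point operation per output element.
inline double add_gflops(const std::vector<int64_t>& sizes, double elapsed_us) {
  assert(elapsed_us > 0.0);
  const double flop = static_cast<double>(numel(sizes));
  // FLOP per microsecond equals MFLOP/s; divide by 1e3 to get GFLOP/s.
  return flop / elapsed_us / 1e3;
}
```

Under that assumption, `add_gflops({16, 128, 64}, 26.000)` lands close to the 5.041 GFLOP/s reported for Add_16x128x64_Texture3D_Float.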

SKIPPED means that correctness checking is not performed on that test case. This usually happens in one of the following cases:

  • Input/output dtype is fp16. The reference calculation functions do not support fp16.
  • Input sizes are too big. Since the reference calculation functions are implemented naively, computing reference data may take too long for large inputs. Larger test cases are usually meant to test performance, not correctness.
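
The two skip conditions above can be sketched as a small predicate next to a naive reference function. This is an illustration only, not the code in utils.[h|cpp]: `DType`, `should_check_correctness`, `reference_add`, and the `kMaxRefNumel` cutoff are all hypothetical, and the cutoff value is chosen arbitrarily (large enough that the 1x512x512 Float cases shown above would still be checked).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical dtype tag; the real test suite may model dtypes differently.
enum class DType { Float, Half };

// Hypothetical cutoff above which the naive reference is considered too slow.
constexpr std::size_t kMaxRefNumel = 1u << 20;

// Returns true when a reference result should be computed and compared.
// Returning false corresponds to a SKIPPED row in the benchmark output.
inline bool should_check_correctness(DType dtype, std::size_t numel) {
  if (dtype == DType::Half) {
    return false;  // no fp16 support in the reference path
  }
  if (numel > kMaxRefNumel) {
    return false;  // naive reference would take too long
  }
  return true;
}

// Naive fp32 reference for elementwise add: one loop, no vectorization.
inline std::vector<float> reference_add(
    const std::vector<float>& a,
    const std::vector<float>& b) {
  assert(a.size() == b.size());
  std::vector<float> out(a.size());
  for (std::size_t i = 0; i < a.size(); ++i) {
    out[i] = a[i] + b[i];
  }
  return out;
}
```

A skipped case is still timed; only the comparison against the reference output is left out.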

Differential Revision: D81323426

cc @manuelcandales @cbilgin

…ulkan operator testing to CI

[ghstack-poisoned]
@pytorch-bot pytorch-bot bot added the module: vulkan Issues related to the Vulkan delegate and code under backends/vulkan/ label Aug 29, 2025
pytorch-bot bot commented Aug 29, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/13815

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 7 New Failures

As of commit e8ee9ca with merge base e2098f8 (image):

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Aug 29, 2025
This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D81323426

…ite & add vulkan operator testing to CI"

[ghstack-poisoned]
@facebook-github-bot facebook-github-bot merged commit c81771f into gh/SS-JIA/315/base Aug 30, 2025
105 of 115 checks passed
@facebook-github-bot facebook-github-bot deleted the gh/SS-JIA/315/head branch August 30, 2025 13:02