
When I use ONNX with CUDAExecutionProvider, why is FP16 slower than FP32 on an NVIDIA 3090? #13628


Description

@slantingsun


Question

Python 3.8
CUDA 11.3
onnxruntime-gpu 1.14.1

python export.py --weights ./best.pt --imgsz 960 --device 0 --half --simplify
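To quantify the FP16 vs. FP32 gap, a minimal timing sketch along these lines can be used (not from the original report; the file names best_fp32.onnx / best_fp16.onnx, batch size 1, and the static 960x960 NCHW input are assumptions based on the export command above):

```python
import time

import numpy as np
import onnxruntime as ort


def benchmark(model_path, dtype, runs=100):
    # Restrict to the CUDA provider so CPU fallback does not skew the comparison.
    sess = ort.InferenceSession(model_path, providers=["CUDAExecutionProvider"])
    name = sess.get_inputs()[0].name
    # Assumed static NCHW input matching --imgsz 960; the --half export expects float16 input.
    x = np.random.rand(1, 3, 960, 960).astype(dtype)
    for _ in range(10):                      # warm-up, excluded from timing
        sess.run(None, {name: x})
    start = time.perf_counter()
    for _ in range(runs):
        sess.run(None, {name: x})
    return (time.perf_counter() - start) / runs * 1000.0  # ms per inference


if __name__ == "__main__":
    print(f"FP32: {benchmark('best_fp32.onnx', np.float32):.2f} ms")
    print(f"FP16: {benchmark('best_fp16.onnx', np.float16):.2f} ms")
```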

Another question: will ONNX with CUDAExecutionProvider be faster than running the .pt file on the GPU?
Thanks for your answer.
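For the second question, the usual way to tell is to time the PyTorch checkpoint on the GPU under the same batch size and input size and compare against the ONNX Runtime numbers. A rough sketch, assuming best.pt is a YOLOv5 checkpoint loadable via torch.hub and a 960x960 input:

```python
import time

import torch

# Assumption: best.pt is a YOLOv5 checkpoint; adjust the loader to however the
# model is actually constructed.
model = torch.hub.load("ultralytics/yolov5", "custom", path="best.pt").cuda().eval()
x = torch.rand(1, 3, 960, 960, device="cuda")

with torch.no_grad():
    for _ in range(10):                      # warm-up
        model(x)
    torch.cuda.synchronize()                 # CUDA kernels run asynchronously
    start = time.perf_counter()
    for _ in range(100):
        model(x)
    torch.cuda.synchronize()                 # wait for all timed kernels to finish
print(f"PyTorch GPU: {(time.perf_counter() - start) / 100 * 1000:.2f} ms per inference")
```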

