Replies: 51 comments
-
I ended up modifying
// buffer resource
#ifndef __HIP_DEVICE_COMPILE__ // for host code
#define CK_BUFFER_RESOURCE_3RD_DWORD -1
#elif defined(__gfx803__) || defined(__gfx900__) || defined(__gfx906__) || defined(__gfx908__) || \
defined(__gfx90a__) || defined(__gfx940__) || defined(__gfx941__) || \
defined(__gfx942__) // for GPU code
#define CK_BUFFER_RESOURCE_3RD_DWORD 0x00020000
#elif defined(__gfx1030__) || defined(__gfx1010__) // for GPU code
#define CK_BUFFER_RESOURCE_3RD_DWORD 0x31014000
#elif defined(__gfx1100__) || defined(__gfx1101__) || defined(__gfx1102__) // for GPU code
#define CK_BUFFER_RESOURCE_3RD_DWORD 0x31004000
#endif
Then, the build was successful.
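To make the effect of that change easier to see, here is a minimal host-side sketch of the same mapping. The function `buffer_resource_3rd_dword` is a hypothetical helper written for illustration and is not part of composable_kernel; the values are copied from the macro chain above. An architecture that no branch lists leaves the macro undefined, which the sketch models by returning `false`.

```cpp
#include <cassert>  // used by the usage checks described below
#include <cstdint>
#include <cstring>

// Hypothetical helper (not part of composable_kernel): mirrors the
// CK_BUFFER_RESOURCE_3RD_DWORD macro chain quoted above as a runtime lookup.
// Returns true and sets `value` if some branch of the chain lists `arch`;
// returns false if no branch matches, i.e. the macro would stay undefined.
inline bool buffer_resource_3rd_dword(const char* arch, std::uint32_t& value)
{
    static const char* const first_branch[] = {"gfx803", "gfx900", "gfx906", "gfx908",
                                               "gfx90a", "gfx940", "gfx941", "gfx942"};
    // gfx1010 is the entry added by the modification described above.
    static const char* const second_branch[] = {"gfx1030", "gfx1010"};
    static const char* const third_branch[]  = {"gfx1100", "gfx1101", "gfx1102"};

    for(const char* name : first_branch)
    {
        if(std::strcmp(arch, name) == 0) { value = 0x00020000u; return true; }
    }
    for(const char* name : second_branch)
    {
        if(std::strcmp(arch, name) == 0) { value = 0x31014000u; return true; }
    }
    for(const char* name : third_branch)
    {
        if(std::strcmp(arch, name) == 0) { value = 0x31004000u; return true; }
    }
    return false; // no branch matched
}
```

With this sketch, `"gfx1010"` resolves to the same value as `"gfx1030"`, while a target such as `"gfx1012"` that none of the branches lists is reported as unmatched.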
-
@TyraVex Thanks for reporting this issue. Please let us know whether it works. We do not have Navi10 GPUs to test it on.
-
Sorry for the late response. Here are the test results
-
In the There is missing support for gfx941, gfx942, gfx1012 and gfx1030. Folks will be unhappily surprised when MIOpen and similar users of CK fail.
-
Our experiments at Solus have found the following:
-
@GZGavinZhao It's interesting that the So I would just like to understand: are we hitting a hardware limit with gfx1010 (RX 5700 and the like, Navi10), so that it cannot do what gfx1030 (RX 6900 and the like, Navi21/23) does, or is it just someone's business strategy that gfx1010 is too old to support? If it is a hardware limitation, where did we hit a dead end? Why is the software not shipped for gfx1010 as well? I'm just trying to understand what is going on and who is deciding this.
I know there has been a tremendous amount of effort all over the place that I was not part of, and I appreciate and am thankful for everyone's contributions to where things are now, but we need to fix this issue. Old GPUs are not going to the trash (and if the plan is to send them to the trash, that is a terrible waste)!
I just read the main instruction set parts covering memory, shaders, and some other registers: https://www.amd.com/content/dam/amd/en/documents/radeon-tech-docs/instruction-set-architectures/rdna-shader-instruction-set-architecture.pdf Both AMD-GCN-GFX10-RDNA2 and AMD-GCN-GFX10-RDNA1 seem to have identical instruction sets. I understand there are some details I missed, but from a hardware point of view everything seems to exist already in gfx1010, so our problem is software. Probably just copying the gfx1030 code and using it for gfx1010, maybe without modifying anything (or with modifying a few things, which is a nightmare :)), would give us better support for Navi10. (That GDDR6 monster can do crazy stuff, so people are using older software by adapting to the gfx900 arch; well, it's still a toy in comparison to the MI300X.)
The problem is that this is not an easy job. It should have been considered earlier, but we can maybe still do something all together, as the projects involved may be beyond my capabilities, like: 1-LLVM
-
I just hit this error while trying to add the gfx1030 features to gfx1010,
which is expected, as I modified the AMDGPU target for that arch. So now my question is: which one is the cached asm caps, which one is derived, and from where? I got past the error by forking the Tensile repo and ignoring the AsmCap mismatch.
-
Could you raise an issue in Tensile for this?
-
Thanks @trixirt, yes. I will also add this issue to it: below is the actual compileArgs command, and the first element is None, which I guess is the cxx_compiler (it should not be None):
-
I agree this issue/discussion should continue in Tensile, but just to add my 2 cents:
RDNA1 and RDNA2 are not the same instruction set. For example, consult section 6.3 of the RDNA1 and RDNA2 ISA architecture. You will see that RDNA2 has
As per my reasoning above, in my understanding it is a hardware limit that Therefore, any attempt at trying to compile and run code intended for RDNA2 on RDNA1 will eventually fail. The best we can do (and what has already been done at Solus) is to patch libraries like rocBLAS, Tensile, and CK so that they can compile and run RDNA1 code on a best-effort basis. I believe Solus's support for RDNA1 hardware is complete. I assume you own RDNA1 hardware, so feel free to grab a Solus ISO, install the
-
Thanks for having a look, @GZGavinZhao; I appreciate all your efforts.
Regarding the above, I have a feeling that we can build these instructions for gfx1010. You referred to the VOP3P instruction, which can also be built for that target. It could be that everything I'm doing is nonsense, but I like trying stuff, so let me see where that takes me :) I will keep you posted either way.
I think Solus is definitely helpful, but currently running gfx1010 as gfx900 is a downgrade, and doing that would be my last resort. Nevertheless, I really like what you guys are doing at Solus; it looks like a good approach to better GPU support. I work as a self-employed repairman, so I have ended up with plenty of graphics cards from all generations. So I need to be fair to myself :)
-
Note that only for CK, I do agree that for CK specifically, we may be able to treat
-
Yes, I noticed that, and that is what brought me here :) I think there is room for improvement in general :) The repos are huge and there is a lot of stuff, something like 20 repos :D
-
Right now I'm just reading some weird stuff like this:
-
That file is LLVM TableGen. The problem is again that By saying "then workaround specific cases where VDOT instructions are used", I meant that instead of relying on hardware-accelerated instructions, one writes portable HIP code that is the equivalent of the VDOT operation.
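As a rough illustration of that idea, the sketch below computes a packed 4 x int8 dot product with 32-bit accumulation using only plain integer arithmetic. The function names `dot4_i8_portable` and `pack4_i8` are hypothetical and do not come from CK or HIP. The sketch assumes the lowest lane sits in the low byte and does not model any saturation or clamping behaviour a hardware instruction may have. In HIP code, such a function would typically also carry `__host__ __device__` qualifiers.

```cpp
#include <cassert>  // used by the usage check described below
#include <cstdint>

// Hypothetical helper: packs four signed 8-bit values into one 32-bit word,
// with x0 in the lowest byte.
inline std::uint32_t pack4_i8(std::int8_t x0, std::int8_t x1, std::int8_t x2, std::int8_t x3)
{
    return static_cast<std::uint32_t>(static_cast<std::uint8_t>(x0)) |
           (static_cast<std::uint32_t>(static_cast<std::uint8_t>(x1)) << 8) |
           (static_cast<std::uint32_t>(static_cast<std::uint8_t>(x2)) << 16) |
           (static_cast<std::uint32_t>(static_cast<std::uint8_t>(x3)) << 24);
}

// Hypothetical portable fallback: multiplies the four signed 8-bit lanes of
// `a` and `b` pairwise and adds the products to `acc`. No saturation is applied.
inline std::int32_t dot4_i8_portable(std::uint32_t a, std::uint32_t b, std::int32_t acc)
{
    for(int lane = 0; lane < 4; ++lane)
    {
        // Extract one byte per operand and reinterpret it as a signed value.
        const std::int8_t ai = static_cast<std::int8_t>((a >> (8 * lane)) & 0xFFu);
        const std::int8_t bi = static_cast<std::int8_t>((b >> (8 * lane)) & 0xFFu);
        acc += static_cast<std::int32_t>(ai) * static_cast<std::int32_t>(bi);
    }
    return acc;
}
```

For example, with lanes (1, -2, 3, -4) and (5, 6, -7, 8) and an initial accumulator of 10, the products sum to -60, so the function returns -50. A fallback like this trades throughput for portability, which matches the best-effort goal discussed here.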
-
@SzczurekYT Since RDNA1 GPUs are not officially supported, I have no plans to create an upstream pull request. However, if you wish, I can provide a diff file that you can apply automatically with git. It patches both composable kernels and MIOpen to introduce basic hardware compatibility for this family of cards. Compile for the architecture of your GPU. In your case, a 5600 XT is a gfx1011. Be prepared to wait several hours for it to complete.
-
I don't think the architecture not being officially supported is a problem. The way I understand it, the officially supported GPUs are guaranteed to work, but that doesn't ban the community from submitting patches for other architectures.
That would be appreciated.
Interesting, as rocminfo says gfx1010.
I can wait.
-
I get that. The beauty of open source is that the community can provide patches to enable ROCm on unsupported hardware like RDNA1, as you see here. Also, I have heard of the Tensile fix to restore upstream rocBLAS functionality on it.
I was wrong; thanks for catching that. Both the 5600 and 5700 XT indeed use
-
Hi all, sorry for the lack of official response here. The community is doing great work to preserve compatibility with GPUs that have lost official support. As noted in this discussion, ROCm does not support Navi 10 GPUs. However, we do have another macro in Regarding submitting PRs to address issues on unsupported architectures, I will also discuss this with the internal team. In general you are more than welcome to submit PRs for any improvements to the ROCm stack, but since we won't be testing for unsupported architectures I can't say how likely it is that fixes for these architectures get merged. I encourage you to document your fixes somewhere (whether here, in a PR, fork, etc.) so future users with similar issues can refer to them.
-
ROCm 6.2.x: https://drive.google.com/file/d/1RFbfYtG0B0JbtTai9iWdVomammAFL8Db/view The attached diff file combines the fix from this issue with the community patches from rocm_sdk_builder. The latter is a repository that has many fixes and features to introduce ROCm functionality on several architectures, including APUs. However, the data-center ones (CDNA) have limited testing and are based on older versions of ROCm. This file is designed with the latest point release in mind, which is 6.2 as of writing. Note that this only restores compilation success for RDNA1 discrete GPUs (RX 5500/5600/5700), so having
-
I've spoken with the internal team, and our stance is that PRs are encouraged, and can be accepted even for fixes on unsupported architectures as long as they don't break existing test cases. However, we might not have the resources to review these PRs in a timely manner, so I understand if you would rather provide code fixes via other means, and at the end of the day the fixes/guidance are available to future users either way.
-
The official ROCm team accepts pull requests for unsupported architectures from the community, even though merging may be delayed due to time constraints. This is very good news for improving the user experience. I have tested my patch and can assure you both composable kernels and MIOpen will compile if you use ck's
-
Currently working on a PR to fix this. The code has been written; only the tests are pending.
-
CTest results for ROCm 6.3 on gfx1012. It seemed to regress significantly. I saw lots of NaNs and zeros after rerunning the failed tests.
-
Good grief, AWS finally approved my quota limit increase, so I now have access to an RDNA1 GPU in the cloud. I'm currently debugging these failures.
-
Everything is fixed except for one single test failure:
0.003% of wrong values... 😅
-
@GZGavinZhao Do you have your patches available to download? I have a gfx1012 and would like to get this working on my system for use in the interim.
-
@mattmcadoo The following three (apply in the given order) should suffice:
-
@GZGavinZhao Thank you! Those patches allowed me to finally compile composable_kernel 6.3.0 on my Gentoo system.
-
Converted to a discussion. It will remain open for anyone who needs further assistance from the community regarding this.
-
Hello, I am having trouble compiling composable_kernel for my AMD GPU architecture (gfx1010).
Any ideas for a solution?