Skip to content

Conversation

@phambinhfin
Copy link

@phambinhfin phambinhfin commented Aug 1, 2025

Addresses handle pool singleton sharing issue between different GPU
operation types in ROCm/HIP backend.

Changes made:

  • Add opaque pointer typedefs for HIP backend:
    • typedef struct hipblasHandle_* gpublasHandle_t
    • typedef struct hipsolverHandle_* gpusolverDnHandle_t
  • Implement inline wrapper functions for type-safe handle operations:
    • gpublasCreate() and gpublasSetStream() for BLAS handles
    • gpusolverDnCreate() and gpusolverDnSetStream() for SOLVER handles

@phambinhfin phambinhfin requested a review from a team as a code owner August 1, 2025 11:33
@Arech8
Copy link

Arech8 commented Aug 1, 2025

Are you sure the PR does what you describe in the title and description?

@phambinhfin phambinhfin force-pushed the phambinh/handle-pool-singleton branch from edd7ffb to cf8b52f Compare August 1, 2025 11:50
@phambinhfin phambinhfin changed the title Fix GPU handle pool singleton aliasing with tag-based template separa… Fix GPU handle pool singleton aliasing Aug 1, 2025
@phambinhfin phambinhfin force-pushed the phambinh/handle-pool-singleton branch from cf8b52f to 4d49336 Compare August 1, 2025 11:59
@phambinhfin
Copy link
Author

@Arech8 I modified the content of commit and title, thanks

Copy link

@Arech8 Arech8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be fine now.

return hipblasCreate(reinterpret_cast<hipblasHandle_t*>(handle));
}

inline hipblasStatus_t gpublasSetStream(gpublasHandle_t handle, hipStream_t stream) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need this one? Does it work if you leave it as define?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice idea, thanks, testing

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for being a nitpick here. Does #define gpublasSetStream hipblasSetStream work? I was expecting that to work.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, i am set like

#define gpublasCreate(handle) hipblasCreate(reinterpret_cast<hipblasHandle_t*>(handle))
#define gpublasSetStream(handle, stream) hipblasSetStream(reinterpret_cast<hipblasHandle_t>(handle), stream)

It works,

@phambinhfin phambinhfin force-pushed the phambinh/handle-pool-singleton branch from 4d49336 to 2bc0153 Compare August 1, 2025 16:22
…tion

Problem:
- BLAS and SOLVER handle pools were sharing the same static singleton instance
- C++ template instantiation created only one pool when HandleType/StreamType were identical
- This caused resource contamination between hipBLAS and hipSOLVER operations
- Led to potential memory corruption and unexpected GPU library behavior
@phambinhfin phambinhfin force-pushed the phambinh/handle-pool-singleton branch from 2bc0153 to f9ef066 Compare August 1, 2025 16:59
@phambinhfin phambinhfin merged commit 9890a51 into rocm-jaxlib-v0.6.0 Aug 1, 2025
zahiqbal pushed a commit that referenced this pull request Oct 8, 2025
Addresses handle pool singleton sharing issue between different GPU
operation types in ROCm/HIP backend.
zahiqbal pushed a commit that referenced this pull request Oct 8, 2025
Addresses handle pool singleton sharing issue between different GPU
operation types in ROCm/HIP backend.

(cherry picked from commit 4b5b368)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants