Skip to content

Conversation

@Forsworns
Copy link

Description

Type of change

  • New feature (non-breaking change which adds functionality)

How Has This Been Tested?

Tested via the provided demo.

@Forsworns
Copy link
Author

I just found there were some typos in the README and the binary was included.

@yunwei37 yunwei37 requested a review from Copilot October 27, 2025 12:25
@yunwei37
Copy link
Member

Thanks a lot!

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds early exit functionality for CUDA kernels through eBPF helpers, enabling kernel atomization capabilities similar to network packet filtering. It introduces two new BPF helper functions (bpf_cuda_exit and bpf_get_grid_dim) and provides a complete demonstration through a vector addition example with partition-based execution control.

Key changes:

  • Two new BPF helper functions (507: exit, 508: get_grid_dim) for CUDA kernel control
  • Complete atomizer example with partition-based block filtering
  • PTX-level early exit implementation via inline assembly

Reviewed Changes

Copilot reviewed 11 out of 12 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
attach/nv_attach_impl/trampoline/default_trampoline.cu Implements the two new BPF helper functions in CUDA
attach/nv_attach_impl/trampoline_ptx.h Adds PTX assembly for the new helper functions
attach/nv_attach_impl/nv_attach_impl_patcher.cpp Registers the new helper functions (507 and 508)
example/gpu/atomizer/atomizer.bpf.c eBPF program implementing partition-based kernel atomization
example/gpu/atomizer/atomizer.c Userspace loader for the eBPF atomizer program
example/gpu/atomizer/vec_add.cu CUDA vector addition demo application
example/gpu/atomizer/main.ptx Generated PTX assembly from the demo
example/gpu/atomizer/filter_hashtag.py Utility script to filter preprocessor directives
example/gpu/atomizer/README.md Documentation for the atomizer example
example/gpu/atomizer/Makefile Build configuration for the atomizer example
example/gpu/atomizer/.gitignore Git ignore rules for build artifacts

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@Forsworns Forsworns force-pushed the atomizer branch 2 times, most recently from 0aa403c to e9d2be3 Compare October 27, 2025 13:45
@Forsworns
Copy link
Author

I have addressed typos. But I found another problem:

when I set the launching configuration in vector_add.cu to vectorAdd<<<10, 1>>>(d_A, d_B, d_C);,
and try to read the pre-configured partition number/index from the BPF maps in the atomizer.bpf.c, it sometimes returns null pointer for the given key. I'm not sure where the problem is.

Currently, I only launch a single block in the vector_add.c and it works well. But then only part of threads are exited, instead of the whole thread blocks. Thus, the semantic is different from the LithOS. :(

@yunwei37
Copy link
Member

I think we need to provide new attach types instead of function probes to support that semantic. That's not hard as we have seperated the attach types into passes?

@yunwei37
Copy link
Member

Maybe we can merge that and continue future work on next PR? @Officeyutong @Sy0307 @Forsworns

@Forsworns
Copy link
Author

Forsworns commented Oct 31, 2025

Maybe we can merge that and continue future work on next PR? @Officeyutong @Sy0307 @Forsworns

I'm fine about the attach type. But I'm still bothered by the above BPF map issue, do you have any ideas?

- Add two bpf helper functions for CUDA.
- Add an early-exit demo in CUDA examples.

Close eunomia-bpf#459

Signed-off-by: Forsworns <[email protected]>
@Sy0307
Copy link
Contributor

Sy0307 commented Oct 31, 2025

Maybe I can review it later for BPF map issue? Can you give a more detailed description?

@Forsworns
Copy link
Author

Maybe I can review it later for BPF map issue? Can you give a more detailed description?

@Sy0307 I just opened a new issue in #486 and provide two examples. I guess it is related to the synchronize between host and device.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEATURE] CUDA kernel early exit demo

4 participants