-
Notifications
You must be signed in to change notification settings - Fork 2k
Use core ID when selecting cores #25340
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use core ID when selecting cores #25340
Conversation
If the available cores are not a continuous set, the core selector might
panic when trying to select cores.
For example, consider a scenario where the available cores for the selector are the following:
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47]
This list contains 46 cores, because cores with IDs 0 and 24 are not
included in the list
Before this patch, if we requested 46 cores, the selector would panic
trying to access the item with index 46 in `cs.topology.Cores`.
This patch changes the selector to use the core ID instead when looking
for a core inside `cs.topology.Cores`. This prevents an out of bounds
access that was causing the panic.
Note: The patch is straightforward with the change. Perhaps a better
long-term solution would be to restructure the `numalib.Topology.Cores`
field to be a `map[ID]Core`, but that is a much larger change that is
more difficult to land. Also, the amount of cores in our case is
small—at most 192—so a search won't have any noticeable impact.
|
Thank you for your submission! We require that all contributors sign our Contributor License Agreement ("CLA") before we can accept the contribution. Read and sign the agreement Learn more about why HashiCorp requires a CLA and what the CLA includes Have you signed the CLA already but the status is still pending? Recheck it. |
| sortedTopologyCores := make([]numalib.Core, len(cs.topology.Cores)) | ||
| copy(sortedTopologyCores, cs.topology.Cores) | ||
| slices.SortFunc(sortedTopologyCores, func(a, b numalib.Core) int { return cmp.Compare(a.ID, b.ID) }) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me know what you think about leaving it like this, or if it's better to rework how the cores are stored in the struct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks fine; we want to keep that data structure the same due to requirements in the enterprise version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @carlosgaldino! This is looking really good, and the change makes sense. Just one small request below to try to reduce some of the times we need to loop over a list, but otherwise it should be good to go.
scheduler/numa_ce.go
Outdated
| mhz += cs.topology.Cores[i].MHz() | ||
| } | ||
| } | ||
| ids := helper.ConvertSlice(cores, func(id hw.CoreID) uint16 { return uint16(id) }) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of going back over the cores slice to pull out the ids, could they be collected in the search above? It saves an additional loop over that slice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure! I updated the code.
| sortedTopologyCores := make([]numalib.Core, len(cs.topology.Cores)) | ||
| copy(sortedTopologyCores, cs.topology.Cores) | ||
| slices.SortFunc(sortedTopologyCores, func(a, b numalib.Core) int { return cmp.Compare(a.ID, b.ID) }) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks fine; we want to keep that data structure the same due to requirements in the enterprise version.
@allisonlarson Thanks for the review. I pushed a new commit with your suggestions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! This looks great. This will get merged after once 1.10.0 GA goes out.
|
I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions. |
Description
We were using version
1.6.10with a configuration for the reservable cores as the following:reservable_cores = "1-31,33-63"If the available cores are not a continuous set, the core selector might panic when trying to select cores. This happened when we tried to upgrade to version
1.7.7.For example, consider a scenario where the available cores for the selector are the following:
This list contains 46 cores, because cores with IDs 0 and 24 are not included in the list
Before this patch, if we requested 46 cores, the selector would panic trying to access the item with index 46 in
cs.topology.Cores.This patch changes the selector to use the core ID instead when looking for a core inside
cs.topology.Cores. This prevents an out of bounds access that was causing the panic.Note: The patch is straightforward with the change. Perhaps a better long-term solution would be to restructure the
numalib.Topology.Coresfield to be amap[ID]Core. Let me know what you prefer.Another thing to consider, since the reporting of cores has changed in
1.7.7, do you want this PR to be opened only against that tag? I'd then change the commit base.Testing & Reproduction steps
Added tests.
Links
N/A