Skip to content

Conversation

@carlosgaldino
Copy link
Contributor

Description

We were using version 1.6.10 with a configuration for the reservable cores as the following:

reservable_cores = "1-31,33-63"

If the available cores are not a continuous set, the core selector might panic when trying to select cores. This happened when we tried to upgrade to version 1.7.7.

For example, consider a scenario where the available cores for the selector are the following:

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47]

This list contains 46 cores, because cores with IDs 0 and 24 are not included in the list

Before this patch, if we requested 46 cores, the selector would panic trying to access the item with index 46 in cs.topology.Cores.

This patch changes the selector to use the core ID instead when looking for a core inside cs.topology.Cores. This prevents an out of bounds access that was causing the panic.

Note: The patch is straightforward with the change. Perhaps a better long-term solution would be to restructure the numalib.Topology.Cores field to be a map[ID]Core. Let me know what you prefer.

Another thing to consider, since the reporting of cores has changed in 1.7.7, do you want this PR to be opened only against that tag? I'd then change the commit base.

Testing & Reproduction steps

Added tests.

Links

N/A

If the available cores are not a continuous set, the core selector might
panic when trying to select cores.

For example, consider a scenario where the available cores for the selector are the following:

    [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47]

This list contains 46 cores, because cores with IDs 0 and 24 are not
included in the list

Before this patch, if we requested 46 cores, the selector would panic
trying to access the item with index 46 in `cs.topology.Cores`.

This patch changes the selector to use the core ID instead when looking
for a core inside `cs.topology.Cores`. This prevents an out of bounds
access that was causing the panic.

Note: The patch is straightforward with the change. Perhaps a better
long-term solution would be to restructure the `numalib.Topology.Cores`
field to be a `map[ID]Core`, but that is a much larger change that is
more difficult to land. Also, the amount of cores in our case is
small—at most 192—so a search won't have any noticeable impact.
@carlosgaldino carlosgaldino requested review from a team as code owners March 11, 2025 11:36
@hashicorp-cla-app
Copy link

hashicorp-cla-app bot commented Mar 11, 2025

CLA assistant check
All committers have signed the CLA.

@hashicorp-cla-app
Copy link

CLA assistant check

Thank you for your submission! We require that all contributors sign our Contributor License Agreement ("CLA") before we can accept the contribution. Read and sign the agreement

Learn more about why HashiCorp requires a CLA and what the CLA includes

Have you signed the CLA already but the status is still pending? Recheck it.

Comment on lines +33 to +35
sortedTopologyCores := make([]numalib.Core, len(cs.topology.Cores))
copy(sortedTopologyCores, cs.topology.Cores)
slices.SortFunc(sortedTopologyCores, func(a, b numalib.Core) int { return cmp.Compare(a.ID, b.ID) })
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me know what you think about leaving it like this, or if it's better to rework how the cores are stored in the struct.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks fine; we want to keep that data structure the same due to requirements in the enterprise version.

Copy link
Member

@allisonlarson allisonlarson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @carlosgaldino! This is looking really good, and the change makes sense. Just one small request below to try to reduce some of the times we need to loop over a list, but otherwise it should be good to go.

mhz += cs.topology.Cores[i].MHz()
}
}
ids := helper.ConvertSlice(cores, func(id hw.CoreID) uint16 { return uint16(id) })
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of going back over the cores slice to pull out the ids, could they be collected in the search above? It saves an additional loop over that slice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure! I updated the code.

Comment on lines +33 to +35
sortedTopologyCores := make([]numalib.Core, len(cs.topology.Cores))
copy(sortedTopologyCores, cs.topology.Cores)
slices.SortFunc(sortedTopologyCores, func(a, b numalib.Core) int { return cmp.Compare(a.ID, b.ID) })
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks fine; we want to keep that data structure the same due to requirements in the enterprise version.

@carlosgaldino
Copy link
Contributor Author

carlosgaldino commented Apr 1, 2025

Thanks @carlosgaldino! This is looking really good, and the change makes sense. Just one small request below to try to reduce some of the times we need to loop over a list, but otherwise it should be good to go.

@allisonlarson Thanks for the review. I pushed a new commit with your suggestions.

@allisonlarson allisonlarson added backport/ent/1.8.x+ent Changes are backported to 1.8.x+ent backport/ent/1.9.x+ent Changes are backported to 1.9.x+ent backport/1.10.x backport to 1.10.x release line labels Apr 4, 2025
Copy link
Member

@allisonlarson allisonlarson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! This looks great. This will get merged after once 1.10.0 GA goes out.

@allisonlarson allisonlarson merged commit 048c5bc into hashicorp:main Apr 10, 2025
31 of 32 checks passed
@github-actions
Copy link

github-actions bot commented Aug 9, 2025

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 9, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

backport/ent/1.8.x+ent Changes are backported to 1.8.x+ent backport/ent/1.9.x+ent Changes are backported to 1.9.x+ent backport/1.10.x backport to 1.10.x release line

Projects

Development

Successfully merging this pull request may close these issues.

2 participants