UCS: adding multi-dimensional hash tables #7578
base: master
Conversation
Can one of the admins verify this patch?
ok to test
@alex--m any update on the CLA?
@shamisp Afraid not, and I don't expect it to be resolved soon. I'm posting the things I plan to upstream once it is - I just don't seem to have permission to apply the "CLA missing" label...
(force-pushed 30b3604 to 89094bf)
How is it different from using regular khash with a custom key type that contains multiple values?
This is the result of some research, and the paper is still in progress, but the gist is the difference in iteration. A multi-dimensional hash table would only access the keys matching the query vector in every dimension (Figure 1), whereas this implementation has a special way of iterating over neighboring vectors (Figure 2) in order to locate the nearest neighbor. I plan to use this as an advanced form of caching (in a separate commit).
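As a rough illustration of that difference (not the code in this PR; BIN_COUNT, key2d_t and the rest are hypothetical names), the sketch below bins each dimension separately instead of hashing the whole key vector, which is what makes "neighboring" bins meaningful at all:

```c
/* Toy sketch only: contrasts one hash over the whole 2D key (regular khash
 * usage) with per-dimension binning, where closeness in each dimension maps
 * to the same or an adjacent bin. */
#include <stdio.h>
#include <stdint.h>

#define BIN_COUNT 4  /* bins per dimension in this toy example */

typedef struct {
    uint32_t x;
    uint32_t y;
} key2d_t;

/* Regular approach: a single hash over the whole vector gives one bin;
 * nearby keys generally land in unrelated bins. */
static unsigned flat_bin(key2d_t k)
{
    uint64_t h = (uint64_t)k.x * 2654435761u ^ (uint64_t)k.y * 40503u;
    return (unsigned)(h % (BIN_COUNT * BIN_COUNT));
}

/* Multi-dimensional approach: bin each dimension independently, so keys that
 * are close in every dimension end up in the same or an adjacent bin. */
static void multidim_bin(key2d_t k, unsigned *bx, unsigned *by)
{
    *bx = k.x % BIN_COUNT;
    *by = k.y % BIN_COUNT;
}

int main(void)
{
    key2d_t query    = { .x = 5, .y = 9 };
    key2d_t neighbor = { .x = 6, .y = 9 };  /* close in both dimensions */
    unsigned qx, qy, nx, ny;

    multidim_bin(query, &qx, &qy);
    multidim_bin(neighbor, &nx, &ny);

    printf("flat bins:      query=%u, neighbor=%u\n",
           flat_bin(query), flat_bin(neighbor));
    printf("multi-dim bins: query=(%u,%u), neighbor=(%u,%u)\n",
           qx, qy, nx, ny);
    return 0;
}
```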
IIUC, the expected lookup performance of the special multi-dim implementation should be better than default khash with a vector key?
No, I'm afraid the lookup is not faster, just different (and in fact typically slower). For example, if both dimensions fall into bin #1 in the figures above, the default khash with vector keys will check V3 and V4, whereas this special 2D khash will check V3 through V8. The only advantage (and purpose) of this special multi-dimensional khash-based data structure is a fast nearest-neighbor lookup.
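To make the V3-V8 example concrete, here is a minimal sketch of that kind of neighbor-bin iteration. It uses a toy grid of bins rather than the khash-based structure in this patch, so treat it as an illustration of the idea, not the PR's code:

```c
/* Toy nearest-neighbor lookup over a grid of bins: start at the query's bin
 * and widen the search radius until a candidate is found. */
#include <stdio.h>
#include <limits.h>

#define BINS 4  /* bins per dimension (toy size) */

typedef struct { int x, y; } point_t;

static point_t grid[BINS][BINS]; /* at most one stored point per bin */
static int     used[BINS][BINS];

static int dist2(point_t a, point_t b)
{
    int dx = a.x - b.x, dy = a.y - b.y;
    return dx * dx + dy * dy;
}

/* Scan the square of bins within radius r of the query's bin, growing r until
 * something is found (a real search would also check the next ring in case a
 * closer entry sits just outside the first non-empty one). */
static int nearest(point_t q, point_t *out)
{
    int qx = q.x % BINS, qy = q.y % BINS;
    int best = INT_MAX, found = 0;

    for (int r = 0; r < BINS && !found; r++) {
        for (int by = qy - r; by <= qy + r; by++) {
            for (int bx = qx - r; bx <= qx + r; bx++) {
                if (bx < 0 || by < 0 || bx >= BINS || by >= BINS)
                    continue;
                if (!used[by][bx])
                    continue;
                int d = dist2(q, grid[by][bx]);
                if (d < best) {
                    best  = d;
                    *out  = grid[by][bx];
                    found = 1;
                }
            }
        }
    }
    return found;
}

int main(void)
{
    point_t a = {2, 1}, b = {3, 3}, q = {2, 2}, nn;

    grid[a.y % BINS][a.x % BINS] = a;  used[a.y % BINS][a.x % BINS] = 1;
    grid[b.y % BINS][b.x % BINS] = b;  used[b.y % BINS][b.x % BINS] = 1;

    if (nearest(q, &nn))
        printf("nearest to (%d,%d) is (%d,%d)\n", q.x, q.y, nn.x, nn.y);
    return 0;
}
```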
@alex--m Can you give a bit more detail on how it will be used? That is, how will we benefit from a nearest-neighbor lookup speedup?
Sure. The basic idea is this: caches tend to be all-or-nothing matches, so if you're looking up, say, a past request, you only get identical past requests. In that case, a nearest-neighbor lookup lets you find a "similar" past request and modify it, rather than creating a brand-new request (which is presumably more expensive, otherwise this makes little sense). The speedup comes primarily from re-using similar past objects rather than creating new ones, and secondarily from the reduced memory consumption that results from this recycling. This raises the question: what objects are so hard to create that they justify all this? One answer is QPs, but there are others.
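A hedged sketch of that caching pattern follows. Every name here (request_t, find_nearest_request, clone_and_adjust, create_request) is hypothetical, and a linear scan stands in for the multi-dimensional nearest-neighbor lookup:

```c
/* Toy "similarity cache": instead of requiring an exact hit, find the most
 * similar past request and patch it, falling back to full creation on a miss. */
#include <stdio.h>
#include <stdlib.h>

typedef struct {
    int size;   /* toy "parameters" of a request */
    int flags;
} request_t;

static request_t cache[16];
static int       cache_len;

/* Linear scan stands in for the nearest-neighbor lookup of this PR. */
static request_t *find_nearest_request(int size, int flags)
{
    request_t *best = NULL;
    int best_d = 0;

    for (int i = 0; i < cache_len; i++) {
        int d = abs(cache[i].size - size) + abs(cache[i].flags - flags);
        if (best == NULL || d < best_d) {
            best   = &cache[i];
            best_d = d;
        }
    }
    return best;
}

static request_t *create_request(int size, int flags)
{
    /* Pretend this is the expensive path (e.g. creating a QP). */
    request_t *r = malloc(sizeof(*r));
    r->size  = size;
    r->flags = flags;
    printf("created a new request\n");
    return r;
}

static request_t *clone_and_adjust(const request_t *base, int size, int flags)
{
    request_t *r = malloc(sizeof(*r));
    *r = *base;        /* reuse everything that still applies... */
    r->size  = size;   /* ...and patch only what changed */
    r->flags = flags;
    printf("recycled a similar request\n");
    return r;
}

int main(void)
{
    cache[cache_len++] = (request_t){ .size = 4096, .flags = 1 };

    int size = 4100, flags = 1;
    request_t *nearest = find_nearest_request(size, flags);
    request_t *req = nearest ? clone_and_adjust(nearest, size, flags)
                             : create_request(size, flags);
    free(req);
    return 0;
}
```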
@alex--m Sounds like some cache approximation. Can you please be a bit more specific about the follow-up patch and how it will be useful for UCX.
(force-pushed 541177f to efc9355)
(force-pushed d4186f5 to 6f1bc0b)
(force-pushed 7007bb5 to bcc164b)
Signed-off-by: Alex Margolin <[email protected]>
(force-pushed 62be79b to 4678f39)
…est neighbor search (based on khash)

