

@alex--m
Contributor

@alex--m alex--m commented Oct 23, 2021

No description provided.

@swx-jenkins4

Can one of the admins verify this patch?

@shamisp
Contributor

shamisp commented Oct 23, 2021

ok to test

@shamisp
Contributor

shamisp commented Oct 23, 2021

@alex--m any update on CLA ?

@alex--m
Contributor Author

alex--m commented Oct 23, 2021

@shamisp afraid not, and I don't expect it'll be resolved soon. I'm posting changes I plan to upstream once it is; I just don't seem to have permission to add the "CLA missing" label...

@alex--m alex--m changed the title UCS: adding multi-dimentional hash tables UCS: adding multi-dimensional hash tables Oct 23, 2021
@alex--m alex--m force-pushed the topic/mdht branch 4 times, most recently from 30b3604 to 89094bf on November 1, 2021 00:49
@yosefe
Contributor

yosefe commented Nov 14, 2021

how is it different from using regular khash with a custom key type that contains multiple values?

@alex--m
Contributor Author

alex--m commented Nov 14, 2021

> how is it different from using regular khash with a custom key type that contains multiple values?

This is the result of some research, and the paper is still in progress, but the gist is the difference in iteration. A regular hash table with a multi-dimensional key only accesses the keys that match the query vector in every dimension (Figure 1), whereas this implementation has a special way to iterate over neighboring vectors (Figure 2) in order to locate the nearest neighbor. I plan to use this as an advanced form of caching (in a separate commit).

[Figure 1: khash]
[Figure 2: ndim_khash]

@yosefe
Contributor

yosefe commented Nov 14, 2021

IIUC, the expected lookup performance of the special multi-dim implementation should be better than default khash with a vector key?

@alex--m
Contributor Author

alex--m commented Nov 14, 2021

> IIUC, the expected lookup performance of the special multi-dim implementation should be better than default khash with a vector key?

No, I'm afraid the lookup is not faster, just different (and in fact typically slower). For example, if both dimensions fall into bin #1 in the figures above, the default khash with vector keys will check V3 and V4, whereas this special 2D khash will check V3 through V8. The only advantage (and purpose) of this multi-dimensional, khash-based data structure is fast nearest-neighbor lookup.
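A rough count makes the "typically slower" point concrete. Assuming a uniform grid (each of the d dimensions split into equal-width bins, which may not match how ndim_khash is organized), an exact-match lookup with a composite key probes a single bucket, while a search that covers every bin within r bins of the query in each dimension probes a block of (2r+1)^d bins. This counts bins rather than stored vectors, so it is not directly comparable to the vector counts in the example above.

```latex
N_{\text{exact}} = 1, \qquad
N_{\text{nn}}(d, r) = (2r + 1)^{d}, \qquad
N_{\text{nn}}(2, 1) = 3^{2} = 9
```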

@shamisp
Contributor

shamisp commented Nov 14, 2021

@alex--m Can you give a bit more detail on how it will be used? I.e., how will we benefit from the nearest-neighbor lookup speedup?

@alex--m
Contributor Author

alex--m commented Nov 14, 2021

> @alex--m Can you give a bit more detail on how it will be used? I.e., how will we benefit from the nearest-neighbor lookup speedup?

Sure. The basic idea is this: caches tend to be all-or-nothing, so if you're looking up, say, a past request, you only get back identical past requests. A nearest-neighbor lookup instead allows you to find a "similar" past request and modify it, rather than creating a brand-new request (which is presumably more expensive; otherwise this makes little sense). The speedup comes primarily from (a) reusing similar past objects rather than creating new ones, and secondarily from (b) the reduced memory consumption that results from this recycling. This raises the question: which objects are so expensive to create that they justify all this? One answer is QPs (queue pairs), but there are others.

@shamisp
Contributor

shamisp commented Nov 14, 2021

@alex--m Sounds like some kind of cache approximation. Can you please be a bit more specific about what the follow-up patch is and how it will be useful for UCX?

@alex--m alex--m force-pushed the topic/mdht branch 5 times, most recently from 541177f to efc9355 on February 19, 2022 15:17
@alex--m alex--m force-pushed the topic/mdht branch 2 times, most recently from d4186f5 to 6f1bc0b on May 19, 2022 15:35
@alex--m alex--m force-pushed the topic/mdht branch 9 times, most recently from 7007bb5 to bcc164b on June 14, 2022 07:18
@alex--m alex--m force-pushed the topic/mdht branch 4 times, most recently from 62be79b to 4678f39 on May 5, 2024 05:59

4 participants