
Conversation

@mare5x

@mare5x mare5x commented May 1, 2024

Added an example that extends the simple example with different token healing strategies. See the added README for usage examples.
Token healing works by removing some tokens from the end of the tokenized prompt and then constraining decoding so that the generated tokens match the bytes of the removed tokens.
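As a rough illustration of the idea described above (not the code from this PR), here is a minimal Python sketch. The toy vocabulary, the helper names `heal_prompt`, `allowed_tokens` and `advance`, and the rule that a candidate token may either cover the whole remaining prefix or consume part of it are all assumptions made for this example.

```python
# Hypothetical toy vocabulary: token id -> token bytes.
VOCAB = {
    0: b"http", 1: b":", 2: b"://", 3: b"//",
    4: b"The", 5: b" link", 6: b" is",
}

def heal_prompt(tokens, n_rollback=1):
    """Drop the last n_rollback tokens and return (kept tokens, removed bytes)."""
    n = min(n_rollback, len(tokens))
    kept = tokens[:len(tokens) - n]
    removed = tokens[len(tokens) - n:]
    prefix = b"".join(VOCAB[t] for t in removed)
    return kept, prefix

def allowed_tokens(prefix):
    """Token ids consistent with the bytes that still have to be regenerated.

    A token is allowed if it starts with the whole remaining prefix
    (it covers the prefix and may extend it) or if the prefix starts
    with the token (it consumes part of the prefix).
    """
    if not prefix:
        return sorted(VOCAB)  # no constraint left
    return sorted(
        tid for tid, text in VOCAB.items()
        if text and (text.startswith(prefix) or prefix.startswith(text))
    )

def advance(prefix, token_id):
    """Remaining prefix after the sampler picks token_id."""
    text = VOCAB[token_id]
    if prefix.startswith(text):
        return prefix[len(text):]
    return b""  # the token covered the whole prefix

# Prompt "The link is http:" tokenized with the toy vocabulary.
prompt = [4, 5, 6, 0, 1]
kept, prefix = heal_prompt(prompt, n_rollback=1)
candidates = allowed_tokens(prefix)
```

With this toy vocabulary, rolling back the final `:` token lets the sampler choose between `:` and `://`, which is the kind of choice a tokenization boundary at the end of the prompt would otherwise rule out. A real implementation would apply the constraint by masking logits during sampling rather than by returning a list.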

Currently, I am just using for loops for the prefix search, but performance could potentially be improved with a prefix tree (plus the caching mentioned in https://arxiv.org/abs/2403.08688).
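To make the prefix tree suggestion concrete, here is a small Python sketch of a byte-level trie over a vocabulary. It is only an illustration under assumed names (`TrieNode`, `TokenTrie`, `tokens_with_prefix`) and a toy vocabulary; it is not taken from this PR and does not include the caching from the linked paper.

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # byte value (int) -> TrieNode
        self.token_ids = []  # ids of tokens whose bytes end exactly at this node

class TokenTrie:
    """Byte-level prefix tree over a vocabulary (token id -> bytes)."""

    def __init__(self, vocab):
        self.root = TrieNode()
        for token_id, text in vocab.items():
            node = self.root
            for byte in text:  # iterating over bytes yields ints
                node = node.children.setdefault(byte, TrieNode())
            node.token_ids.append(token_id)

    def tokens_with_prefix(self, prefix):
        """Return the sorted ids of all tokens whose bytes start with prefix."""
        node = self.root
        for byte in prefix:
            node = node.children.get(byte)
            if node is None:
                return []  # no token starts with this prefix
        # Every token stored in the subtree below this node shares the prefix.
        found, stack = [], [node]
        while stack:
            current = stack.pop()
            found.extend(current.token_ids)
            stack.extend(current.children.values())
        return sorted(found)

# Hypothetical toy vocabulary, for illustration only.
toy_vocab = {0: b"http", 1: b":", 2: b"://", 3: b"//", 4: b"https"}
trie = TokenTrie(toy_vocab)
colon_tokens = trie.tokens_with_prefix(b":")
```

The trie is built once per vocabulary. A lookup then walks one node per prefix byte and visits only the matching subtree, whereas a loop-based search compares the prefix against every token in the vocabulary on each call.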

To finish #5765, we still need to integrate token healing into the server. I think the approach is to extend llama_sampling_sample and modify the initial input tokens?

@teleprint-me
Contributor

This is awesome! <3

@mofosyne mofosyne added the "examples" and "Review Complexity : Medium" (generally requires more time to grok but manageable by beginner to medium expertise level) labels on May 9, 2024
@mare5x
Author

mare5x commented May 9, 2024

Adding to main in #7187. Will try to add to server later.

@mare5x mare5x closed this May 9, 2024
3 participants