This repository was archived by the owner on Jun 3, 2025. It is now read-only.

Conversation

@dsikka (Contributor) commented Mar 14, 2024

Tested code:

import deepsparse

# Sparse Llama 2 checkpoint from the Hugging Face Hub; the commented-out
# line is an alternative SparseZoo stub.
MODEL_ID = "hf:nm-testing/llama2-7B-sparse70-retrained-ultrachat200k-pruned70-smoothquant-ds"
# MODEL_ID = "zoo:mistral-7b-ultrachat200k_mistral_pretrain-pruned40_quantized"

pipe = deepsparse.Pipeline.create(
    task="text-generation",
    model_path=MODEL_ID,
    sequence_length=512,
    prompt_sequence_length=16,
)

# Format a single-turn conversation with the model's chat template.
message = "Once upon a time"
conversation = [{"role": "user", "content": message}]
formatted_conversation = pipe.tokenizer.apply_chat_template(
    conversation, tokenize=False, add_generation_prompt=True
)

generation_config = {
    "max_new_tokens": 100,
}

# streaming=True returns a generator that yields one output per generated token.
inference = pipe(
    sequences=formatted_conversation,
    generation_config=generation_config,
    streaming=True,
)

for token in inference:
    print(token.generations[0].text, end="")

Output:


There was a time when the world was a different place. A time when people were more accepting of each other and didn't judge based on race, religion, or gender. A time when kindness and compassion were the norm, and hate and prejudice were unheard of.

But then something changed. The world became more divided, and people started to see each other through a
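
For comparison, the same pipeline can be run without streaming, in which case it returns a single result object rather than a generator. A minimal sketch, assuming the same pipe and formatted_conversation as above:

# Non-streaming call: one result object instead of a token generator.
inference = pipe(
    sequences=formatted_conversation,
    generation_config={"max_new_tokens": 100},
)
print(inference.generations[0].text)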

@dsikka requested a review from @mgoin, March 14, 2024 21:39
@mgoin (Member) left a comment


Thanks for answering my questions, nice implementation

@mgoin merged commit 9bac61e into main, Mar 14, 2024
@mgoin deleted the fix_llama_tokenizer branch, March 14, 2024 21:51
dhuangnm pushed a commit that referenced this pull request Mar 14, 2024
* add llama tokenizer fix

* fix generated string

* only run for streaming

* add TODO

---------

Co-authored-by: Dipika Sikka <[email protected]>
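
For context on the commit summaries above: Llama-family tokenizers are known to drop the leading space when tokens are decoded one at a time, which garbles streamed output. A common workaround for this class of bug is to decode each new token together with its predecessor and strip the prefix. This is a sketch of that general technique, not necessarily this PR's exact implementation; tokenizer here stands for any Hugging Face-style tokenizer:

# Sketch of the usual streaming-detokenization workaround (illustrative;
# not necessarily the exact code merged in this PR).
def decode_incremental(tokenizer, prev_token_id: int, new_token_id: int) -> str:
    # Decoding the new token together with its predecessor preserves the
    # leading space that Llama tokenizers drop when a token is decoded alone.
    with_context = tokenizer.decode([prev_token_id, new_token_id])
    prefix = tokenizer.decode([prev_token_id])
    return with_context[len(prefix):]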
@dbogunowicz (Contributor) left a comment


A test would be nice to have, but I guess the priority is to land this asap.
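
For reference, a regression test along the lines @dbogunowicz suggests might look roughly like this. It is a hypothetical sketch, not part of the PR; it assumes the model stub and generation-config keys used in the snippet above, and that greedy decoding makes both runs deterministic:

import deepsparse

def test_streaming_matches_non_streaming():
    # Hypothetical regression test (illustrative; not from this PR).
    # A leading-space detokenization bug would make the concatenated
    # streamed chunks differ from the non-streamed generation.
    pipe = deepsparse.Pipeline.create(
        task="text-generation",
        model_path="hf:nm-testing/llama2-7B-sparse70-retrained-ultrachat200k-pruned70-smoothquant-ds",
        sequence_length=512,
    )
    config = {"max_new_tokens": 20, "do_sample": False}
    prompt = "Once upon a time"
    full = pipe(sequences=prompt, generation_config=config)
    streamed = "".join(
        out.generations[0].text
        for out in pipe(sequences=prompt, generation_config=config, streaming=True)
    )
    assert streamed == full.generations[0].text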

dhuangnm added a commit that referenced this pull request Mar 18, 2024
* [TextGeneration] Fix llama tokenizer (#1635)

* add llama tokenizer fix

* fix generated string

* only run for streaming

* add TODO

---------

Co-authored-by: Dipika Sikka <[email protected]>

* Retire `flaky` in favour of `pytest-rerunfailures` (#1628)

* pick up another fix and bump up version to 1.7.1

---------

Co-authored-by: Dipika Sikka <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Co-authored-by: dbogunowicz <[email protected]>
Co-authored-by: dhuang <[email protected]>