This repository was archived by the owner on Jun 3, 2025. It is now read-only.

Conversation

@dsikka (Contributor) commented Mar 14, 2024

Tested code:

import deepsparse

# Sparse Llama 2 checkpoint from the Hugging Face Hub; the commented-out
# line is an alternative SparseZoo stub.
MODEL_ID = "hf:nm-testing/llama2-7B-sparse70-retrained-ultrachat200k-pruned70-smoothquant-ds"
# MODEL_ID = "zoo:mistral-7b-ultrachat200k_mistral_pretrain-pruned40_quantized"

pipe = deepsparse.Pipeline.create(
    task="text-generation",
    model_path=MODEL_ID,
    sequence_length=512,
    prompt_sequence_length=16,
)

# Format a single-turn conversation with the model's chat template.
message = "Once upon a time"
conversation = [{"role": "user", "content": message}]
formatted_conversation = pipe.tokenizer.apply_chat_template(
    conversation, tokenize=False, add_generation_prompt=True
)

generation_config = {
    "max_new_tokens": 100,
}

# streaming=True returns a generator that yields one output per generated token.
inference = pipe(
    sequences=formatted_conversation,
    generation_config=generation_config,
    streaming=True,
)

for token in inference:
    print(token.generations[0].text, end="")

Output:


There was a time when the world was a different place. A time when people were more accepting of each other and didn't judge based on race, religion, or gender. A time when kindness and compassion were the norm, and hate and prejudice were unheard of.

But then something changed. The world became more divided, and people started to see each other through a
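
For comparison, the same pipeline can be run without streaming, in which case it returns a single result object rather than a generator. A minimal sketch, assuming the same pipe and formatted_conversation as above:

# Non-streaming call: one result object instead of a token generator.
inference = pipe(
    sequences=formatted_conversation,
    generation_config={"max_new_tokens": 100},
)
print(inference.generations[0].text)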

@dsikka requested a review from @mgoin, March 14, 2024 21:39
@mgoin (Member) left a comment


Thanks for answering my questions, nice implementation

@mgoin merged commit 9bac61e into main, Mar 14, 2024
@mgoin deleted the fix_llama_tokenizer branch, March 14, 2024 21:51
dhuangnm pushed a commit that referenced this pull request Mar 14, 2024
* add llama tokenizer fix

* fix generated string

* only run for streaming

* add TODO

---------

Co-authored-by: Dipika Sikka <[email protected]>
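
For context on the commit summaries above: Llama-family tokenizers are known to drop the leading space when tokens are decoded one at a time, which garbles streamed output. A common workaround for this class of bug is to decode each new token together with its predecessor and strip the prefix. This is a sketch of that general technique, not necessarily this PR's exact implementation; tokenizer here stands for any Hugging Face-style tokenizer:

# Sketch of the usual streaming-detokenization workaround (illustrative;
# not necessarily the exact code merged in this PR).
def decode_incremental(tokenizer, prev_token_id: int, new_token_id: int) -> str:
    # Decoding the new token together with its predecessor preserves the
    # leading space that Llama tokenizers drop when a token is decoded alone.
    with_context = tokenizer.decode([prev_token_id, new_token_id])
    prefix = tokenizer.decode([prev_token_id])
    return with_context[len(prefix):]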
@dbogunowicz (Contributor) left a comment


A test would be nice to have, but I guess the priority is to land this asap.
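
For reference, a regression test along the lines @dbogunowicz suggests might look roughly like this. It is a hypothetical sketch, not part of the PR; it assumes the model stub and generation-config keys used in the snippet above, and that greedy decoding makes both runs deterministic:

import deepsparse

def test_streaming_matches_non_streaming():
    # Hypothetical regression test (illustrative; not from this PR).
    # A leading-space detokenization bug would make the concatenated
    # streamed chunks differ from the non-streamed generation.
    pipe = deepsparse.Pipeline.create(
        task="text-generation",
        model_path="hf:nm-testing/llama2-7B-sparse70-retrained-ultrachat200k-pruned70-smoothquant-ds",
        sequence_length=512,
    )
    config = {"max_new_tokens": 20, "do_sample": False}
    prompt = "Once upon a time"
    full = pipe(sequences=prompt, generation_config=config)
    streamed = "".join(
        out.generations[0].text
        for out in pipe(sequences=prompt, generation_config=config, streaming=True)
    )
    assert streamed == full.generations[0].text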

dhuangnm added a commit that referenced this pull request Mar 18, 2024
* [TextGeneration] Fix llama tokenizer (#1635)

* add llama tokenizer fix

* fix generated string

* only run for streaming

* add TODO

---------

Co-authored-by: Dipika Sikka <[email protected]>

* Retire `flaky` in favour of `pytest-rerunfailures` (#1628)

* pick up another fix and bump up version to 1.7.1

---------

Co-authored-by: Dipika Sikka <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Co-authored-by: dbogunowicz <[email protected]>
Co-authored-by: dhuang <[email protected]>