@Muhtasham Muhtasham commented Aug 25, 2024

Closes #35 #41 #58

@Muhtasham Muhtasham closed this Aug 25, 2024
@Muhtasham Muhtasham reopened this Aug 25, 2024
@Muhtasham
Author

@guoday


kael53 commented Jan 29, 2025

I need this to fine-tune


BehsadRiemer commented Jan 30, 2025

I remember you telling me about Deepseek in May, props to you @Muhtasham 🏃‍♂️

@adamreed90

@Muhtasham Shouldn't the prompt built in build_instruction_prompt match this example from README.md?

<|begin▁of▁sentence|>User: {user_message_1}

Assistant: {assistant_message_1}<|end▁of▁sentence|>User: {user_message_2}

Assistant:

Apologies if I misunderstand.
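
For reference, the chat-based template quoted above could be rendered with a small helper like the one below. This is only a sketch: the function name `render_chat_prompt` and the list-of-pairs input are assumptions made for illustration and are not taken from the repository or the PR.

```python
# Special tokens copied from the template quoted in the comment above.
BOS = "<|begin▁of▁sentence|>"
EOS = "<|end▁of▁sentence|>"


def render_chat_prompt(history, next_user_message):
    """Render a multi-turn chat prompt (hypothetical helper).

    history: list of (user_message, assistant_message) pairs already completed.
    next_user_message: the user turn the model should answer next.
    """
    parts = [BOS]
    for user_msg, assistant_msg in history:
        # Each finished turn ends with the end-of-sentence token.
        parts.append(f"User: {user_msg}\n\nAssistant: {assistant_msg}{EOS}")
    # The final turn is left open so the model continues after "Assistant:".
    parts.append(f"User: {next_user_message}\n\nAssistant:")
    return "".join(parts)
```

Calling `render_chat_prompt([("Hi", "Hello!")], "Write a sort function")` yields one string that starts with the begin-of-sentence token and ends with an open `Assistant:` turn.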

@Muhtasham
Author

@adamreed90 Yes, this PR is specifically for instruction fine-tuning, so the prompt format in build_instruction_prompt is intentionally different from the chat-based format.

As I pointed out in the new README.md, for training data preparation, please follow the Sample Dataset Format.

If you’re bringing a dataset in a different format (such as chat-based), it would require modification.
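
A rough sketch of that kind of modification is shown below. It rests on assumptions that this thread does not confirm: the chat data is a list of `{"role", "content"}` messages, and the target is one JSON object per line with `instruction` and `output` fields. Both field names and the helper names are hypothetical, so check them against the Sample Dataset Format in the README before use.

```python
import json


def chat_to_instruction_records(messages):
    """Turn a chat transcript into instruction/output records (hypothetical helper).

    messages: list of dicts with "role" ("user" or "assistant") and "content".
    Each user message is paired with the assistant reply that follows it;
    earlier turns are NOT carried over as context in this simple version.
    """
    records = []
    pending_user = None
    for msg in messages:
        if msg["role"] == "user":
            pending_user = msg["content"]
        elif msg["role"] == "assistant" and pending_user is not None:
            # Assumed target field names; adjust to the README's sample format.
            records.append({"instruction": pending_user, "output": msg["content"]})
            pending_user = None
    return records


def to_jsonl(records):
    """Serialize records as JSON Lines, one object per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)
```

Dropping earlier turns keeps each record self-contained. If multi-turn context matters for your data, one option is to fold the preceding turns into the instruction text instead.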

Development

Successfully merging this pull request may close these issues.

How to build a fine-tuning dataset for code completion?
