@chetandhembre

What does this PR do?

Hey folks,
I am not expecting this PR to be merged, as I might not be following the process for getting a PR merged.
I just wanted to share some code changes I made in order to run the tiny_llama training script. I also created a new script to resume from a checkpoint (the script is not included in this PR). While running that script I hit some further issues, and I added fixes for those as well.

Feel free to close the PR if it does not follow the guidelines; just please give feedback on whether my solution is at least valid.
Thank you!
Cheers.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guidelines?
  • Did you write any new necessary tests?
  • Did you log the throughput and loss you get to ensure the PR works as expected in actual training?
  • Did you log the memory usage? You can use this tool to understand the memory usage breakdown in nanotron.
  • If you modified anything related to checkpoints, did you verify that saving and reloading checkpoints still works correctly?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@chetandhembre
Author

chetandhembre commented Jul 2, 2025

To run the script on a 4000 Ada GPU, I also needed to do the following to get the environment set up properly:

uv pip install torch --index-url https://download.pytorch.org/whl/cu128
uv pip install wheel
uv pip install ninja triton "flash-attn>=2.5.0" --no-build-isolation
uv pip install --no-build-isolation git+https://github.com/fanshiqing/grouped_gemm@main
uv pip install psutil
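
As a quick sanity check after the install steps above, something like the following could confirm that each package is visible to the interpreter. This is only a minimal sketch: the helper `missing_packages` and the `REQUIRED_MODULES` table are hypothetical names, and the import names (for example `flash_attn` for `flash-attn`) are assumptions that may differ between versions. It only checks that the modules can be found; it does not verify CUDA or that the compiled extensions actually load.

```python
import importlib.util

# Assumed mapping of pip package name -> import name for the packages
# installed above. These import names are assumptions; adjust as needed.
REQUIRED_MODULES = {
    "torch": "torch",
    "wheel": "wheel",
    "ninja": "ninja",
    "triton": "triton",
    "flash-attn": "flash_attn",
    "grouped_gemm": "grouped_gemm",
    "psutil": "psutil",
}


def missing_packages(modules=REQUIRED_MODULES):
    """Return the pip names whose module cannot be found in this environment."""
    return [
        pip_name
        for pip_name, module_name in modules.items()
        if importlib.util.find_spec(module_name) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All packages from the setup steps can be found.")
```

Running it inside the same environment used for the `uv pip install` commands lists anything that still needs installing.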

@SulRash

SulRash commented Jul 3, 2025

I already did a similar fix in PR #377 👀

@chetandhembre
Author

@SulRash sorry, your PR had a lot of things in it, so I did not check whether it solves my problem. I am OK with closing this PR in favor of yours.
