killight98/llama2-mp

llama2 mp example

HSDP example

PP example is based on torch.distributed.pipelining.

Requirements for CUDA

  • PyTorch 2.4+ for the HSDP example
  • PyTorch 2.6+ for the PP example
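
The version requirements above can be checked before launching anything. The following is a minimal sketch, not part of this repository: the names `MIN_VERSIONS`, `parse_major_minor`, and `meets_requirement` are hypothetical, and only the major and minor components of the version string are compared.

```python
import re

# Minimum PyTorch versions from the list above, keyed by example name.
MIN_VERSIONS = {"HSDP": (2, 4), "PP": (2, 6)}


def parse_major_minor(version):
    """Extract (major, minor) from strings such as '2.6.0+cu124' or '2.8.0a0+git1234'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def meets_requirement(version, example):
    """Return True if `version` is new enough for the named example."""
    return parse_major_minor(version) >= MIN_VERSIONS[example]


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed")
    else:
        for example in MIN_VERSIONS:
            ok = meets_requirement(torch.__version__, example)
            status = "ok" if ok else "needs a newer PyTorch"
            print(f"{example}: {status} ({torch.__version__})")
```

Comparing tuples of integers rather than raw strings avoids ordering mistakes such as treating 2.10 as older than 2.4.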

Requirements for XPU

  • Install oneAPI 2025.1

  • Build PyTorch from source

# clone torch source code
git clone https://github.com/pytorch/pytorch.git
cd pytorch
git checkout v2.8.0-rc4
git submodule sync
git submodule update --init --recursive
# install build dependencies
pip install -r requirements.txt
# build torch
USE_XPU=1 USE_CUDA=0 python setup.py develop
  • Bug fixes
  1. Build cannot find xpupti
# disable XPUPTI support in cmake/Dependencies.cmake
@@ -1627,7 +1627,7 @@ if(USE_KINETO)
   if((NOT USE_XPU) OR (NOT XPU_ENABLE_KINETO))
     set(LIBKINETO_NOXPUPTI ON CACHE STRING "" FORCE)
   else()
-    set(LIBKINETO_NOXPUPTI OFF CACHE STRING "")
+    set(LIBKINETO_NOXPUPTI ON CACHE STRING "")
     message(STATUS "Using Kineto with XPUPTI support")
   endif()
  2. Collective API issue
# upgrade oneCCL and update LD_LIBRARY_PATH before running
export LD_LIBRARY_PATH=<oneapi...>:<built-ccl-package-path>
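
Once the environment is set up, it can help to confirm which accelerator the installed PyTorch actually sees. This is a rough sketch and not part of this repository: `describe_accelerator` is a hypothetical helper, and it assumes a PyTorch build recent enough to expose the `torch.xpu` module (the function falls back gracefully if that module or PyTorch itself is missing).

```python
def describe_accelerator():
    """Return a short description of the accelerator backend PyTorch can use."""
    try:
        import torch
    except ImportError:
        return "torch not installed"

    # torch.xpu is only present in builds with XPU support compiled in or stubbed.
    if hasattr(torch, "xpu") and torch.xpu.is_available():
        return f"xpu ({torch.xpu.device_count()} device(s))"
    if torch.cuda.is_available():
        return f"cuda ({torch.cuda.device_count()} device(s))"
    return "cpu only"


if __name__ == "__main__":
    print(describe_accelerator())
```

If this reports "cpu only" on an XPU machine, the oneAPI environment or LD_LIBRARY_PATH is a likely place to look first.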

Run

HSDP example

  • Single machine with multiple cards
# 4 GPU cards
torchrun --nnodes=1 --nproc_per_node=4 fsdp_tp_example.py
  • Multi-node
# master node
# 4 GPU cards per node
torchrun --nnodes=2 --nproc_per_node=4 --node_rank=0 --master_addr=<master_ip> --master_port=29500 fsdp_tp_example.py
# other node
torchrun --nnodes=2 --nproc_per_node=4 --node_rank=1 --master_addr=<master_ip> --master_port=29500 fsdp_tp_example.py

# if the connection fails, set the network interface explicitly
export NCCL_SOCKET_IFNAME=<eth-port-name>
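
The multi-node commands differ only in `--node_rank`, so they can be generated rather than typed by hand. The sketch below is an illustration only: `torchrun_commands` is a hypothetical helper that is not part of this repository, and the address `10.0.0.1` stands in for `<master_ip>`.

```python
import shlex


def torchrun_commands(script, nnodes, nproc_per_node, master_addr, master_port=29500):
    """Build one torchrun command line per node; only --node_rank differs."""
    commands = []
    for node_rank in range(nnodes):
        args = [
            "torchrun",
            f"--nnodes={nnodes}",
            f"--nproc_per_node={nproc_per_node}",
            f"--node_rank={node_rank}",
            f"--master_addr={master_addr}",
            f"--master_port={master_port}",
            script,
        ]
        # shlex.join quotes any argument that would need it in a POSIX shell.
        commands.append(shlex.join(args))
    return commands


if __name__ == "__main__":
    # 10.0.0.1 is a placeholder address for the master node.
    for command in torchrun_commands("fsdp_tp_example.py", 2, 4, "10.0.0.1"):
        print(command)
```

Every node must use the same `--nnodes`, `--master_addr`, and `--master_port` values, which is why they are shared arguments here.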

PP example

# <schedule> is one of: GPipe, 1F1B, ZBVZeroBubble
torchrun --nnodes=1 --nproc_per_node=4 pp_example.py -s <schedule> [--skip-profile] [--capture]
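
The schedule names accepted by `-s` resemble the schedule classes exported by `torch.distributed.pipelining`. How `pp_example.py` resolves them is not shown here, so the mapping below is an assumption: `SCHEDULE_CLASSES`, `schedule_class_name`, and `load_schedule_class` are hypothetical names used only to illustrate one plausible lookup.

```python
import importlib

# Assumed mapping from the CLI schedule names to class names in
# torch.distributed.pipelining; the actual script may do this differently.
SCHEDULE_CLASSES = {
    "GPipe": "ScheduleGPipe",
    "1F1B": "Schedule1F1B",
    "ZBVZeroBubble": "ScheduleZBVZeroBubble",
}


def schedule_class_name(name):
    """Translate a CLI schedule name into the corresponding class name."""
    try:
        return SCHEDULE_CLASSES[name]
    except KeyError:
        expected = ", ".join(sorted(SCHEDULE_CLASSES))
        raise ValueError(f"unknown schedule {name!r}; expected one of: {expected}") from None


def load_schedule_class(name):
    """Import torch lazily and return the schedule class (requires PyTorch 2.6+)."""
    module = importlib.import_module("torch.distributed.pipelining")
    return getattr(module, schedule_class_name(name))


if __name__ == "__main__":
    print(schedule_class_name("1F1B"))
```

Importing PyTorch lazily keeps the name validation usable on machines where PyTorch is not installed.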

TP example

torchrun --nnodes=1 --nproc_per_node=4 tp_example.py
