Open
Description
I’m excited about the recent introduction of Domino and its impressive TP optimization.
While using deepspeed-domino to better overlap communication and computation in TP, I noticed that Domino uses forward_backward_no_pipelining() in schedules.py. Does that mean I can't use Domino (TP optimization) together with PP?