FSDP oneshot #1939

Satrat · 2024-01-04T23:24:33Z

This PR updates the one-shot modifiers SparseGPT, Wanda, SmoothQuant and Quantization to be compatible with FSDP. This enables alternating one-shot/finetuning flows with FSDP.

**NOTE:** #1912 should be merged first; it covers the initial alternating flow implementation.

Summary of Changes

- Removed all references to specific devices from the one-shot modifiers; device placement is now handled by SparseCausalLM. The plan was to default to "auto", which splits the model across multiple GPUs (this isn't FSDP-related; the model can be split outside of FSDP too), but "auto" isn't compatible with quantization :( so the default stays "cuda:0". You can still pass "auto" through the CLI for a non-quantized one-shot run.
- Refactored the SparseGPT class into a module wrapper so that weights can be updated with module.apply, as required for FSDP compatibility.
- Refactored Wanda in the same way and cleaned up the code shared between SparseGPT and Wanda (@rahul-tuli, I'd like your input on this specifically).
- Bug fixes related to quantizing FSDP models.
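
The module-wrapper idea can be sketched roughly as follows. This is an illustration only, not the PR's code: `PruningWrapper` and `compress_fn` are hypothetical names, and plain magnitude pruning stands in for the actual SparseGPT/Wanda weight update. The point is that each wrapper owns its layer and compresses it in place, so the whole model can be processed with a single `model.apply(...)` call. According to the PyTorch docs, `FullyShardedDataParallel.apply` additionally gathers the full (unsharded) parameters before calling the function, which is why routing weight updates through `apply` matters under FSDP.

```python
import torch
import torch.nn as nn


class PruningWrapper(nn.Module):
    """Hypothetical wrapper: owns a layer and can compress its weight in place."""

    def __init__(self, layer: nn.Linear, sparsity: float = 0.5):
        super().__init__()
        self.layer = layer
        self.sparsity = sparsity

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Calibration forward passes go straight through to the wrapped layer.
        return self.layer(x)

    @torch.no_grad()
    def compress(self) -> None:
        # Magnitude pruning as a stand-in for the SparseGPT/Wanda update:
        # zero out the `sparsity` fraction of weights with the smallest |w|.
        weight = self.layer.weight
        k = int(weight.numel() * self.sparsity)
        if k == 0:
            return
        threshold = weight.abs().flatten().kthvalue(k).values
        weight[weight.abs() <= threshold] = 0.0


def compress_fn(module: nn.Module) -> None:
    # Module.apply calls this on every submodule; only wrappers do any work.
    if isinstance(module, PruningWrapper):
        module.compress()


torch.manual_seed(0)
model = nn.Sequential(
    PruningWrapper(nn.Linear(8, 8)),
    nn.ReLU(),
    PruningWrapper(nn.Linear(8, 4)),
)
# With an FSDP-wrapped model, the same call would go through FSDP's apply().
model.apply(compress_fn)
print([(m.layer.weight == 0).float().mean().item()
       for m in model if isinstance(m, PruningWrapper)])
```

Keeping the compression logic on the wrapper rather than in a loop that reaches into `layer.weight` directly means the caller never has to know whether the parameters are currently sharded.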

src/sparseml/transformers/sparsification/obcq/example.yaml

Satrat · 2024-01-09T22:56:17Z

> Remove any references of specific devices from the one-shot modifiers, device is now handled by SparseCausalLM, and defaults to "auto" for splitting the model across multiple GPUs (this isn't FSDP related, we can split the model even outside of FSDP)

It doesn't look like device defaults to "auto", if that was an intended change. The current obcq.py argument is:
    parser.add_argument("--device", type=str, default="cuda:0")

See the updated PR comment :( `device_map="auto"` doesn't seem to be compatible with quantization, so I'm not making it the default. It can still be specified on the CLI for a non-quantized one-shot run.
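
One way the CLI could fail fast on that combination is sketched below. Only the `--device` argument and its "cuda:0" default come from this thread; the `--quantize` flag and the `resolve_device` helper are hypothetical stand-ins (for example, for "the recipe contains a quantization modifier") and are not part of obcq.py as described here.

```python
import argparse


def resolve_device(device: str, quantize: bool) -> str:
    """Hypothetical guard: reject "auto" device placement when quantizing."""
    if device == "auto" and quantize:
        raise ValueError(
            'device "auto" is not supported with quantization; use e.g. "cuda:0"'
        )
    return device


parser = argparse.ArgumentParser()
parser.add_argument("--device", type=str, default="cuda:0")
# Hypothetical flag standing in for "quantization is enabled in the recipe".
parser.add_argument("--quantize", action="store_true")

args = parser.parse_args(["--device", "auto"])
print(resolve_device(args.device, args.quantize))  # prints "auto"
```

Raising early keeps the failure at argument-parsing time instead of partway through model loading.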

Sara Adkins added 30 commits November 16, 2023 16:12

initial recipe re-loading

d5abe8e

Merge branch 'main' into sparse_auto_recipe

ec0e180

loading for input recipe

2d7b5b7

Merge branch 'main' into sparse_auto_recipe

2cc9e16

persist structure across recipe loads

356bd81

clean up fn names

1b67b6f

Merge branch 'main' into sparse_auto_recipe

f06ed8a

clean up duplicated code

ab5a464

delete extra file

11f4efe

unit tests

7e960a3

fix failing test

ebb5407

quantization edge cases

6a394d7

quant tests

d7974bf

Merge branch 'main' into sparse_auto_recipe

4b9014d

fixes for stage name clashes

701ab2c

Merge branch 'sparse_auto_recipe' of github.com:neuralmagic/sparseml into sparse_auto_recipe

5812488

clean up documentation

21473aa

setup StageRunner class

485501b

running one_shot from text_gen script

2d536a3

cleanup helper fns

a4406ae

precision support

4576a80

formatting

27467e3

Merge branch 'main' into alternate_flows

10a0fed

WIP for alternating

7c754e0

fixing device issue

0eb06bf

Merge branch 'sparse_auto_recipe' into alternating_flow_pt2

f45326d

Merge branch 'main' into sparse_auto_recipe

e46dd96

MVP for alternating flows

d308987

add apply flag during finalization as well

fe9af83

clarity comments

5f6e854

fix circular import

afc9a15

Satrat requested review from bfineran and rahul-tuli January 9, 2024 19:14

mgoin reviewed Jan 9, 2024

src/sparseml/transformers/sparsification/obcq/example.yaml (Outdated)

bfineran previously approved these changes Jan 9, 2024

unmodify example

3a461ab

Satrat dismissed bfineran’s stale review via 3a461ab January 9, 2024 22:53

Satrat requested review from bfineran and mgoin January 9, 2024 22:56

Sara Adkins added 8 commits January 10, 2024 14:56

fix typo!

78fd7a2

Merge branch 'main' into sgpt_fsdp

b17e90d

setup FSDP for when starting from oneshot

291079d

Merge branch 'sgpt_fsdp' of github.com:neuralmagic/sparseml into sgpt_fsdp

ee61df2

update setup and readme

229e5f7

fix CLI issue, update README

e250572

POC for sequential FSDP OBCQ (#1947)

80983e5

Merge branch 'main' into sgpt_fsdp

2c9fbfc

bfineran previously approved these changes Jan 10, 2024

fix GHA line lost in merge

245c8de

bfineran dismissed their stale review via 245c8de January 10, 2024 21:42

Sara Adkins added 6 commits January 10, 2024 22:22

fix calib loading

497b2f7

Merge branch 'fix_calib' into sgpt_fsdp

ce833db

fix dependencies

a65cf35

reverting OBCQ merged changes for now

cb19abc

restore SparseCausalModel for now

80e1698

add progress bar for calibration forward pass (#1950)

a7cfd5f

bfineran approved these changes Jan 11, 2024

bfineran merged commit 5007b8c into main Jan 11, 2024

bfineran deleted the sgpt_fsdp branch January 11, 2024 16:37
