Applying ZeRO-3 with LoRA produces empty LoRA weights (torch.Size([0])) #969

# System Info
accelerate 1.6.0
peft 0.15.0
transformers 4.51.3
deepspeed 0.16.5


# Information

- The official example scripts
- My own modified scripts

# Tasks

- An officially supported task in the examples folder
- My own task or dataset (give details below)

# Reproduction
```
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model
from accelerate import Accelerator
import torch
from torch.utils.data import Dataset, DataLoader


class DummyDataset(Dataset):
    def __init__(self, tokenizer, dummy_text="Hello, world!", num_samples=100):
        self.tokenizer = tokenizer
        self.dummy_text = dummy_text
        self.num_samples = num_samples

    def __len__(self):
        return self.num_samples

    def __getitem__(self, idx):
        encoded = self.tokenizer(self.dummy_text, return_tensors="pt")
        item = {key: val.squeeze(0) for key, val in encoded.items()}
        return item


accelerator = Accelerator()

model_name = "/home/clouduser/jxk/Qwen2.5-1.5B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

lora_config = LoraConfig(r=8, lora_alpha=32, lora_dropout=0.1, bias="none", task_type="CAUSAL_LM")
model = get_peft_model(model, lora_config)

optimizer = torch.optim.Adam(model.parameters(), lr=5e-5)
dummy_dataset = DummyDataset(tokenizer, dummy_text="Hello, world!", num_samples=100)
dataloader = DataLoader(dummy_dataset, batch_size=4, shuffle=True)


# Print the LoRA weight shapes before accelerator.prepare (ZeRO-3 not applied yet).
print("++++" * 100)
policy_state_dict = model.state_dict()
for key, value in policy_state_dict.items():
    if "lora_A" in key or "lora_B" in key:
        print(f"{key}: {value.shape}")
print("++++" * 100)
print("====" * 100)
print("====" * 100)
print("====" * 100)
print("====" * 100)

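# Wrap the model, optimizer and dataloader; with the config below this enables ZeRO-3.
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)


# Print the LoRA weight shapes again after accelerator.prepare.
print("++++" * 100)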
policy_state_dict = model.state_dict()
for key, value in policy_state_dict.items():
    if "lora_A" in key or "lora_B" in key:
        print(f"{key}: {value.shape}")
print("++++" * 100)
```


The printed results (LoRA weight shapes) are:

Before applying ZeRO-3 (before `accelerator.prepare`):

- `base_model.model.model.layers.20.self_attn.q_proj.lora_A.default.weight: torch.Size([8, 1536])`
- `base_model.model.model.layers.20.self_attn.q_proj.lora_B.default.weight: torch.Size([1536, 8])`
- `base_model.model.model.layers.20.self_attn.v_proj.lora_A.default.weight: torch.Size([8, 1536])`
- `base_model.model.model.layers.20.self_attn.v_proj.lora_B.default.weight: torch.Size([256, 8])`
- `base_model.model.model.layers.21.self_attn.q_proj.lora_A.default.weight: torch.Size([8, 1536])`
- `base_model.model.model.layers.21.self_attn.q_proj.lora_B.default.weight: torch.Size([1536, 8])`
- `base_model.model.model.layers.21.self_attn.v_proj.lora_A.default.weight: torch.Size([8, 1536])`

After applying ZeRO-3 (after `accelerator.prepare`):

- `module.base_model.model.model.layers.21.self_attn.q_proj.lora_A.default.weight: torch.Size([0])`
- `module.base_model.model.model.layers.21.self_attn.q_proj.lora_B.default.weight: torch.Size([0])`
- `module.base_model.model.model.layers.21.self_attn.v_proj.lora_A.default.weight: torch.Size([0])`
- `module.base_model.model.model.layers.21.self_attn.v_proj.lora_B.default.weight: torch.Size([0])`
- `module.base_model.model.model.layers.22.self_attn.q_proj.lora_A.default.weight: torch.Size([0])`
- `module.base_model.model.model.layers.22.self_attn.q_proj.lora_B.default.weight: torch.Size([0])`
- `module.base_model.model.model.layers.22.self_attn.v_proj.lora_A.default.weight: torch.Size([0])`
- `module.base_model.model.model.layers.22.self_attn.v_proj.lora_B.default.weight: torch.Size([0])`
- `module.base_model.model.model.layers.23.self_attn.q_proj.lora_A.default.weight: torch.Size([0])`
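To tell whether the weights are really gone or only partitioned, it may help to compare each parameter's local shape with the full shape that DeepSpeed records. The following is only a minimal sketch: it assumes that ZeRO-3 attaches a `ds_shape` attribute to partitioned parameters (my understanding of DeepSpeed, not verified against 0.16.5), and `describe_lora_params` is a hypothetical helper name, not part of any library.

```python
def describe_lora_params(named_params):
    """Return {name: (local_shape, full_shape)} for LoRA parameters.

    `named_params` is any iterable of (name, param) pairs, for example
    `model.named_parameters()`. The full shape is read from `ds_shape`
    when that attribute exists (ASSUMPTION: set by DeepSpeed ZeRO-3 on
    partitioned parameters); otherwise the local shape is used.
    """
    report = {}
    for name, param in named_params:
        # Keep only the LoRA adapter matrices, as in the reproduction script.
        if "lora_A" not in name and "lora_B" not in name:
            continue
        local_shape = tuple(param.shape)
        full_shape = tuple(getattr(param, "ds_shape", param.shape))
        report[name] = (local_shape, full_shape)
    return report
```

It would be called on `model.named_parameters()` rather than on `model.state_dict()`, since the tensors returned by `state_dict()` are detached copies and would not carry extra attributes. A local shape of `(0,)` next to a non-empty full shape would suggest the weight is partitioned rather than lost.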

This is my Accelerate config file (DeepSpeed ZeRO stage 3):

```
compute_environment: LOCAL_MACHINE
debug: false
deepspeed_config:
  deepspeed_multinode_launcher: standard
  offload_optimizer_device: none
  offload_param_device: none
  zero3_init_flag: true
  zero3_save_16bit_model: true
  zero_stage: 3
distributed_type: DEEPSPEED
downcast_bf16: 'no'
machine_rank: 0
main_training_function: main
mixed_precision: bf16
num_machines: 1
num_processes: 1
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
```
My model is:

```
  name: "Qwen/Qwen/Qwen1.5-0.5B-Chat"
  # name: "Qwen/Qwen2.5-7B-Instruct"
  # name: "Qwen/Qwen2.5-32B-Instruct"
  # name: "Qwen/Qwen2.5-14B-Instruct"
  # name: "internlm/internlm2_5-1_8b"
  # name: "meta-llama/Llama-3.1-8B-Instruct"
```

This is my LoRA config:

```
lora_config:
  r: 8
  lora_alpha: 32
  target_modules:
    - "q_proj"    # qwen
    - "v_proj"    # qwen
  lora_dropout: 0.1
  bias: "none"
  task_type: "CAUSAL_LM"
```

# Expected behavior

After applying LoRA together with DeepSpeed ZeRO-3, every LoRA weight in `model.state_dict()` has shape `torch.Size([0])`. With ZeRO-2 the problem does not occur and the LoRA weights keep their original shapes. Can you help me?
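For reference, a minimal sketch of how the full LoRA shapes might be read under ZeRO-3, assuming the empty tensors are partitioned placeholders. It relies on DeepSpeed's `deepspeed.zero.GatheredParameters` context manager; `gathered_lora_shapes` and its `gather` argument are hypothetical names introduced here so the helper can be exercised without DeepSpeed installed.

```python
def gathered_lora_shapes(model, gather=None):
    """Return {name: shape} for LoRA parameters while they are gathered.

    `gather` is a callable that takes a list of parameters and returns a
    context manager. When it is None, DeepSpeed's
    `deepspeed.zero.GatheredParameters` is used (imported lazily so this
    module can be loaded without DeepSpeed).
    """
    if gather is None:
        import deepspeed  # only needed on the real ZeRO-3 path

        gather = deepspeed.zero.GatheredParameters

    # Collect only the LoRA adapter matrices, as in the reproduction script.
    lora_params = [
        (name, param)
        for name, param in model.named_parameters()
        if "lora_A" in name or "lora_B" in name
    ]

    # Inside the context the parameters are expected to be materialized;
    # on exit they are released again.
    with gather([param for _, param in lora_params]):
        return {name: tuple(param.shape) for name, param in lora_params}
```

In the reproduction script this would be called as `gathered_lora_shapes(model)` after `accelerator.prepare`. If the shapes inside the context match the ones printed before `accelerator.prepare`, the weights are intact and only their local view is empty.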


