
Zipformer2 with CTC is hard to train #1352

@joazoa

Description

I am experimenting with the CTC option in Zipformer2, using the largest model from the documentation.
It trained well on a first dataset, but on another dataset the training fails early: the loss becomes inf and a bad-model warning checkpoint is saved (see the log below).

I have tried reducing the LR, increasing the warm-up period, disabling FP16, changing the max duration, removing all augmentations in lhotse, reducing the maximum file duration, removing SpecAugment and MUSAN, and changing the world size.

The same dataset trains fine with the Zipformer2 transducer, and also with the older zipformer-ctc recipe; it only fails for Zipformer2 with CTC.

Do you have any suggestions on what I could try next?
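
For context, one common cause of a CTC loss hitting inf is an utterance whose token sequence is longer than the subsampled encoder output, so no alignment exists. Below is a minimal lhotse-style sketch for flagging such cuts; the 10 ms frame shift, the subsampling factor of 4, the looks_too_long helper, and the example path are all assumptions for illustration, not taken from this report.

import sentencepiece as spm
from lhotse import CutSet

FRAME_SHIFT = 0.01   # seconds per feature frame (assumed)
SUBSAMPLING = 4      # encoder subsampling factor (assumed)

def looks_too_long(cut, sp: spm.SentencePieceProcessor) -> bool:
    """Return True if the BPE token sequence is longer than the
    (approximate) number of encoder output frames for this cut."""
    text = " ".join(s.text for s in cut.supervisions if s.text)
    num_tokens = len(sp.encode(text, out_type=int))
    num_frames = int(cut.duration / FRAME_SHIFT) // SUBSAMPLING
    return num_tokens > num_frames

# Example usage (paths/model are hypothetical):
# sp = spm.SentencePieceProcessor(model_file="data/lang_bpe_500/bpe.model")
# cuts = CutSet.from_file("data/fbank/train_cuts.jsonl.gz")
# suspicious = cuts.filter(lambda c: looks_too_long(c, sp))
# print(f"{len(list(suspicious))} suspicious cuts")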

2023-10-29 23:40:46,833 INFO [train.py:1034] (2/8) Epoch 1, batch 0, loss[loss=8.228, simple_loss=7.432, pruned_loss=7.424, ctc_loss=5.236, over 20568.00 frames. ], tot_loss[loss=8.228, simple_loss=7.432, pruned_loss=7.424, ctc_loss=5.236, over 20568.00 frames. ], batch size: 95, lr: 2.25e-02, grad_scale: 1.0
2023-10-29 23:40:46,833 INFO [train.py:1057] (2/8) Computing validation loss
2023-10-29 23:40:54,294 INFO [train.py:1066] (2/8) Epoch 1, validation: loss=inf, simple_loss=7.453, pruned_loss=7.416, ctc_loss=inf, over 901281.00 frames.
2023-10-29 23:40:54,295 INFO [train.py:1067] (2/8) Maximum memory allocated so far is 22148MB
2023-10-29 23:40:59,516 INFO [scaling.py:979] (2/8) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.26 vs. limit=5.0
2023-10-29 23:41:11,380 INFO [scaling.py:199] (2/8) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=0.0, ans=0.9
2023-10-29 23:41:16,030 INFO [scaling.py:199] (2/8) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=106.66666666666667, ans=0.2016
2023-10-29 23:41:26,737 INFO [scaling.py:199] (2/8) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=106.66666666666667, ans=0.29893333333333333
2023-10-29 23:41:40,934 INFO [scaling.py:979] (2/8) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.25 vs. limit=7.54
2023-10-29 23:41:50,438 INFO [scaling.py:979] (2/8) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=315.95 vs. limit=7.58
2023-10-29 23:42:01,307 INFO [scaling.py:979] (2/8) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=768, metric=17.16 vs. limit=4.085333333333334
2023-10-29 23:42:09,032 INFO [scaling.py:979] (2/8) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=37.41 vs. limit=7.62
2023-10-29 23:42:19,442 INFO [scaling.py:979] (2/8) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=165.76 vs. limit=5.08
2023-10-29 23:42:29,662 INFO [scaling.py:979] (2/8) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=31.92 vs. limit=7.62
2023-10-29 23:42:40,221 INFO [scaling.py:979] (2/8) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=433.28 vs. limit=7.66
2023-10-29 23:42:45,693 INFO [scaling.py:979] (2/8) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=768, metric=595.94 vs. limit=7.82
2023-10-29 23:43:02,777 INFO [train.py:1034] (2/8) Epoch 1, batch 50, loss[loss=3.141, simple_loss=2.879, pruned_loss=2.03, ctc_loss=4.85, over 19767.00 frames. ], tot_loss[loss=inf, simple_loss=4.825, pruned_loss=4.687, ctc_loss=inf, over 918170.33 frames. ], batch size: 274, lr: 2.48e-02, grad_scale: 4.76837158203125e-07
2023-10-29 23:43:04,688 INFO [scaling.py:979] (2/8) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=191.49 vs. limit=4.1066666666666665
2023-10-29 23:43:10,266 INFO [scaling.py:979] (2/8) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=48.81 vs. limit=7.7
2023-10-29 23:43:13,223 INFO [scaling.py:979] (2/8) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.29 vs. limit=5.133333333333334
2023-10-29 23:43:23,039 INFO [scaling.py:979] (2/8) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=31.32 vs. limit=7.9
2023-10-29 23:43:41,183 INFO [scaling.py:979] (2/8) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.31 vs. limit=4.256
2023-10-29 23:43:41,680 INFO [scaling.py:979] (2/8) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=768, metric=90.25 vs. limit=7.98
2023-10-29 23:43:43,362 INFO [scaling.py:199] (2/8) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=640.0, ans=0.2436
2023-10-29 23:43:46,978 INFO [scaling.py:979] (2/8) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=236.17 vs. limit=7.74
2023-10-29 23:43:53,663 INFO [scaling.py:979] (2/8) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=22.56 vs. limit=5.32
2023-10-29 23:44:09,088 INFO [scaling.py:199] (2/8) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.whiten.whitening_limit, batch_count=746.6666666666666, ans=4.298666666666667
2023-10-29 23:44:13,685 INFO [scaling.py:979] (2/8) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=355.68 vs. limit=8.06
2023-10-29 23:44:21,592 INFO [scaling.py:979] (2/8) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=175.08 vs. limit=7.82
2023-10-29 23:44:30,091 INFO [scaling.py:979] (2/8) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=24.64 vs. limit=7.82
2023-10-29 23:44:30,502 INFO [scaling.py:979] (2/8) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=768, metric=124.32 vs. limit=8.14
2023-10-29 23:44:44,328 INFO [scaling.py:979] (2/8) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=141.67 vs. limit=7.82
2023-10-29 23:44:49,658 INFO [scaling.py:979] (2/8) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=48.55 vs. limit=5.24
2023-10-29 23:44:51,541 INFO [scaling.py:979] (2/8) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.75 vs. limit=5.24
2023-10-29 23:44:51,565 INFO [scaling.py:979] (2/8) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=39.07 vs. limit=7.86
2023-10-29 23:44:54,528 INFO [scaling.py:979] (2/8) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=80.86 vs. limit=7.86
2023-10-29 23:45:02,881 INFO [scaling.py:979] (2/8) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.98 vs. limit=4.384
2023-10-29 23:45:06,051 INFO [scaling.py:979] (2/8) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=15.26 vs. limit=4.384
2023-10-29 23:45:17,726 INFO [checkpoint.py:75] (2/8) Saving checkpoint to zipformer/exp-large-ctc-transducer/bad-model-first-warning-2.pt
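
For reference, the inf values above are consistent with how torch's CTC loss behaves when a target cannot be aligned to the available input frames: a single unalignable utterance makes its loss inf and poisons the batch total. A toy sketch (all shapes and values are made up, not from this run):

import torch
import torch.nn.functional as F

T, N, C = 10, 2, 5                       # frames, batch, classes (blank = 0)
log_probs = torch.randn(T, N, C).log_softmax(-1)

targets = torch.randint(1, C, (N, 12))   # 12 labels, but only 10 frames
input_lengths = torch.tensor([10, 10])
target_lengths = torch.tensor([4, 12])   # second target cannot be aligned

per_utt = F.ctc_loss(log_probs, targets, input_lengths, target_lengths,
                     blank=0, reduction="none")
print(per_utt)   # second element is inf

# zero_infinity=True replaces unalignable losses with 0 instead of
# propagating inf through the batch reduction.
safe = F.ctc_loss(log_probs, targets, input_lengths, target_lengths,
                  blank=0, reduction="mean", zero_infinity=True)
print(safe)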
