@zhu-han (Contributor) commented on Aug 1, 2024

This PR adds a recipe for training Zipformer with the Adam optimizer. The goal is to help people integrate the Zipformer encoder into their own models trained with the Adam optimizer.

To make Zipformer compatible with Adam, several changes were made compared with the original Zipformer recipe (a minimal optimizer/scheduler sketch is given after the list):

  1. Replace ScaledAdam with Adam,
  2. Remove the balancer and whitener modules,
  3. Replace all ScaledLinear layers with nn.Linear,
  4. Replace the Eden learning rate scheduler with Noam,
  5. Replace the SwooshR and SwooshL activation functions with Swish,
  6. Add an additional BiasNorm in each module (feedforward, attention, and convolution),
  7. Multiply the attention scores by the scaling factor d**-0.5.
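
For reference, the following is a minimal, self-contained sketch of what changes 1 and 4 amount to in plain PyTorch: a standard Adam optimizer driven by a Noam learning-rate schedule. It is not the recipe's actual training code; the stand-in model, `d_model`, `warmup_steps`, and Adam hyper-parameters are illustrative placeholders.

```python
import torch


def noam_lr(step: int, d_model: int = 512, warmup_steps: int = 25000) -> float:
    """Noam schedule: linear warm-up, then inverse-square-root decay."""
    step = max(step, 1)
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)


# Stand-in for the Zipformer-based model; any nn.Module works for this sketch.
model = torch.nn.Linear(80, 512)

# Change 1: plain Adam instead of ScaledAdam (betas/eps here are generic
# Transformer-style values, not necessarily those used in the recipe).
optimizer = torch.optim.Adam(model.parameters(), lr=1.0, betas=(0.9, 0.98), eps=1e-9)

# Change 4: Noam schedule instead of Eden, applied multiplicatively on top of
# the base lr of 1.0 via LambdaLR.
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=noam_lr)

for step in range(100):
    optimizer.zero_grad()
    loss = model(torch.randn(4, 80)).pow(2).mean()  # dummy loss for illustration
    loss.backward()
    optimizer.step()
    scheduler.step()  # advance the Noam schedule once per optimizer step
```

Changes 2, 3, and 5–7 are internal to the encoder modules themselves and are not shown here.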

The results (WER, %) are as follows:

  • normal-scaled model, number of model parameters: 65,595,219, i.e., 65.60 M

    | decoding method      | test-clean | test-other | comment             |
    |----------------------|------------|------------|---------------------|
    | greedy_search        | 2.35       | 5.53       | --epoch 70 --avg 30 |
    | modified_beam_search | 2.29       | 5.48       | --epoch 70 --avg 30 |
    | fast_beam_search     | 2.31       | 5.52       | --epoch 70 --avg 30 |

  • large-scaled model, number of model parameters: 148,514,478, i.e., 148.5 M

    | decoding method      | test-clean | test-other | comment             |
    |----------------------|------------|------------|---------------------|
    | greedy_search        | 2.27       | 5.25       | --epoch 70 --avg 20 |
    | modified_beam_search | 2.23       | 5.17       | --epoch 70 --avg 20 |
    | fast_beam_search     | 2.24       | 5.20       | --epoch 70 --avg 20 |

Note that Zipformer trained with ScaledAdam still performs better than Zipformer trained with Adam.
