Maybe there are some benifit below: 1, The code could be simplier. 2, The inference could be faster. 3, The inference can accept multi-tokens in this way. There are some reference codes here. https://github.com/agiwave/Mamba/blob/main/Mamba.py