Currently `std_detect` implements the `i8mm` feature on aarch64 ([here](https://github.com/rust-lang/stdarch/blob/master/crates/std_detect/src/detect/arch/aarch64.rs#L122)), but the feature does not seem to be available: https://rust.godbolt.org/z/8GbKW5ef4. As a result, the `vmmla` and `vusmmla` instructions have not been implemented, as noted in #1230.
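
For reference, here is a minimal sketch of the runtime-detection side that `std_detect` already covers. It assumes a toolchain where `std::arch::is_aarch64_feature_detected!` is usable without a feature gate and accepts `"i8mm"`. `has_i8mm` is a hypothetical helper name used only for illustration, not part of stdarch.

```rust
/// Hypothetical helper: reports whether the running CPU advertises `i8mm`.
#[cfg(target_arch = "aarch64")]
pub fn has_i8mm() -> bool {
    // Runtime check backed by `std_detect` through the standard library.
    std::arch::is_aarch64_feature_detected!("i8mm")
}

/// Fallback for non-aarch64 targets, where `i8mm` does not exist.
#[cfg(not(target_arch = "aarch64"))]
pub fn has_i8mm() -> bool {
    false
}
```

This only covers detection at run time. It does not show the feature being enabled for code generation, which is the part that appears to be missing.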