There is an Arm software optimization guide (e.g., https://developer.arm.com/documentation/swog309707/latest for Neoverse N1).
This guide doesn't seem to contain latency and throughput figures for NEON or SVE instructions. Is there a separate guide for NEON or SVE (e.g., one giving the latency and throughput of the INSR (SIMD&FP scalar) instruction)?
A pointer would be very helpful!