I am using zstd to decompress data on a 32-bit ARMv7 platform, and 128 MB of data takes about 2 minutes to finish. On a 64-bit ARMv8 platform the same job takes only about 2 seconds. What makes the 32-bit ARMv7 performance so poor?
I found that zstd's mem.h says unaligned memory access can speed up decompression, but I have set the SCTLR.A bit to 0 (so unaligned accesses should not trap) and it still has no effect.
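
For context, here is a minimal sketch of the kind of unaligned read I understand mem.h to be talking about. This is not zstd's code: the function names `read32_memcpy`, `read32_bytes_le` and `check_unaligned_reads` are made up for illustration. As far as I can tell, mem.h picks between access styles like these through its `MEM_FORCE_MEMORY_ACCESS` macro, with the `memcpy` form as the portable default, so clearing SCTLR.A alone may not change which instructions the compiler actually emits.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Portable unaligned 32-bit read: memcpy leaves it to the compiler to
 * pick a safe instruction sequence for the target (hypothetical helper,
 * not taken from zstd). */
static uint32_t read32_memcpy(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Byte-by-byte little-endian read: always safe on any alignment, but
 * it costs four loads plus shifts if the compiler does not merge them. */
static uint32_t read32_bytes_le(const void *p)
{
    const unsigned char *b = (const unsigned char *)p;
    return (uint32_t)b[0]
         | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16)
         | ((uint32_t)b[3] << 24);
}

/* Sanity check: both helpers must work at an odd (unaligned) offset.
 * Returns 1 on success; an assert fires otherwise. */
static int check_unaligned_reads(void)
{
    unsigned char buf[8] = {0};
    const uint32_t value = 0x11223344u;
    const unsigned char le[4] = {0x01, 0x02, 0x03, 0x04};

    /* Store a native-endian value at offset 1, then read it back. */
    memcpy(buf + 1, &value, sizeof value);
    assert(read32_memcpy(buf + 1) == value);

    /* Store explicit little-endian bytes at offset 3, then decode them. */
    memcpy(buf + 3, le, sizeof le);
    assert(read32_bytes_le(buf + 3) == 0x04030201u);

    return 1;
}
```

If the ARMv7 build ends up with the slow form of these reads, I would expect to see byte loads (`ldrb`) instead of single word loads (`ldr`) in the disassembly of the decompression loop, but I have not confirmed this yet.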