I have an ARM-based platform running Linux. Although its GCC-based toolchain supports both hardfp and softfp, the vendor recommends softfp, and the platform ships with a set of standard and platform-specific libraries that are available only in softfp versions.
I'm writing computation-intensive (NEON) AI code based on OpenCV and TensorFlow Lite. Following the vendor's guide, I built both libraries with the softfp option. However, I suspect that my code underperforms compared to similar hardfp platforms.
Does code performance depend on the softfp/hardfp setting? Do I understand correctly that all the .o and .a files the compiler produces for my program also use the softfp calling convention, which is less efficient? If so, is there any trick to use the hardfp calling convention internally while keeping softfp for the external libraries?