Is there a simple way or library to efficiently (at the maximum possible speed) implement linear algebra on a dual-core ARM Cortex-A9 using Xilinx SDK?
I am using a Zybo Z7 development board with a dual-core ARM processor, and I want to implement a simple neural network, with one convolution layer followed by a dense layer, in Xilinx SDK. Specifically, I want to port a Python NumPy-based model to the ARM. I have read some manuals on ARM and the SIMD library, but I don't want to dive that deep.
The easiest approach for me would be a library that performs multiplication, dot products, convolution, etc. quickly on its own, like NumPy in Python, so that I can avoid writing plain for loops by hand. An example would be nice!
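To make the request concrete, here is a rough sketch in plain C of the operations I need, written as the kind of hand-written loops I am hoping a library can replace with optimized equivalents. The function names (`vec_dot`, `dense_forward`, `conv1d_valid`) are made up for illustration and do not come from any Xilinx or ARM library. The row-major weight layout and the 1-D "valid" convolution are assumptions as well; the real convolution layer may be 2-D.

```c
#include <stddef.h>

/* Hypothetical helper: dot product of two float vectors of length n. */
float vec_dot(const float *restrict a, const float *restrict b, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i)
        acc += a[i] * b[i];
    return acc;
}

/* Hypothetical dense layer: y = W * x + bias.
 * W is assumed to be rows x cols, stored row-major. */
void dense_forward(const float *restrict w, const float *restrict x,
                   const float *restrict bias, float *restrict y,
                   size_t rows, size_t cols)
{
    for (size_t r = 0; r < rows; ++r)
        y[r] = bias[r] + vec_dot(&w[r * cols], x, cols);
}

/* Hypothetical 1-D "valid" convolution layer. Like most neural-network
 * layers it does not flip the kernel, so it matches
 * np.correlate(in, kernel, mode="valid").
 * out must have room for n - k + 1 elements (nothing is written if k > n). */
void conv1d_valid(const float *restrict in, const float *restrict kernel,
                  float *restrict out, size_t n, size_t k)
{
    for (size_t i = 0; i + k <= n; ++i)
        out[i] = vec_dot(&in[i], kernel, k);
}
```

As far as I understand, GCC may be able to auto-vectorize loops like these for NEON with options such as `-O3 -mfpu=neon -mfloat-abi=hard`, and for float code on 32-bit ARM it may also need `-ffast-math`. That is an assumption that would need checking on the board.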
Thanks for your time.