OpenGL on iOS is probably the wrong way to go. Metal support on iOS would be the better way to go if you're going GPU.
Metal
You could use Apple's support for Metal Compute shaders. I've written high-performance code for my PhD in it. An early experiment I made calculating some fractals using Metal might give you some ideas to start
Ultimately, this question is too broad. What do you intend to use the library for, or how do you intend to use it? Is it a one off multiplication? Have you tested with current libraries and found the performance to be too slow? If so, by how much?
In general, you can run educational or purely informational experiments on performance of algorithm X on CPU vs. GPU vs. specialized hardware, but most often you run up against Amdahl's law and your code vs. a team of experts in the field.
Accelerate
You can also look into the Accelerate framework which offers BLAS.
Apple, according to the WWDC 2014 talk What's new in the Accelerate Framework, has hand tuned the Linear Algebra libraries targeted at their current generation hardware. They aren't just fast, but energy efficient. There are newer talks as well.