It's not uncommon for GPUs to be less interesting on small data sets than on large data sets. The reasons for this vary with the specific algorithm. GPUs generally have higher main memory bandwidth than CPUs and can usually outperform them for heavy-duty number crunching. But GPUs only work well when the problem has inherent parallelism that can be exposed. Taking advantage of this parallelism is what allows an algorithm to tap into the greater memory bandwidth as well as the higher compute capability.
However, before the GPU can do anything, it's necessary to get the data to the GPU. And this creates a "cost" to the GPU version of the code that will not normally be present in the CPU version.
To be more precise, the GPU provides a benefit when the reduction in computation time on the GPU (over the CPU) exceeds the cost of the data transfer. I believe solving a system of linear equations is somewhere between O(n^2) and O(n^3) in complexity. For very small n, the computational work may not be large enough to offset the cost of the data transfer, but clearly as n becomes larger it should be. Your vector operation, on the other hand, may only be O(n) in complexity, so the benefit scenario will look different.
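A rough way to see where that crossover sits for your particular case is to time the transfer and the kernel separately with CUDA events and compare the kernel time against your CPU time plus the copy time. The sketch below is a minimal example for an O(n) vector operation; the kernel `vec_op`, the problem size, and the launch configuration are made up for illustration, not taken from your code.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Stand-in O(n) vector operation; the real workload would go here.
__global__ void vec_op(const float *x, float *y, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = 2.0f * x[i] + 1.0f;
}

int main()
{
    const int n = 1 << 20;                  // hypothetical problem size
    const size_t bytes = n * sizeof(float);

    float *h_x = (float *)malloc(bytes);
    for (int i = 0; i < n; i++) h_x[i] = 1.0f;

    float *d_x, *d_y;
    cudaMalloc((void **)&d_x, bytes);
    cudaMalloc((void **)&d_y, bytes);

    cudaEvent_t t0, t1, t2;
    cudaEventCreate(&t0); cudaEventCreate(&t1); cudaEventCreate(&t2);

    cudaEventRecord(t0);
    cudaMemcpy(d_x, h_x, bytes, cudaMemcpyHostToDevice);   // the transfer "cost"
    cudaEventRecord(t1);
    vec_op<<<(n + 255) / 256, 256>>>(d_x, d_y, n);         // the compute work
    cudaEventRecord(t2);
    cudaEventSynchronize(t2);

    float ms_copy, ms_kernel;
    cudaEventElapsedTime(&ms_copy, t0, t1);
    cudaEventElapsedTime(&ms_kernel, t1, t2);
    printf("copy: %.3f ms, kernel: %.3f ms\n", ms_copy, ms_kernel);

    // The GPU only wins overall if its speedup over the CPU version exceeds
    // the copy time (plus any copy of results back to the host).
    cudaFree(d_x); cudaFree(d_y); free(h_x);
    return 0;
}
```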
For the O(n^2) or O(n^3) case, as we move to larger data sets, the "cost" to transfer the data increases as O(n), but the compute requirements for the solution increase as O(n^2) (or O(n^3)). Therefore larger data sets have compute workloads that grow much faster than the transfer cost, reducing the relative effect of the "cost" of the data transfer; for example, doubling n roughly doubles the transfer cost but increases an O(n^3) workload by about a factor of eight. An O(n) problem, on the other hand, won't have this scaling dynamic: the workload increases at the same rate as the "cost" of data transfer.
Also note that if the "cost" of transferring data to the GPU can be hidden by overlapping it with computation work, then the "cost" for the overlapped portion becomes "free", i.e. it does not contribute to the overall solution time.
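A common way to get that overlap is to break the work into chunks and issue each chunk's copies and kernel in its own CUDA stream, using pinned host memory so the asynchronous copies can actually overlap with kernels in other streams. The sketch below is only meant to show the pattern; the `vec_op` kernel, the chunk count, and the evenly divisible problem size are assumptions, not details from your code.

```cuda
#include <cuda_runtime.h>

// Same stand-in O(n) kernel as before.
__global__ void vec_op(const float *x, float *y, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = 2.0f * x[i] + 1.0f;
}

int main()
{
    const int n = 1 << 22, nchunks = 4, chunk = n / nchunks;  // n assumed divisible
    const size_t cbytes = chunk * sizeof(float);

    float *h_x, *h_y, *d_x, *d_y;
    cudaHostAlloc((void **)&h_x, n * sizeof(float), cudaHostAllocDefault);  // pinned
    cudaHostAlloc((void **)&h_y, n * sizeof(float), cudaHostAllocDefault);
    cudaMalloc((void **)&d_x, n * sizeof(float));
    cudaMalloc((void **)&d_y, n * sizeof(float));
    for (int i = 0; i < n; i++) h_x[i] = 1.0f;

    cudaStream_t s[nchunks];
    for (int c = 0; c < nchunks; c++) cudaStreamCreate(&s[c]);

    // Each chunk's H2D copy can overlap with another chunk's kernel or D2H copy,
    // so much of the transfer time is hidden behind computation.
    for (int c = 0; c < nchunks; c++) {
        int off = c * chunk;
        cudaMemcpyAsync(d_x + off, h_x + off, cbytes, cudaMemcpyHostToDevice, s[c]);
        vec_op<<<(chunk + 255) / 256, 256, 0, s[c]>>>(d_x + off, d_y + off, chunk);
        cudaMemcpyAsync(h_y + off, d_y + off, cbytes, cudaMemcpyDeviceToHost, s[c]);
    }
    cudaDeviceSynchronize();

    for (int c = 0; c < nchunks; c++) cudaStreamDestroy(s[c]);
    cudaFree(d_x); cudaFree(d_y); cudaFreeHost(h_x); cudaFreeHost(h_y);
    return 0;
}
```

How much of the transfer you can actually hide depends on the device (number of copy engines) and on the kernel time per chunk being comparable to the copy time per chunk; a profiler timeline will show whether the copies and kernels really overlap.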