Suppose your data is divided into N parts and each part takes T seconds to process. On a single core you expect the whole job to take N x T seconds, and you might hope that a machine with N cores finishes it in T seconds. In parallel computing, however, every core also pays a communication overhead: initializing the worker, passing data from the main process to the child, passing the result back, and finalizing. Let this overhead be C seconds and, for simplicity, assume it is the same for every core. Then on an N-core machine the job should take
T + N x C
seconds, where T is the calculation time and N x C is the total communication time. Compared with the single-core machine, the inequality
(N x T) > (T + N x C)
must hold for the parallel version to save time, at least under these assumptions. Simplifying the inequality gives
C < (N x T - T) / N
So if the constant communication time C is not less than (N x T - T) / N, parallelizing this computation gains nothing.
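To make the break-even point concrete, here is a minimal Python sketch of the model above. The function names and the sample values (N = 4, T = 2.0) are made up for illustration, and the model is only the simplified one from this answer (constant overhead C, paid once per core), not a measurement of any real machine.

```python
def serial_time(n_parts, t_part):
    """Single-core time: N parts at T seconds each, i.e. N x T."""
    return n_parts * t_part


def parallel_time(n_parts, t_part, c_overhead):
    """Simplified N-core time from the text: T + N x C."""
    return t_part + n_parts * c_overhead


def break_even_overhead(n_parts, t_part):
    """Overhead at which both versions take the same time: (N x T - T) / N."""
    return (n_parts * t_part - t_part) / n_parts


def worth_parallelizing(n_parts, t_part, c_overhead):
    """True only if the parallel model is strictly faster than the serial one."""
    return parallel_time(n_parts, t_part, c_overhead) < serial_time(n_parts, t_part)


if __name__ == "__main__":
    n, t = 4, 2.0  # hypothetical values: 4 parts, 2 seconds each
    print("serial time:", serial_time(n, t))
    print("break-even C:", break_even_overhead(n, t))
    for c in (0.5, 1.5, 2.0):
        print("C =", c, "-> parallel time:", parallel_time(n, t, c),
              "worth it:", worth_parallelizing(n, t, c))
```

With these sample values the serial run takes 8 seconds and the break-even overhead is 1.5 seconds, so an overhead of 0.5 seconds pays off while 1.5 or 2.0 seconds does not.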
In your example, the time needed for creating the workers, doing the calculation, and communicating is greater than the time a single core needs to compute sqrt.