I'm trying to write a parallel version of a simple algorithm that takes a point and a list of points and finds the point in the list closest to the first one, so that I can compare execution times with the serial version. The problem is that the parallel version takes more than 1 minute to run, while the serial version needs around 1 second.
To make sure the effect of parallelism is noticeable, I'm testing the code on a list of around 12 million points.
My CPU details:
- Model name: Intel(R) Core(TM) i5-4210U CPU @ 1.70GHz
- CPU(s): 4
Here are the two versions:
Common part:
import "math"

type Point struct {
	X float64
	Y float64
}

// dist returns the Euclidean distance between p and q.
func dist(p, q Point) float64 {
	return math.Sqrt(math.Pow(p.X-q.X, 2) + math.Pow(p.Y-q.Y, 2))
}
Sequential function:
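func s_argmin(p Point, points_list []Point, i, j int) int {
	// Start from the first index of the range [i, j] rather than index 0,
	// so the result is also correct when i > 0.
	best := i
	d := dist(p, points_list[i])
	var new_d float64
	for k := i + 1; k <= j; k++ {
		new_d = dist(p, points_list[k])
		if new_d < d {
			d = new_d
			best = k
		}
	}
	return best
}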
Parallel function:
func p_argmin(p Point, points_list []Point, i, j int) int {
	if i == j {
		return i
	} else {
		mid := (i + j) / 2
		var argmin1, argmin2 int
		c1 := make(chan int)
		c2 := make(chan int)
		go func() {
			c1 <- p_argmin(p, points_list, i, mid)
		}()
		go func() {
			c2 <- p_argmin(p, points_list, mid+1, j)
		}()
		argmin1 = <-c1
		argmin2 = <-c2
		close(c1)
		close(c2)
		if dist(p, points_list[argmin1]) < dist(p, points_list[argmin2]) {
			return argmin1
		} else {
			return argmin2
		}
	}
}
I also tried limiting the parallelism with an optimized function that runs the parallel version only when the input size (j-i) is greater than some threshold, but the serial version is always the faster one.
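A minimal sketch of what such a threshold-limited version could look like is below. It is not the exact code that was measured: the names `seqArgmin` and `cutoffArgmin` and the `cutoff` parameter are illustrative assumptions. The sketch also starts only one goroutine per split and computes the other half on the current goroutine, which is a choice made here for brevity.

```go
package main

import (
	"fmt"
	"math"
)

type Point struct {
	X, Y float64
}

func dist(p, q Point) float64 {
	return math.Sqrt(math.Pow(p.X-q.X, 2) + math.Pow(p.Y-q.Y, 2))
}

// seqArgmin scans points[i..j] (inclusive) and returns the index of the
// point closest to p.
func seqArgmin(p Point, points []Point, i, j int) int {
	best := i
	d := dist(p, points[i])
	for k := i + 1; k <= j; k++ {
		if nd := dist(p, points[k]); nd < d {
			d, best = nd, k
		}
	}
	return best
}

// cutoffArgmin splits points[i..j] in half and searches the halves
// concurrently, but falls back to the sequential scan once a range
// holds at most cutoff points (cutoff is a hypothetical tuning knob).
func cutoffArgmin(p Point, points []Point, i, j, cutoff int) int {
	if i == j || j-i+1 <= cutoff {
		return seqArgmin(p, points, i, j)
	}
	mid := (i + j) / 2
	left := make(chan int, 1)
	go func() { left <- cutoffArgmin(p, points, i, mid, cutoff) }()
	// The right half is searched on the current goroutine.
	a2 := cutoffArgmin(p, points, mid+1, j, cutoff)
	a1 := <-left
	// On ties, prefer the left index, as the sequential scan does.
	if dist(p, points[a1]) <= dist(p, points[a2]) {
		return a1
	}
	return a2
}

func main() {
	points := []Point{{3, 4}, {1, 1}, {0, 0.5}, {10, 10}, {-2, 0}}
	p := Point{0, 0}
	fmt.Println(seqArgmin(p, points, 0, len(points)-1))
	fmt.Println(cutoffArgmin(p, points, 0, len(points)-1, 2))
}
```

With a cutoff of 1 this degenerates to roughly one goroutine per element, much like `p_argmin` above. A cutoff close to the list length divided by the number of CPUs would create only a handful of goroutines.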
How can I improve the performance of the parallel version?