I'm working with Euclidean distance on a pair of datasets. First of all, my data:
centers <- data.frame(x_ce = c(300,180,450,500),
                      y_ce = c(23,15,10,20),
                      center = c('a','b','c','d'))
points <- data.frame(point = c('p1','p2','p3','p4'),
                     x_p = c(160,600,400,245),
                     y_p = c(7,23,56,12))
My goal is to find, for each point in points, the closest center in centers (the one at the smallest distance), append that center's name to the points dataset, and make this procedure automatic.
So I started with the basic formula:
#Euclidean distance
sqrt(sum((x-y)^2))
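For concreteness, the formula can be wrapped in a small function. This is only a sketch: the name euclid is hypothetical, and it assumes x and y are plain numeric vectors of equal length, such as c(x, y) coordinates.

```r
# Hypothetical helper: Euclidean distance between two numeric vectors
# of equal length (for example c(x, y) coordinates).
euclid <- function(x, y) {
  stopifnot(is.numeric(x), is.numeric(y), length(x) == length(y))
  sqrt(sum((x - y)^2))
}

# Distance between point p1 (160, 7) and center b (180, 15)
euclid(c(160, 7), c(180, 15))
```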
I have in mind how it should work, but I cannot figure out how to make it automatic:
- choose one row of points, and all the rows of centers
- calculate the Euclidean distance between that row and each row of centers
- choose the smallest distance
- attach the label of the center with the smallest distance
- repeat for the second row, and so on until the end of points
I managed to do it manually, to lay out all the steps that need to be automated:
# 1.
x = (points[1,2:3]) # select the first of points
y1 = (centers[1,1:2]) # select the first center
y2 = (centers[2,1:2]) # select the second center
y3 = (centers[3,1:2]) # select the third center
y4 = (centers[4,1:2]) # select the fourth center
# 2.
# then the distances
distances <- data.frame(distance = c(
                          sqrt(sum((x-y1)^2)),
                          sqrt(sum((x-y2)^2)),
                          sqrt(sum((x-y3)^2)),
                          sqrt(sum((x-y4)^2))),
                        center = centers$center
)
# 3.
# then I choose the row with the smallest distance
d <- distances[which(distances$distance==min(distances$distance)),]
# 4.
# last, I put the label near the point
cbind(points[1,],d)
# 5.
# then I restart for the second point
The problem is that I cannot automate it. Do you have any idea how to make this procedure automatic for each point in points?
Furthermore, am I reinventing the wheel, i.e. does a faster procedure (as a function) exist that I don't know about?
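For reference, here is one possible way to automate the steps in base R. It is a sketch, not a definitive solution: it assumes the column names from the data above, and dist_mat, nearest and result are illustrative names only. The idea is to build the full point-by-center distance matrix with outer(), then pick the closest center per row with which.min().

```r
centers <- data.frame(x_ce = c(300, 180, 450, 500),
                      y_ce = c(23, 15, 10, 20),
                      center = c('a', 'b', 'c', 'd'))
points <- data.frame(point = c('p1', 'p2', 'p3', 'p4'),
                     x_p = c(160, 600, 400, 245),
                     y_p = c(7, 23, 56, 12))

# Distance matrix: one row per point, one column per center.
# outer(a, b, "-") gives every pairwise difference a[i] - b[j].
dist_mat <- sqrt(outer(points$x_p, centers$x_ce, "-")^2 +
                 outer(points$y_p, centers$y_ce, "-")^2)

# For each point (row), the column index of the closest center.
# which.min() returns the first minimum, so ties go to the earlier center.
nearest <- apply(dist_mat, 1, which.min)

# Append the smallest distance and the matching center label to points.
result <- cbind(points,
                distance = dist_mat[cbind(seq_len(nrow(points)), nearest)],
                center = centers$center[nearest])
result
```

Because the whole matrix is computed at once, no explicit loop over the rows of points is needed. For very large datasets the matrix may become too big to hold in memory, in which case a row-by-row approach could be preferable.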