1

I have xy coordinates of points and I want to make use distance for averaging points. My data is named qq and I obtain the distance matrix using dist function

qq
       X      Y
2 4237.5 4411.5
3 4326.5 4444.5
4 4382.0 4418.0
5 4204.0 4487.5
6 4338.5 4515.0

mydist = as.matrix(dist(qq))

          2         3         4        5         6
2   0.00000  94.92102 144.64612  83.0557 144.61414
3  94.92102   0.00000  61.50203 129.8278  71.51398
4 144.64612  61.50203   0.00000 191.0870 106.30734
5  83.05570 129.82777 191.08702   0.0000 137.28256
6 144.61414  71.51398 106.30734 137.2826   0.00000

What I want to do is to average points that are closer that a certain threshold, for this example we could use 80. The only pairwise distances that fall below that limit are 3-4 and 3-6. The question is how to go back to the original matrix and average xy coordinates to make the 3-4 pair one point and 3-6 pair another one (discarding former points 3,4 and 6)

here's the dput of my data.frame

dput(qq)
structure(list(X = c(4237.5, 4326.5, 4382, 4204, 4338.5), Y = c(4411.5, 
4444.5, 4418, 4487.5, 4515)), .Names = c("X", "Y"), row.names = 2:6, class = "data.frame")

UPDATE

Using some of the provided with modifications code I get the 2 points I need to replace in the 3-4 place and 3-6 place. This means my point 3 and 4 and 6 will have to disappear from qq and this two points should be appended to it

pairs <- which(as.matrix(dist(qq)) < 80 & upper.tri(as.matrix(dist(qq))), arr.ind = T)
t(apply(pairs,1,function(i) apply(qq[i,],2,mean)))
        X       Y
3 4354.25 4431.25
3 4332.50 4479.75
Matias Andina
  • 4,029
  • 4
  • 26
  • 58

2 Answers2

1

I think this should do it for you, if I understand the problem correctly.

pairs <- which(as.matrix(y) > 140 & upper.tri(as.matrix(y)), arr.ind = T)
result <- apply(pairs,1,function(i) apply(qq[i,],2,mean))

#optionally, I think this is the form you will want it in.
result <- data.frame(t(result))

It will a matrix of a similar structure to qq containing the averages of points which are "far" away from each other determined by thresh.

UPDATE

qq <- qq[-unique(c(pairs)),]
qq <- rbind(qq,result)
Adam
  • 648
  • 6
  • 18
  • It's not doing what I need. dim of your apply call is 2,16 and my expected output should be 4,2 – Matias Andina Jul 27 '16 at 14:25
  • check my edits, there was a casting issue in the result of the dist matrix. If you are expecting to receive 3-6 and 6-3 (same result) as two different numbers, then you should remove the `& upper.tri(...)` – Adam Jul 27 '16 at 14:48
  • I think we're getting closer but not quite there, check my update – Matias Andina Jul 27 '16 at 14:56
0

Ok so I was able to merge strategies and solve the issue but not in a fancy way

# Search pairs less than threshold
pairs <- which(as.matrix(dist(qq)) < 80 & upper.tri(as.matrix(dist(qq))), arr.ind = T)

# Get the row numbers for subsetting the original matrix 
indx=unique(c(pairs[,1],pairs[,2]))

# Get result dataframe
out = data.frame(rbind(qq[-indx,],t(apply(pairs,1,function(i) apply(qq[i,],2,mean)))),row.names=NULL)

dim(out)
[1] 4 2

out
        X       Y
1 4237.50 4411.50
2 4204.00 4487.50
3 4354.25 4431.25
4 4332.50 4479.75

The row.names get dropped because they mean nothing now that I've removed original points and added new ones. I'm still open to better ways to do it and to check everything is done correctly.

UPDATE

I made a function that could be more useful that making things step-wise and let's you play with the threshold.

distance_fix = function(dataframe,threshold){


  mydist = as.matrix(dist(dataframe))

  # Which pairs in the upper triangle are below threshold 
  pairs <- which(mydist < threshold & upper.tri(mydist), arr.ind = T)

  # Get the row numbers for subsetting the original matrix 
  indx=unique(c(pairs))

  # Get result dataframe
  out = data.frame(rbind(dataframe[-indx,],t(apply(pairs,1,function(i) apply(dataframe[i,],2,mean)))),row.names=NULL)

 return(out) 
}
Matias Andina
  • 4,029
  • 4
  • 26
  • 58