I have two data frames (df1 and df2), each with three columns: x co-ordinate, y co-ordinate and category (with 5 levels, A-E). So I essentially have two sets of point data, with each point assigned to a category.

e.g.

X    Y    Cat
1    1.5  A
2    1.5  B
3.3  1.9  C

etc... (although both of my data frames have 100s of points in them)

I would like to find the nearest neighbour of the same category for each point in my first data frame (df1) from the second data frame (df2).

I've used nncross in the package spatstat to find the nearest neighbour in df2 for each point in df1, and to list each of these distances, as follows:

# Convert the dataframes to ppp objects

df1.ppp <- ppp(df1$X,df1$Y,c(0,10),c(0,10),marks=df1$Cat)
df2.ppp <- ppp(df2$X,df2$Y,c(0,10),c(0,10),marks=df2$Cat)

# Produce an output that lists the distance from each point in df1 to its nearest neighbour in df2

out<-nncross(X=df1.ppp,Y=df2.ppp,what=c("dist","which"))

But I am struggling to work out how to use the category labels stored in the ppp objects (as defined by marks) to find the nearest neighbour from the same category. I am sure it should be fairly straightforward, but if anyone has any suggestions or alternative methods to achieve the same result I would be really grateful.

J. Cee

2 Answers

First some artificial data to work with:

library(spatstat)

# Artificial data similar to the question
set.seed(42)
X1 <- rmpoint(100, win = square(10), types = factor(LETTERS[1:5]))
X2 <- rmpoint(100, win = square(10), types = factor(LETTERS[1:5]))

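Then a simple solution (but it loses id info: the returned indices refer to positions within each per-type sub-pattern of X2, not within X2 itself):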

# Separate patterns for each type:
X1list <- split(X1)
X2list <- split(X2)

# For each point in X1 find nearest neighbour of same type in X2:
out <- list()
for(i in seq_along(X1list)){
  out[[i]] <- nncross(X1list[[i]], X2list[[i]], what=c("dist","which"))
}

Finally, an ugly solution which recovers the id of the neighbour:

# Make separate marks for pattern 1 and 2 and collect into one pattern
marks(X1) <- factor(paste0(marks(X1), "1"))
marks(X2) <- factor(paste0(marks(X2), "2"))
X <- superimpose(X1, X2)

# For each point get the nearest neighbour of each type from both X1 and X2
# (both dist and index)
nnd <- nndist(X, by = marks(X))
nnw <- nnwhich(X, by = marks(X))

# Type to look for. I.e. the mark with 1 and 2 swapped
# (with 0 as intermediate step)
type <- marks(X)
type <- gsub("1", "0", type)
type <- gsub("2", "1", type)
type <- gsub("0", "2", type)

# Result
rslt <- cbind(as.data.frame(X), dist = 0, which = 0)
for(i in 1:nrow(rslt)){
  rslt$dist[i] <- nnd[i, type[i]]
  rslt$which[i] <- nnw[i, type[i]]
}

# Separate results
rslt1 <- rslt[1:npoints(X1),]
rslt2 <- rslt[npoints(X1) + 1:npoints(X2),]
rslt1$which <- rslt1$which - npoints(X1)
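
As a possible alternative to the mark-swapping approach, the simple per-type loop can also keep the ids if the original point indices are remembered before subsetting. The following is only a sketch under the same artificial data as above; the names `res`, `i1` and `i2` are made up for illustration, and types with no points in one of the patterns are simply left as NA.

```r
library(spatstat)

# Same artificial data as above
set.seed(42)
X1 <- rmpoint(100, win = square(10), types = factor(LETTERS[1:5]))
X2 <- rmpoint(100, win = square(10), types = factor(LETTERS[1:5]))

# One row per point of X1; 'which' will hold an index into the full X2
res <- data.frame(dist  = rep(NA_real_, npoints(X1)),
                  which = rep(NA_integer_, npoints(X1)))

for (lev in levels(marks(X1))) {
  i1 <- which(marks(X1) == lev)   # positions of this type in X1
  i2 <- which(marks(X2) == lev)   # positions of this type in X2
  if (length(i1) == 0 || length(i2) == 0) next   # nothing to match: stays NA
  nn <- nncross(X1[i1], X2[i2], what = c("dist", "which"))
  res$dist[i1]  <- nn$dist
  res$which[i1] <- i2[nn$which]   # map sub-pattern index back to X2
}
```

Here `res$which[k]` is intended to be the index in `X2` of the nearest same-type neighbour of point `k` in `X1`, so `X2[res$which]` lines up with `X1` point by point.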
Ege Rubak
  • Thanks for that, that works great with the point files, although I also managed to solve this in the end by creating a distance matrix from the original dataframes! – J. Cee Feb 05 '16 at 10:09

I also had another go at tackling this, using the package geosphere to create a distance matrix from my original data frames, and found quite a simple way to solve it.

# load geosphere library 
library("geosphere")

# create a distance matrix between all points in the 2 data frames
# (note: distm() treats the two columns as longitude/latitude and returns geodesic distances in metres)
dist <- distm(df1[,c('X','Y')], df2[,c('X','Y')])

# find the distance to the nearest neighbour of each point
df1$nearestneighbor <- apply(dist,1,min)

# mask the distances between points of different categories with Inf,
# so that only same-category distances remain
diffCat <- outer(df1$Cat, df2$Cat, "!=")
dist2 <- dist + ifelse(diffCat, Inf, 0)

# find the distance to the nearest neighbour of the same category
df1$closestmatch <- apply(dist2,1,min)
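
The same masking idea can probably be extended to keep track of which row of df2 is the match, by using which.min alongside min. The sketch below is an assumption-laden illustration rather than the code above: it swaps distm for a plain Euclidean distance built with outer (on the assumption that X and Y are planar coordinates), it uses made-up toy data, and the column names `matchrow` and `matchdist` are hypothetical.

```r
# Hypothetical toy data with the same columns as in the question
df1 <- data.frame(X = c(1, 2, 3.3), Y = c(1.5, 1.5, 1.9),
                  Cat = c("A", "B", "C"))
df2 <- data.frame(X = c(1.2, 5, 2.1, 3, 1),
                  Y = c(1.4, 5, 1.0, 2.5, 4),
                  Cat = c("A", "A", "B", "C", "D"))

# Euclidean cross-distance matrix: rows = df1 points, columns = df2 points
d <- sqrt(outer(df1$X, df2$X, "-")^2 + outer(df1$Y, df2$Y, "-")^2)

# Mask pairs whose categories differ
d[outer(as.character(df1$Cat), as.character(df2$Cat), "!=")] <- Inf

# Row of df2 holding the nearest same-category point, and its distance
df1$matchrow  <- apply(d, 1, which.min)
df1$matchdist <- d[cbind(seq_len(nrow(df1)), df1$matchrow)]

# If df2 has no point of that category, report NA rather than an arbitrary row
df1$matchrow[is.infinite(df1$matchdist)]  <- NA
df1$matchdist[is.infinite(df1$matchdist)] <- NA
```

With this layout, `df2[df1$matchrow, ]` gives the matched df2 rows in the same order as df1.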
J. Cee
  • Very nice solution. However, I guess you lose the information about which point in `df2` is the nearest neighbour to the point in `df1`. Do you not need that? To do this with spatstat you basically just have to replace `distm` by `crossdist`. – Ege Rubak Feb 05 '16 at 13:01
  • Yes (although it wasn't essential) it is certainly very useful to have the information about which point neighbours which and so your solution is very handy for that. – J. Cee Feb 08 '16 at 11:49