
Use this example data to see what I mean:

tag <- as.character(c(1,2,3,4,5,6,7,8,9,10))
species <- c("A","A","A","A","B","B","B","C","C","D")
size <- c(0.10,0.20,0.25,0.30,0.30,0.15,0.15,0.20,0.15,0.15)
radius <- (size*40)
x <- c(9,4,25,14,28,19,9,22,10,2)
y <- c(36,7,15,16,22,24,39,20,34,9)
data <- data.frame(tag, species, size, radius, x, y)


# Plot the points using qplot (from ggplot2, which the tidyverse loads)
library(ggplot2)
qplot(x, y, data = data) +
  geom_point(aes(colour = species, size = size))

Now that you can see the plot, here is what I want to do: for each individual “species A” point, I’d like to identify the largest point within a radius of size*40 around it.

For example, in the bottom left of the plot you can see that species A (tag 2) would produce a radius large enough to contain the close species D point.

However, the species A point on the far right-hand-side of the plot (tag 3) would produce a radius large enough to contain both of the close species B and species C points, in which case I’d want some sort of output that identifies the largest individual within the species A radius.

I’d like to know what I can run (if anything) on this data set to find the largest “within radius” point for each species A point and get an output like this:

Species A point ---- Largest point within radius

Species A tag 1 ----- Species C tag 9

Species A tag 2 ----- Species D tag 10

Species A tag 3 ----- Species B tag 5

Species A tag 4 ----- Species C tag 8

I've used spatstat and the CTFS package to make some plots in the past, but I can't figure out how to "find largest neighbor within radius". Perhaps I can tackle this in ArcMap? Also, this is just a small example dataset. Realistically, I will want to find the "largest neighbor within radius" for thousands of points.

Any help or feedback would be greatly appreciated.

Jay
  • In `base` you can just calculate the Euclidean distance of each point to each Species A tag as `sqrt((x1-x2)^2 + (y1-y2)^2)`. Once you have a vector of distances to a Species A tag, you can use something like `max(distances_vector[distances_vector < 40])`. See if you can set it up for a single case and then work on iteration for each Species A tag. – Djork Oct 18 '17 at 23:40
  • 1
    Does the entire individual (or just the point at its center) need to fall within that radius? – Josh O'Brien Oct 19 '17 at 00:10
  • With Species A tag 1, you have B 7 also with the same size... Or I am not understanding correctly? – kangaroo_cliff Oct 19 '17 at 00:28
  • @JoshO'Brien Just the point at its center. – Jay Oct 19 '17 at 02:29
  • @Headpoint Species A tag 1 is size 0.10 and Species B tag 7 is size 0.15 – Jay Oct 19 '17 at 02:34
  • @Djork Thank you I will try it! – Jay Oct 19 '17 at 02:35
  • Can a Species A point with a different tag be the max point within a certain radius, or are you only considering other species? It's not clear from the example output. – Djork Oct 19 '17 at 02:43
  • @Jay So, how did it go? – kangaroo_cliff Oct 20 '17 at 00:18
  • @Djork I'm only considering other species. If a Species A point is found to be the largest point within a radius, then it will be ignored and the largest "non-Species A" point will be selected. – Jay Oct 24 '17 at 17:12
  • @Headpoint IT WORKS! Thank you very much, you've helped me out a lot. I've been using R for a while now and I can do statistical tests, make nice graphics, and manage my data fine, but figuring out how to solve this problem was beyond my knowledge. I always try to figure out these problems myself as I think that's the best way for me to learn R but this problem stumped me and I couldn't find how to do it on stackoverflow. I'd like to learn how to solve problems like these myself, so it looks like I need to figure out how to use "for" in R and "iterations". Any recommendations where to start? – Jay Oct 24 '17 at 17:23
  • @Headpoint Also sorry for the late reply, I took a few days off. – Jay Oct 24 '17 at 17:24
  • @Jay No worries. Maybe Hadley's Advanced R is the book for you: http://adv-r.had.co.nz/. I guess there are plenty of other sources available. Unless you have already done so, it is better to go through all the major topics in a methodical way. – kangaroo_cliff Oct 24 '17 at 23:28
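
The single-case approach suggested in the comments above (Euclidean distance from one Species A tag to every other point, then a filter on the radius) could be set up roughly as follows. This is only a sketch: the names `pts`, `focal`, `others`, `inside` and `largest` are made up for illustration, the radius is taken as size*40 as in the question rather than a fixed 40, and the maximum is taken over `size` rather than over the distances themselves.

```r
# Example data from the question, rebuilt so the sketch is self-contained
pts <- data.frame(
  tag     = as.character(1:10),
  species = c("A","A","A","A","B","B","B","C","C","D"),
  size    = c(0.10,0.20,0.25,0.30,0.30,0.15,0.15,0.20,0.15,0.15),
  x       = c(9,4,25,14,28,19,9,22,10,2),
  y       = c(36,7,15,16,22,24,39,20,34,9),
  stringsAsFactors = FALSE
)
pts$radius <- pts$size * 40

# One focal individual: Species A, tag 3
focal <- pts[pts$tag == "3", ]

# Only other species are candidates (as clarified in the comments)
others <- pts[pts$species != focal$species, ]

# Euclidean distance from the focal point to every candidate
d <- sqrt((others$x - focal$x)^2 + (others$y - focal$y)^2)

# Candidates whose centre lies inside the focal radius
inside <- others[d < focal$radius, ]

# Largest candidate inside the radius (which.max returns the first one on ties)
largest <- inside[which.max(inside$size), ]
largest[, c("species", "tag", "size")]
```

For tag 3 this selects Species B tag 5, matching the desired output in the question. Repeating the same steps for each Species A row is what the loop in the answer below does.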

1 Answer


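The following loop finds, for each individual, the largest individual of a different species (reported as a species and tag pair) whose centre lies within that individual's radius. When several candidates tie on size, the last one is picked.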

all_df <- data # avoid working on a variable called `data`
res_df <- data.frame()
for (j in seq_len(nrow(all_df))) {

  # keep only individuals of a different species than row j
  df <- all_df[all_df$species != all_df$species[j], ]
  # indices (in df) of individuals whose centre lies within the radius of row j
  ind <- which((df$x - all_df$x[j])^2 + (df$y - all_df$y[j])^2 < all_df$radius[j]^2)
  # skip row j if nothing falls inside its radius (max() of nothing is -Inf)
  if (length(ind) == 0) next

  # find the max `size` among those neighbours
  max_size <- max(df$size[ind])
  # all positions in `ind` that share max_size
  max_inds <- which(df$size[ind] == max_size)
  # pick the last one if there is more than one with max_size
  new_ind <- ind[max_inds[length(max_inds)]]

  # append the result as a new row of the output data.frame
  res_df <- rbind(res_df, data.frame(org_sp = all_df$species[j],
                                     org_tag = all_df$tag[j],
                                     res_sp = df$species[new_ind],
                                     res_tag = df$tag[new_ind]))
}

res_df
#      org_sp org_tag res_sp res_tag
# 1       A       1      C       9
# 2       A       2      D      10
# 3       A       3      B       5
# 4       A       4      C       8
# 5       B       5      A       3
# 6       B       6      C       8
# 7       B       7      C       9
# 8       C       8      B       5
# 9       C       9      B       7
# 10      D      10      A       2
kangaroo_cliff
  • Thank you very much. I forgot to add that if 2 or more different species were the same size and within the radius (like for Species A tag 1), then only the highest ranking species would be selected. So A would be selected over B, C and D. And species B would only be selected over C and D, and so on. Basically I want to include a species hierarchy that will narrow down the selection to one individual point. – Jay Oct 19 '17 at 02:40
  • What does "rank" mean? It can't be difficult to change this to get what you are after... – kangaroo_cliff Oct 19 '17 at 02:43
  • I'm getting an error with your script: `Error in which(df$size[ind] == maxs) : object 'maxs' not found` – Jay Oct 19 '17 at 02:49
  • 1
    These results are exactly what I am looking for though, thank you. I just get an error when i run it, any ideas? – Jay Oct 19 '17 at 02:52
  • Try again. I have forgotten to paste a line before. – kangaroo_cliff Oct 19 '17 at 02:53
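
The species hierarchy requested in the comments above (A outranks B, B outranks C, and so on, used only to break ties on size) is not part of the answer's code. One possible way to add it is sketched below. The helper `largest_neighbour()`, the `hierarchy` vector and the data frame name `pts` are hypothetical names introduced here, and returning `NA` when nothing lies inside the radius is an assumption, not something the thread specifies.

```r
# Example data from the question, rebuilt so the sketch is self-contained
pts <- data.frame(
  tag     = as.character(1:10),
  species = c("A","A","A","A","B","B","B","C","C","D"),
  size    = c(0.10,0.20,0.25,0.30,0.30,0.15,0.15,0.20,0.15,0.15),
  x       = c(9,4,25,14,28,19,9,22,10,2),
  y       = c(36,7,15,16,22,24,39,20,34,9),
  stringsAsFactors = FALSE
)
pts$radius <- pts$size * 40

# Assumed ranking, highest first, used only to break ties on size
hierarchy <- c("A", "B", "C", "D")

# Hypothetical helper: largest other-species neighbour of row j
largest_neighbour <- function(j, pts, hierarchy) {
  focal <- pts[j, ]
  cand  <- pts[pts$species != focal$species, ]
  d2    <- (cand$x - focal$x)^2 + (cand$y - focal$y)^2
  cand  <- cand[d2 < focal$radius^2, ]
  if (nrow(cand) == 0) {
    # Assumption: report NA when no neighbour lies inside the radius
    return(data.frame(org_sp = focal$species, org_tag = focal$tag,
                      res_sp = NA_character_, res_tag = NA_character_,
                      stringsAsFactors = FALSE))
  }
  # Sort by size (largest first), then by position in the hierarchy
  best <- cand[order(-cand$size, match(cand$species, hierarchy)), ][1, ]
  data.frame(org_sp = focal$species, org_tag = focal$tag,
             res_sp = best$species, res_tag = best$tag,
             stringsAsFactors = FALSE)
}

# Apply to the Species A rows only
a_rows <- which(pts$species == "A")
res <- do.call(rbind, lapply(a_rows, largest_neighbour,
                             pts = pts, hierarchy = hierarchy))
res
```

With this tie-break, Species A tag 1 is matched to Species B tag 7 instead of Species C tag 9, because both neighbours have size 0.15 and B ranks above C. The other three Species A rows give the same matches as the answer's output.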