0

I have a point data (Lon and Lat), and I would like to create presence and absence data from it. How can I do this in R? Example is below.

Load data:

df <- read.table(text=
"species        lon   lat
Oncorhynchus_kisutch    -130.25 55.75
Oncorhynchus_kisutch    -129.75 55.75
Oncorhynchus_kisutch    -130.25 55.25
Oncorhynchus_kisutch    -129.75 55.25
Oncorhynchus_kisutch    -129.25 55.25
Oncorhynchus_kisutch    -133.25 54.75
Oncorhynchus_kisutch    -131.75 54.75
Oncorhynchus_kisutch    -131.25 54.75
",
header=TRUE
)
head(df)

 # create grid
reso = 0.25
xs <- seq(-180, 180, by=reso)
ys <- seq(-90, 90, by=reso)
grd <- expand.grid(
  x=xs,
  y=ys,
  presence=0
)
head(grd)

# query
for(i in seq(nrow(df))){
  tmp <- which(df$lon[i] == grd$x & df$lat[i] == grd$y)
  if(length(tmp)>0){
    grd$presence[tmp] <- 1
  }
}

png("plot.png", width=5, height=5, units="in", res=600, type="cairo")
plot(grd$x, grd$y, pch=1, cex=1, col=c(NA, 1)[grd$presence+1], lwd=0.5)
dev.off()

However, this code doesn't work for me when I use irregular Lon and Lat data like this:

  Lat            Lon
24.08333    32.88333
30.00486    31.23571
32.5        27.5
33.333     -8.417
34.395      26.05383
34.5       -6.917
35          15
35.0738     32.3335
35.19509    25.71726
35.2047     25.7221
35.26665    26.22368
35.33198    25.39159
user3071284
  • 6,955
  • 6
  • 43
  • 57
  • can you give us the error you had ? R is case sensitive so if your colnames are Lon and Lat instead of lon and lat the code won't work. In addition in the example the lon and lat vary by 0.25 which is not the case for you so I doubt it can work – etienne Nov 10 '15 at 17:24
  • thanks Ronak, i have all has absence ( 0) , no presence data (1). yes the Lon and Lat vary by 0.25. But I want the same grid that was created . thanks – user5545418 Nov 10 '15 at 17:30
  • 1
    If you have irregular data, but want data for a regular grid, you'll need some kind of interpolation. – Roland Nov 10 '15 at 17:36
  • 1
    This is because the Lon/Lat of your points will not exactly match the grid x and y. You will have to use the ranges for each grid cell rather than the x and y (which define only the centers or corners of your grid). i.e. use x+/-0.25 and y+/-0.25. You will have to be careful with your use of < > vs <= and <=. – Cotton.Rockwood Nov 10 '15 at 17:38
  • Hi cotton, thanks for you input. am really new to R. Please where will I add this in the codes, is it while creating the grids or query for the presence? thanks – user5545418 Nov 10 '15 at 17:50
  • @user5545418, I posted some edited code in an answer below. Let me know if this is what you are looking for. – Cotton.Rockwood Nov 10 '15 at 18:02
  • 1
    [This Q&A](http://stackoverflow.com/questions/32889531/r-how-can-i-count-how-many-points-are-in-each-cell-of-my-grid?rq=1) may be relevant. See also `sp::over` (e.g. `x = "SpatialGrid", y = "SpatialPoints"`) – Henrik Nov 10 '15 at 18:44

1 Answers1

0

This is because the Lon/Lat of your points will not exactly match the grid x and y. You will have to use the ranges for each grid cell rather than the x and y (which define only the centers or corners of your grid). i.e. use x+/-0.125 and y+/-0.125. This assumes that the grid values are for the cell centers, not one of the corners. You will have to adjust depending on which grid specification you intend. I changed the values in the data frame to points not on the grid. Check to see if this works for you:

df <- read.table(text=
                      "species        lon   lat
                  Oncorhynchus_kisutch    -130.7 55.95
                  Oncorhynchus_kisutch    -129.6 55.235
                  Oncorhynchus_kisutch    -130.21 55.34
                  Oncorhynchus_kisutch    -129.751 55.901
                  Oncorhynchus_kisutch    -129.2509 55.5
                  Oncorhynchus_kisutch    -133.99 54.834
                  Oncorhynchus_kisutch    -131.9 54.75
                  Oncorhynchus_kisutch    -131.25 54.532
                  ",
                 header=TRUE
)
head(df)

 # create grid
reso = 0.25
xs <- seq(-180, 180, by=reso)
ys <- seq(-90, 90, by=reso)
grd <- expand.grid(
  x=xs,
  y=ys,
  presence=0
)
head(grd)

# query
for(i in seq(nrow(df))){
     tmp <- which(df$lon[i] >= grd$x-0.125 & df$lon[i] < grd$x+0.125 & df$lat[i] >= grd$y-0.125 & df$lat[i] < grd$y+0.125)
     if(length(tmp)>0){
         grd$presence[tmp] <- 1
     }
 }
table(grd$presence)
grd[grd$presence==1,]
Cotton.Rockwood
  • 1,601
  • 12
  • 29
  • Hey guys, I ran into another problem. If I use this data set like the one below, I will have all the 1036800 cells as 0( absence). Please help!!!! Lat Lon 1 -34.06660 23.05000 2 -34.05000 23.06670 3 -34.01550 25.20500 4 -33.78330 26.43330 6 -29.82440 31.05120 7 -29.23000 31.50000 9 -28.36670 32.40000 10 -26.00000 32.91670 11 -21.39500 149.95500 12 -20.00000 43.00000 13 -11.50000 141.03444 18 12.00000 101.00000 19 13.09250 80.29444 20 14.43300 120.48300 21 18.25000 109.25000 23 18.50000 110.50000 25 19.50000 111.00000 – user5545418 Nov 10 '15 at 19:18
  • I can't check why the approach is not working for the new data right now, but you may wish to use a more formal spatial approach using the raster package as shown in the question linked by @Henrik in the comment above. My edit was just for the most simple solution with slight modification of your code. Using the raster package is a better way to approach this problem. – Cotton.Rockwood Nov 10 '15 at 19:33
  • thanks cotton, I did figure where the problem is. many thanks – user5545418 Nov 10 '15 at 20:08