0

I am building a stochastic model to predict the movement of objects floating in the ocean. I have thousands of data from drifter buoys all around the world. In the format as below:

  index month year lat long

  72615 10 2010 35,278 129,629

  72615 11 2010 37,604 136,365

  72615 12 2010 39,404 137,775

  72615 1 2011 39,281 138,235

  72620 1 2011 35,892 132,766

  72620 2 2011 38,83 133,893

  72620 3 2011 39,638 135,513

  72620 4 2011 41,297 139,448

The general concept for the model is to divide whole world into 2592 cells of magnitude of 5x5 degrees. And then create the Markov's chain transition matrix using the formula that

the probability of going from cell i to cell j in 1 month equals to:

the number of times any buoy went from cell i to cell j in 1 month

divided by the

number of times any buoy exitted i (including going from i to i).


However I have two troubles related to managing the data.

1. Is there an easy solution (preferably in Excel or R) to add 6-th column to the data set, whose values would depend only on the value of latitude and longitude, such that it would equal to:

1 when both latitude and longitude are between 0 and 5

2 when latitude is between 0 and 5 and longitude between 5 and 10

3 when latitude is betwwen 0 and 5 and longitude between 10 and 15

and so on up to the number 2592

2. Is there an easy way to count the number of times any buoy went from cell i to cell j in 1 month?

I was trying to figure out the solution to the question 1 in Excel, but could not think of anything more efficient than just sorting by the latitude / longitude columns and then writing the values manually.

I've been also told that R is much better for managing such data sets, but I am not experienced with it and couldn't find the solution myself.

I would really appreciate any help.

  • [This question and answer](https://gis.stackexchange.com/questions/225157/generate-rectangular-fishnet-or-vector-grid-cells-shapefile-in-r?noredirect=1&lq=1) at GIS StackExchange uses the R `sf` package to generate a grid of `i x j` cells. But it would require becoming familiar with that package and its methods. – neilfws Nov 07 '17 at 00:44

1 Answers1

1

Someone can probably come up with something much more sophisticated/faster, but this is a crude approach that has the benefit of being relatively easy to understand.

Sample data:

dd <- read.table(header=TRUE,dec=",",text="
index month year lat long
72615 10 2010 35,278 129,629
72615 11 2010 37,604 136,365
72615 12 2010 39,404 137,775
72615 1 2011 39,281 138,235
72620 1 2011 35,892 132,766
72620 2 2011 38,83 133,893
72620 3 2011 39,638 135,513
72620 4 2011 41,297 139,448")

Generate indices that equal 1 for (0-5), 2 for (6-10), etc.

dd$x <- (dd$lat %/% 5) + 1
dd$y <- (dd$long %/% 5) + 1

Set up an empty matrix (not sure I have the rows/columns right)

mm <- matrix(0,nrow=36,ncol=72)

(you might want to use the dimnames argument here for clarity)

Fill it in:

for (i in 1:nrow(dd)) {
  mm[dd[i,"x"],dd[i,"y"]] <- mm[dd[i,"x"],dd[i,"y"]]+1
}

If you have only thousands of rows, this might be fast enough. I would try it and see if you need something fancier. (If you need to collapse the matrix back to a set of columns, you can use reshape2::melt or tidyr::gather ...)

Ben Bolker
  • 211,554
  • 25
  • 370
  • 453