I am building a stochastic model to predict the movement of objects floating in the ocean. I have thousands of data from drifter buoys all around the world. In the format as below:
index month year lat long
72615 10 2010 35,278 129,629
72615 11 2010 37,604 136,365
72615 12 2010 39,404 137,775
72615 1 2011 39,281 138,235
72620 1 2011 35,892 132,766
72620 2 2011 38,83 133,893
72620 3 2011 39,638 135,513
72620 4 2011 41,297 139,448
The general concept for the model is to divide whole world into 2592 cells of magnitude of 5x5 degrees. And then create the Markov's chain transition matrix using the formula that
the probability of going from cell i to cell j in 1 month equals to:
the number of times any buoy went from cell i to cell j in 1 month
divided by the
number of times any buoy exitted i (including going from i to i).
However I have two troubles related to managing the data.
1. Is there an easy solution (preferably in Excel or R) to add 6-th column to the data set, whose values would depend only on the value of latitude and longitude, such that it would equal to:
1 when both latitude and longitude are between 0 and 5
2 when latitude is between 0 and 5 and longitude between 5 and 10
3 when latitude is betwwen 0 and 5 and longitude between 10 and 15
and so on up to the number 2592
2. Is there an easy way to count the number of times any buoy went from cell i to cell j in 1 month?
I was trying to figure out the solution to the question 1 in Excel, but could not think of anything more efficient than just sorting by the latitude / longitude columns and then writing the values manually.
I've been also told that R is much better for managing such data sets, but I am not experienced with it and couldn't find the solution myself.
I would really appreciate any help.