I have a matrix object, named location, with three columns (ID, latitude, longitude) and 18,289 rows:
# ID latitude longitude
# 320503 31.29530 120.5735
# 310104 31.18852 121.4365
# 310115 31.22152 121.5444
# 110105 39.92147 116.4431
I want to calculate the Haversine distance between each pair of IDs. In the geosphere package, the function distm() can create the symmetric distance matrix, but its dimensions are 18289 by 18289, and R reported the error that it cannot allocate vector of size 2.5 Gb.
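As a rough check on that figure, here is a back-of-envelope calculation. It assumes the distances are stored as 8-byte doubles in a dense matrix; the variable names are only illustrative.

```r
# Approximate size of a dense n x n matrix of doubles (8 bytes per value).
n <- 18289
bytes_per_double <- 8

# Full symmetric matrix, as a dense n x n result would be stored.
full_gib <- n^2 * bytes_per_double / 2^30

# Strict upper triangle only (each unordered pair stored once).
half_gib <- n * (n - 1) / 2 * bytes_per_double / 2^30

round(c(full = full_gib, upper_triangle = half_gib), 2)
```

The full matrix comes out at roughly 2.5 GiB, which matches the size in the error message, and storing each pair once would need about half of that.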
Similarly, with the ff package,
data.distance.ff <- ff(0, dim = c(18289, 18289))
produces no errors, but when I assign values to the ff matrix, R reports the same error again:
data.distance.ff[1:18289, 1:18289] <- distm(location[, 2:3])
Error: cannot allocate vector of size 2.5 Gb
In addition: Warning messages:
1: In matrix(0, ncol = n, nrow = n) :
Reached total allocation of 2047Mb: see help(memory.size)
2: In matrix(0, ncol = n, nrow = n) :
Reached total allocation of 2047Mb: see help(memory.size)
3: In matrix(0, ncol = n, nrow = n) :
Reached total allocation of 2047Mb: see help(memory.size)
4: In matrix(0, ncol = n, nrow = n) :
Reached total allocation of 2047Mb: see help(memory.size)
I can verify this error with:
data.distance.ff[1:10000, 1:10000] <- distm(location[1:10000, 2:3])
And then get this error:
Error: cannot allocate vector of size 772.1 Mb.
My questions are:
- Is my code for assigning values to the ff matrix object wrong? Should I be using something special to assign values to an ff object instead?
- Can ff objects handle the storage requirement?
- Can I use another method to calculate the distance using an apply function that does not involve loops? I know the function distm() produces a matrix twice as large as needed because it is symmetric.
- Are there any other methods for handling big data? The bigmemory package does not appear to work on my Windows computer.
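
To make the third question concrete, here is a minimal sketch of a block-wise calculation that never builds the whole result in one call. It rests on several assumptions. haversine_block() is a hypothetical base-R helper, not geosphere's implementation. The Earth radius of 6378137 m and the block size are arbitrary choices. A plain matrix stands in for the ff object, on the assumption that the same indexed assignment would work there. The points are reordered to (longitude, latitude), since distm() documents its input in that order.

```r
# Haversine distance in metres between every row of a and every row of b.
# a, b: two-column matrices in (longitude, latitude) order, in degrees.
# r: assumed Earth radius in metres (hypothetical default).
haversine_block <- function(a, b, r = 6378137) {
  rad <- pi / 180
  lon1 <- a[, 1] * rad; lat1 <- a[, 2] * rad
  lon2 <- b[, 1] * rad; lat2 <- b[, 2] * rad
  dlat <- outer(lat1, lat2, "-")
  dlon <- outer(lon1, lon2, "-")
  h <- sin(dlat / 2)^2 + outer(cos(lat1), cos(lat2)) * sin(dlon / 2)^2
  # Clamp rounding error above 1; the matrix goes first so dims are kept.
  2 * r * asin(pmin(sqrt(h), 1))
}

# The four sample rows shown in the question.
location <- cbind(
  ID        = c(320503, 310104, 310115, 110105),
  latitude  = c(31.29530, 31.18852, 31.22152, 39.92147),
  longitude = c(120.5735, 121.4365, 121.5444, 116.4431)
)
pts <- location[, c("longitude", "latitude"), drop = FALSE]

n <- nrow(pts)
chunk <- 2                      # tiny here; a few thousand rows for real data
starts <- seq(1, n, by = chunk)
d <- matrix(0, n, n)            # stand-in for the ff matrix

for (i in starts) {
  rows <- i:min(i + chunk - 1, n)
  # Only visit blocks on or above the diagonal, then mirror them.
  for (j in starts[starts >= i]) {
    cols <- j:min(j + chunk - 1, n)
    blk <- haversine_block(pts[rows, , drop = FALSE],
                           pts[cols, , drop = FALSE])
    d[rows, cols] <- blk
    d[cols, rows] <- t(blk)
  }
}
```

Each call only allocates a block of about chunk by chunk values, so peak memory depends on the block size rather than on the 18,289 rows. Visiting only the blocks on or above the diagonal also avoids recomputing the symmetric half.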