The file stu.csv contains 850,000 rows and 3 columns. The 2nd column is the longitude of each ID and the 3rd column is its latitude. The data in stu.csv look like this:
ID longitude latitude
156 41.88367183 12.48777756
187 41.92854333 12.46903667
297 41.89106861 12.49270456
89 41.79317669 12.43212196
79 41.90027472 12.46274618
... ... ...
The pseudocode is as follows. It computes the distance between two IDs on the surface of the Earth from their longitudes and latitudes, and the goal is to output the cumulative sum of the distances between any two IDs:
dlon = lon2 - lon1
dlat = lat2 - lat1
a = (sin(dlat/2))^2 + cos(lat1) * cos(lat2) * (sin(dlon/2))^2
c = 2 * atan2( sqrt(a), sqrt(1-a) )
distance = 6371000 * c (where 6371000 is the radius of the Earth in meters)
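As a minimal sketch, the pseudocode above could be written as an R function. The function name `haversine_m` and its argument names are hypothetical. The sketch assumes the inputs are in decimal degrees (as the sample values suggest), so it converts them to radians before calling `sin` and `cos`.

```r
# Hypothetical helper: great-circle distance in meters between two points,
# following the pseudocode step by step.
# Assumption: lon/lat arguments are given in decimal degrees.
haversine_m <- function(lon1, lat1, lon2, lat2, radius = 6371000) {
  to_rad <- pi / 180
  lon1 <- lon1 * to_rad
  lat1 <- lat1 * to_rad
  lon2 <- lon2 * to_rad
  lat2 <- lat2 * to_rad

  dlon <- lon2 - lon1
  dlat <- lat2 - lat1
  a <- sin(dlat / 2)^2 + cos(lat1) * cos(lat2) * sin(dlon / 2)^2
  central_angle <- 2 * atan2(sqrt(a), sqrt(1 - a))
  radius * central_angle
}

# Example call with the first two sample rows (columns as labeled in the file)
d <- haversine_m(41.88367183, 12.48777756, 41.92854333, 12.46903667)
```

Because the arithmetic and trigonometric functions in R work element-wise, the same function also accepts whole vectors of coordinates.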
My code is as follows, but it runs too slowly. How can I rewrite it to speed it up? Thank you.
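stu <- read.table("stu.csv", header = TRUE, sep = ",")
## stu has 850,000 rows and 3 columns.
m <- nrow(stu)
rad <- pi / 180  # sin() and cos() expect radians; the coordinates are in degrees
distance <- 0
for (i in 1:(m - 1)) {
  for (j in (i + 1)) {  # as written, j takes only the value i + 1
    dlon <- (stu[j, 2] - stu[i, 2]) * rad
    dlat <- (stu[j, 3] - stu[i, 3]) * rad
    a <- sin(dlat / 2)^2 + cos(stu[i, 3] * rad) * cos(stu[j, 3] * rad) * sin(dlon / 2)^2
    c <- 2 * atan2(sqrt(a), sqrt(1 - a))
    distance <- distance + 6371000 * c  # running total in meters
  }
}
distance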
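One way to avoid the row-by-row loop, sketched here under stated assumptions, is to operate on whole columns at once, since `sin`, `cos`, `sqrt` and `atan2` are vectorized in R. The sketch sums the distances between consecutive rows (i and i+1), which are the pairs the loop as written visits; summing over every possible pair would need a different approach. The function name `sum_consecutive_distances` is hypothetical, and the conversion to radians assumes the coordinates are stored in decimal degrees.

```r
# Hypothetical helper: sum of great-circle distances (meters) between
# consecutive rows, computed on whole vectors instead of in a loop.
# Assumption: lon and lat are numeric vectors in decimal degrees.
sum_consecutive_distances <- function(lon, lat, radius = 6371000) {
  n <- length(lon)
  if (n < 2) return(0)

  lon <- lon * pi / 180
  lat <- lat * pi / 180

  # Differences between row i + 1 and row i, for all i at once
  dlon <- lon[-1] - lon[-n]
  dlat <- lat[-1] - lat[-n]

  a <- sin(dlat / 2)^2 + cos(lat[-n]) * cos(lat[-1]) * sin(dlon / 2)^2
  a <- pmin(1, pmax(0, a))  # guard against rounding slightly outside [0, 1]
  sum(radius * 2 * atan2(sqrt(a), sqrt(1 - a)))
}

# Small illustrative data frame with the same column layout as stu.csv
stu <- data.frame(
  ID = c(156, 187, 297),
  longitude = c(41.88367183, 41.92854333, 41.89106861),
  latitude = c(12.48777756, 12.46903667, 12.49270456)
)
total <- sum_consecutive_distances(stu[, 2], stu[, 3])
```

Passing the columns as plain vectors also avoids indexing the data frame element by element (`stu[i, 3]`) on every iteration, which is comparatively expensive in R.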