I am well practiced at simple implementations in R, however I am new to communicating to SQL through R and also parallel programming (no prior experience of these two things before today). I have written the below code with prompts from blogs, forums etc.
library(doParallel)
library(RMySQL)
library(DBI)
library(foreach)
cl <- makeCluster(12)
registerDoParallel(cl)
Postcodecsv <- read.csv("C:/Users/Henry Crosby/Desktop/PostcodeLatLong.csv")
mydb = dbConnect(MySQL(), user='****', password="******* ****",
dbname='population_distance', host='****.**.*.*')
dbListFields(mydb,'Postcodes')
foreach (a = 1:120000, .combine="rbind") %dopar% {
Done <- dbGetQuery(mydb, paste("select FID, Postcode2, (6371 * acos( cos(
radians( ",Postcodecsv[a,6],"))*cos(radians(latitude))*cos(radians(Longitude)-radians(",Postcodecsv[a,5],"))+sin(radians(",Postcodecsv[a,6],") )* sin( radians( latitude ) ) ) ) AS distance from Postcodes having distance < 2 ORDER BY distance",sep=" "))
write.table(Done,file="C:/Users/Henry Crosby/Desktop/2km.csv",append=TRUE, col.names=FALSE, sep=",")
}
This calculation works in a for loop but takes forever (I have to apply this to a LARGE dataset!). When I run the above code I get the error below! Can someone tell me why the error is coming and how I can work around it!
Error in { : task 1 failed - "could not find function "dbGetQuery""