
I am using the RMySQL package to send data from R to a MySQL database. The problem is that the database does not receive any data. I am using doParallel since I am running over 4500 iterations. Could it be because I try to send the data to the database inside the pullSpread() function?

library(RMySQL)
library(doParallel)
library(stringr)
library(foreach)

cl <- makeCluster(detectCores()) # detectCores() = 4 on this machine
# also tried type = "PSOCK" and "FORK" but get the same problem
registerDoParallel(cl)

# Now use foreach() and %dopar% to pull data...
# apply(t(stock1), 2, pullSpread) works, but is not parallelised
# I have also tried clusterApply(), without success
system.time(
  foreach(a = t(stock1)) %dopar% pullSpread(a)
)

When I look in my working directory, every stock is written out to a .csv file as it should be, but when I check MySQL Workbench, or try to read the tables back from R, they do not exist...

Here is the stock1 character vector and the pullSpread() function used...

# This list contains more than 4500 iterations.. so I am only posting a few
stock1<-c(
  "SGMS.O","SGNL.O","SGNT.O",
  "SGOC.O","SGRP.O", ...) 

Important Dates needed for function:

Friday <- Sys.Date() - 10

# Get previous 5 days
Thursday  <- Friday - 1
Wednesday <- Thursday - 1
Tuesday   <- Wednesday - 1
Monday    <- Tuesday - 1

# Make them readable for NetFonds
Friday    <- format(Friday, "%Y%m%d")
Thursday  <- format(Thursday, "%Y%m%d")
Wednesday <- format(Wednesday, "%Y%m%d")
Tuesday   <- format(Tuesday, "%Y%m%d")
Monday    <- format(Monday, "%Y%m%d")

Here is the pullSpread() function:

pullSpread <- function(stock1) {
  AAPL_FRI <- read.delim(header = TRUE, stringsAsFactors = FALSE,
                         paste(sep = "",
                               "http://www.netfonds.no/quotes/posdump.php?date=",
                               Friday, "&paper=", stock1, "&csv_format=txt"))

  # Subsetting fails if the download returned something other than a data table
  tryit <- try(AAPL_FRI[, c(1:7)])

  if (inherits(tryit, "try-error")) {

    rm(AAPL_FRI)

  } else {

    AAPL_THURS <- read.delim(header = TRUE, stringsAsFactors = FALSE,
                             paste(sep = "",
                                   "http://www.netfonds.no/quotes/posdump.php?date=",
                                   Thursday, "&paper=", stock1, "&csv_format=txt"))

    AAPL_WED <- read.delim(header = TRUE, stringsAsFactors = FALSE,
                           paste(sep = "",
                                 "http://www.netfonds.no/quotes/posdump.php?date=",
                                 Wednesday, "&paper=", stock1, "&csv_format=txt"))

    AAPL_TUES <- read.delim(header = TRUE, stringsAsFactors = FALSE,
                            paste(sep = "",
                                  "http://www.netfonds.no/quotes/posdump.php?date=",
                                  Tuesday, "&paper=", stock1, "&csv_format=txt"))

    AAPL_MON <- read.delim(header = TRUE, stringsAsFactors = FALSE,
                           paste(sep = "",
                                 "http://www.netfonds.no/quotes/posdump.php?date=",
                                 Monday, "&paper=", stock1, "&csv_format=txt"))

    SERIES <- rbind(AAPL_MON, AAPL_TUES, AAPL_WED, AAPL_THURS, AAPL_FRI)

    # Write .csv file
    write.csv(SERIES, paste(sep = "", stock1, "_", Friday, ".csv"),
              row.names = FALSE)
    # con2 must be an open MySQL connection visible to this function;
    # connection objects cannot be shared with parallel workers
    dbWriteTable(con2, str_sub(stock1, start = 1L, end = -3L),
                 paste0("~/Desktop/R/", stock1, "_", Friday, ".csv"),
                 append = TRUE)
  }
}
Rime
  • Where does a MySQL database come into this? You're reading and concatenating tab delimited files from the internet. – Richie Cotton Jul 14 '14 at 12:21
  • Also, the code for days of the week only works on Sundays. – Richie Cotton Jul 14 '14 at 12:22
  • @RichieCotton I basically send the information written to the `.csv` file to the MySQL database toward the end of the `pullSpread` function. The dates are adjusted manually to capture Mon-Fri... I have updated the code so it only pulls weekdays – Rime Jul 14 '14 at 12:44
  • You need to do a small test of whether you can connect to the database. Forget the looping; keep it as simple as possible. – Richie Cotton Jul 14 '14 at 12:58
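Following that suggestion, a minimal, loop-free connectivity check might look like the sketch below. The connection parameters (`user`, `password`, `dbname`, `host`) are placeholders, not values from the question; substitute your own credentials. If this round trip fails, the problem is the connection itself, not the parallel loop.

```r
library(RMySQL)

# Placeholder credentials -- replace with your own
con2 <- dbConnect(MySQL(), user = "user", password = "password",
                  dbname = "stocks", host = "localhost")

# Round-trip a tiny data frame and read it back
test_df <- data.frame(x = 1:3, y = c("a", "b", "c"))
dbWriteTable(con2, "connection_test", test_df, overwrite = TRUE)
print(dbReadTable(con2, "connection_test"))

dbDisconnect(con2)
```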

1 Answer


Retrieve last Friday using something like this:

Friday <- Sys.Date()
while(weekdays(Friday) != "Friday") 
{
  Friday <- Friday - 1
}

As a matter of good practice, when retrieving data from the internet, separate the act of downloading it from processing it. That way, if the processing fails, you don't waste time and bandwidth redownloading things.

lastWeek <- format(Friday - 0:4, "%Y%m%d")
stockDatePairs <- expand.grid(Stock = stock1, Date = lastWeek)
urls <- with(
  stockDatePairs,
  paste0(
    "http://www.netfonds.no/quotes/posdump.php?date=",
    Date,
    "&paper=",
    Stock,
    "&csv_format=txt"
  )
)
for(url in urls)
{
  # or whatever file name you want
  download.file(url, paste0("data from ", make.names(url), ".txt"))
}

Make sure that you know which directory those files are being saved to. (Either provide an absolute path or set your working directory.)

Now try reading and rbinding those files.
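For example, a sketch of that step, assuming the files were saved by the download loop above and so follow the same `"data from <make.names(url)>.txt"` naming pattern:

```r
# Find the files written by the download loop
files <- list.files(pattern = "^data from .*\\.txt$")

# Read each tab-delimited file and stack them into one data frame
series_list <- lapply(files, read.delim, stringsAsFactors = FALSE)
SERIES <- do.call(rbind, series_list)
str(SERIES)
```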

If that works, then you can try doing things in parallel.
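One sketch of a parallel version, assuming the downloaded files (`files`, as above) are in the working directory. Note that a MySQL connection object cannot be exported to the workers, so each worker must open, and close, its own connection; the credentials here are placeholders.

```r
library(doParallel)
library(RMySQL)

cl <- makeCluster(4)
registerDoParallel(cl)

results <- foreach(f = files, .packages = "RMySQL") %dopar% {
  # Each worker opens its own connection; placeholder credentials
  con <- dbConnect(MySQL(), user = "user", password = "password",
                   dbname = "stocks", host = "localhost")
  on.exit(dbDisconnect(con))

  dat <- read.delim(f, stringsAsFactors = FALSE)
  dbWriteTable(con, sub("\\.txt$", "", make.names(f)), dat, append = TRUE)
}

stopCluster(cl)
```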

Also note that many online data services will limit the rate that you can download, unless you are paying for the service. So parallel downloads may just mean that you hit the limit quicker.
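A crude way to stay under such a limit is to throttle a sequential loop rather than parallelise it; the half-second pause here is an arbitrary example value, not a documented NetFonds limit.

```r
for (url in urls) {
  download.file(url, paste0("data from ", make.names(url), ".txt"))
  Sys.sleep(0.5)  # pause between requests to avoid rate limiting
}
```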

Richie Cotton
  • Wow, thanks, that cleaned up a lot of code. However, when I run this, it returns the first five iterations against the five days of the week. In other words, instead of having 5 days per iteration, I have only one day per iteration. – Rime Jul 14 '14 at 14:33
  • @Rime I hadn't spotted that `stock1` was supposed to be a character vector rather than a single string. See update. – Richie Cotton Jul 15 '14 at 08:38
  • That note about separating the process of downloading and processing data should be in bold, blinking, shining letters, that sprout unicorns and part the clouds until smurfs randomly erupt in song and dance. Yes, it's that much of a time saver. – Brandon Bertelsen Oct 14 '16 at 07:45