I'm writing a script to get info from Baseball Reference web pages. The first time I ran the code it worked fine, and all the dates stored as factors were correctly parsed to dates with the as.Date() function. However, when I ran the same script a day later, I started getting NAs for some of the dates in one variable, while others are converted correctly. There is another factor variable where all the values are returned as NA.
I've googled it, but I could only find issues about NAs caused by a missing day in the value (only month and year).
I've also tried changing the locale from Portugal to US with Sys.setlocale("LC_ALL","English"), but I get the same result.
The script I used is below. Do you have any hint about what's missing?
Thanks.
library(XML)
Sys.setlocale("LC_ALL","English") # Used after first attempt
# Web page with players
url = "http://www.baseball-reference.com/bio/Venezuela_born.shtml"
# Create a list of the data frames found in the web page, and define the column types
url_Tables = readHTMLTable(url
,stringsAsFactors = FALSE
,colClasses=c("integer","character",rep("integer",17)
,rep("numeric", 4),"factor","factor"
, "character", "character")
)
# Assign the first table of the web page to a data frame
batting = url_Tables[[1]]
summary(batting)
# Change the type of some columns
batting$Birthdate = as.Date(batting$Birthdate, "%b %d, %Y") # For this column, some values are parsed correctly and others become NA.
batting$Debut = as.Date(batting$Debut, "%b %d, %Y") # For this column, all the values are converted to NA.
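
To narrow this down, a check along the following lines might help. It is only a minimal sketch: the sample values are made up to stand in for batting$Birthdate (they are not taken from the page), and the names birth_raw, birth_chr and parsed are hypothetical. It prints the LC_TIME locale that as.Date() relies on for %b and lists the distinct raw strings that fail to parse, which should show whether the failures follow a pattern such as particular month abbreviations or empty cells.

```r
# Hypothetical sample values standing in for batting$Birthdate (not real data)
birth_raw <- factor(c("Jan 5, 1980", "Abr 12, 1975", "", "Dec 25, 1990"))

# Locale category that controls how %b (abbreviated month name) is matched
print(Sys.getlocale("LC_TIME"))

# Convert the factor to character and strip stray whitespace before parsing
birth_chr <- trimws(as.character(birth_raw))
parsed <- as.Date(birth_chr, format = "%b %d, %Y")

# Distinct raw strings that could not be parsed with this format and locale
print(unique(birth_chr[is.na(parsed)]))
```

Which strings show up in the last line depends on the active locale, so the output is not shown here. Running the same two steps on the real Birthdate and Debut columns would show exactly which raw values become NA.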