I have a database containing the names of Premiership footballers which I am reading into R (3.02), but I am running into difficulties with players who have foreign characters in their names (umlauts, accents, etc.). The code below illustrates this:
PlayerData<-read.table("C:\\Users\\Documents\\Players.csv",quote=NULL, dec = ".",sep=",", stringsAsFactors=FALSE,header=TRUE,fill=TRUE,blank.lines.skip = TRUE)
Test<-PlayerData[c(33655:33656),] #names of the players here are "Cazorla" "Özil"
Test[Test$Player=="Cazorla",] #Outputs correct details
Test[Test$Player=="Ozil",] # Cannot find the data: '<0 rows> (or 0-length row.names)'
#Example of how the foreign character is treated:
substr("Özil",1,1)
[1] "Ã"
substr("Özil",1,2)
[1] "Ö"
substr("Özil",2,2)
[1] "
substr("Özil",2,3)
[1] "z
I have tried replacing the characters, as described in "R: Replacing foreign characters in a string", but as the accented characters in my example appear to be read as two separate characters, I do not think that approach works.
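A minimal sketch of what may be happening, assuming the file is UTF-8 encoded and its bytes are being interpreted in a single-byte encoding such as Latin-1 (this is an assumption, not something confirmed for Players.csv). The variable names are purely illustrative. It shows that the two bytes of "Ö" become two characters when the string is declared as Latin-1, and that redeclaring the same bytes as UTF-8 restores the single character:

```r
# Build the name with a Unicode escape so the example does not depend
# on the encoding of the script file itself.
utf8_name <- enc2utf8("\u00d6zil")

# In UTF-8, "Ö" occupies two bytes (c3 96), so the name is 5 bytes long.
name_bytes <- charToRaw(utf8_name)

# Simulate the misread: same bytes, but declared as Latin-1.
misread <- utf8_name
Encoding(misread) <- "latin1"

chars_utf8    <- nchar(utf8_name)  # counts "Ö" as one character
chars_misread <- nchar(misread)    # counts each byte as a character

# Redeclare the bytes as UTF-8 (no bytes are changed, only the marking).
repaired <- misread
Encoding(repaired) <- "UTF-8"
is_repaired <- identical(repaired, utf8_name)
```

If this is indeed the cause, the `fileEncoding` and `encoding` arguments of `read.table` would be the natural place to look, and `Encoding()` or `charToRaw()` on the affected values of `Test$Player` should show whether the bytes match the pattern above.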
I would be grateful for any suggestions or workarounds.
The file is available for download here.