
First time caller.
I just want to change a string's encoding from UTF-8 to LATIN1. I use XPath to retrieve the data from the web:

>library(RCurl)  
>library(rvest)
>library(XML)
>library(httr)
>library(reshape2)
>library(reshape)

>response <- GET(paste0("http://www.visalietuva.lt/imone/jogminda-uab-telsiai-muziejaus-g-35"))
>doc <- content(response,type="text/html")
>base <- xpathSApply(doc, "//ul//li//span",xmlValue)[5]

As a result I get the following:

>base
[1] "El. paÅ¡tas"

When I check the encoding I have UTF-8:

>Encoding(base)
[1] "UTF-8"
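Note that `Encoding()` only reports the encoding mark R has attached to a string; it does not tell you whether the characters are the intended ones. A small sketch that compares the bytes of the intended text with the garbled text (both built from `\u` escapes so the example does not depend on how the script file is saved; `good` and `bad` are illustrative names):

```r
# Intended text and the garbled text from the question, built from
# Unicode escapes so the example is independent of the script's encoding.
good <- "El. pa\u0161tas"          # "š" is U+0161
bad  <- "El. pa\u00c5\u00a1tas"    # "Å" (U+00C5) followed by "¡" (U+00A1)

# Both strings carry the same declared encoding ...
Encoding(good)   # "UTF-8"
Encoding(bad)    # "UTF-8"

# ... but their bytes differ: "š" is stored as c5 a1,
# while "Å¡" is stored as c3 85 c2 a1.
charToRaw(good)
charToRaw(bad)
```

So an `Encoding()` of "UTF-8" is consistent with both the correct and the garbled string.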

I suspect I need LATIN1 encoding, so that the result would be "El. paštas" instead of "El. paÅ¡tas".

However, when I specify the LATIN1 encoding I get the following:

>latin <- iconv(base, from = "UTF-8", to = "LATIN1")
>latin
[1] "El. paÅ¡tas"

i.e. the same result as with UTF-8. Changing the encoding does not help to get "El. paštas".
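Converting to `LATIN1` cannot fix this, because `Å` and `¡` are themselves Latin-1 characters, so `iconv()` keeps them unchanged. If the text was garbled by decoding UTF-8 bytes as Latin-1 (an assumption about what happened here), one common repair is to turn each garbled character back into its Latin-1 byte and then declare those bytes as UTF-8. A sketch with a hypothetical helper `fix_mojibake()`; for text garbled some other way the result may be `NA` or still wrong:

```r
# Hypothetical helper: undo text produced by decoding UTF-8 bytes as Latin-1.
# Assumes every garbled character is a Latin-1 character.
fix_mojibake <- function(x) {
  # Each garbled character maps back to one Latin-1 byte;
  # characters outside Latin-1 would become NA.
  bytes <- iconv(x, from = "UTF-8", to = "latin1")
  # The recovered bytes are really UTF-8, so relabel them (no conversion).
  Encoding(bytes) <- "UTF-8"
  bytes
}

bad <- "El. pa\u00c5\u00a1tas"      # the garbled "El. paÅ¡tas"
fixed <- fix_mojibake(bad)
fixed == "El. pa\u0161tas"          # TRUE
```

Applied to the question's data this would be `fix_mojibake(base)`, but fixing the encoding when the page is parsed avoids the repair altogether.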

Moreover, I need the correct LATIN1 encoding of the string when saving the data to a .csv file. I tried to save the data to .csv:

write.table(latin,file = "test.csv")

and got the same strange characters as mentioned above: "El. paÅ¡tas".
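Two points seem relevant for saving. First, `š` (U+0161) is not part of Latin-1 (ISO-8859-1) at all, so LATIN1 cannot store this Lithuanian text correctly; UTF-8, or a Baltic code page such as windows-1257, can. Second, `write.table()`/`write.csv()` and `read.table()`/`read.csv()` accept a `fileEncoding` argument that re-encodes text on the way to and from the file. A minimal round trip, assuming a UTF-8 session locale and using a temporary file in place of `test.csv`:

```r
# "š" has no Latin-1 code point, so conversion to Latin-1 fails (NA).
text <- "El. pa\u0161tas"
iconv(text, from = "UTF-8", to = "latin1")   # NA

# Write and read a CSV with an explicit file encoding.
df <- data.frame(label = text, stringsAsFactors = FALSE)
path <- tempfile(fileext = ".csv")
write.csv(df, path, row.names = FALSE, fileEncoding = "UTF-8")
back <- read.csv(path, fileEncoding = "UTF-8", stringsAsFactors = FALSE)
back$label == text   # TRUE
```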

Any advice on how to change the encoding would be more than welcome. Thank you.

Aleksandr

1 Answer


Try declaring the encoding when parsing the response:

doc <- content(response,type="text/html", encoding = "UTF-8")
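Why this presumably helps: the page is served as UTF-8, where `š` is the two bytes C5 A1. If those bytes are decoded as Latin-1 instead, each byte becomes its own character, `Å` and `¡`, which is exactly the garbled output in the question. A base-R sketch of that mechanism with no network call (the raw vector `page_bytes` is a hypothetical stand-in for the downloaded page):

```r
# UTF-8 bytes for the text; "š" is c5 a1. Stand-in for the page body.
page_bytes <- charToRaw(enc2utf8("El. pa\u0161tas"))

# Decoding the bytes as Latin-1 turns c5 and a1 into two characters.
as_latin1 <- rawToChar(page_bytes)
Encoding(as_latin1) <- "latin1"
enc2utf8(as_latin1) == "El. pa\u00c5\u00a1tas"   # TRUE, i.e. "El. paÅ¡tas"

# Declaring the same bytes as UTF-8 gives the intended text.
as_utf8 <- rawToChar(page_bytes)
Encoding(as_utf8) <- "UTF-8"
as_utf8 == "El. pa\u0161tas"                      # TRUE
```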
lukeA
  • Thank you. That answers the first part of the question: now I can read data from the URL using the correct encoding. But is there also a way to read data from a .csv file that contains such characters? Let's assume I have a file with these strange symbols and I would like to read it into R with the correct encoding as mentioned before (UTF-8). I assume this won't work: `file = read.csv("strangedata.csv", header=F, stringsAsFactors = F, encoding = "UTF-8")` – Aleksandr Feb 13 '15 at 14:03
  • Well, why don't you just try `write.table` and `read.table` before assuming something? ;-) The help file `?read.csv` has a section on encoding. – lukeA Feb 13 '15 at 14:12
  • I did try these recommendations but did not get the result. Here is the code:
    # utf.csv sample data link: https://www.dropbox.com/s/l77javsoy1272v8/utf.csv?dl=1
    # reading strange characters from .csv
    read <- read.csv("utf.csv", encoding = "UTF-8", header = TRUE, stringsAsFactors = FALSE)
    # writing to .csv with UTF-8
    con <- file("write.csv", open="w", encoding="UTF-8")
    write.table(read, con, sep=",", row.names=FALSE)
    close(con)
    Still, some characters in write.csv are strange. – Aleksandr Feb 13 '15 at 15:11
  • You should post a new question referring to your sample data. Encoding errors are like opening Pandora's box... – lukeA Feb 13 '15 at 16:03
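On the follow-up about reading a CSV: `read.csv()` has two different arguments, and `?read.table` describes both. `fileEncoding` re-encodes the file's text as it is read, while `encoding` only marks the strings as UTF-8 (or Latin-1) without re-encoding them. A minimal sketch, assuming the file on disk holds valid UTF-8 bytes; the temporary file and its one-column content are hypothetical, written with `writeBin()` so the bytes do not depend on the session locale:

```r
# Hypothetical sample file: a header line and one row, stored as UTF-8 bytes.
path <- tempfile(fileext = ".csv")
csv_text <- enc2utf8("label\nEl. pa\u0161tas\n")
writeBin(charToRaw(csv_text), path)

# encoding = "UTF-8" marks the strings as UTF-8 without re-encoding them.
df <- read.csv(path, encoding = "UTF-8", stringsAsFactors = FALSE)
df$label == "El. pa\u0161tas"   # TRUE

# If the file already contains garbled text such as "paÅ¡tas",
# reading it will not restore it; the strings must be repaired afterwards.
```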