15

I am scraping names from the web into a data frame.

For a name such as "Tomáš Rosický", I get the result "TomÃ¡Å¡ RosickÃ½".

I tried

Encoding("TomÃ¡Å¡ RosickÃ½") # returns "latin1"

but was not sure where to go from there to get the original name with its accents back. I played around with iconv without success.

I would be satisfied with (and might even prefer) an output of "Tomas Rosicky".

mathematical.coffee
pssguy
    How did you read the data.frame? Usually you can supply an encoding parameter such as `fileEncoding` to `read.table`. And as @Hong Ooi answered, UTF-8 seems to be the encoding you need. – Tommy Mar 01 '12 at 06:48

4 Answers

13

You've read in a page encoded in UTF-8. If x is your column of names, use Encoding(x) <- "UTF-8".
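As a minimal sketch of why re-marking is enough: the string already holds the right UTF-8 bytes and only carries the wrong label. The byte vector below is written out by hand purely for illustration (it is not from the original post), so that the example does not depend on how the script file itself is encoded.

```r
# UTF-8 bytes for "Tomáš Rosický", spelled out explicitly (illustrative data).
bytes <- as.raw(c(0x54, 0x6f, 0x6d, 0xc3, 0xa1, 0xc5, 0xa1, 0x20,
                  0x52, 0x6f, 0x73, 0x69, 0x63, 0x6b, 0xc3, 0xbd))
x <- rawToChar(bytes)

# Simulate the scraped state: correct bytes, wrong label.
Encoding(x) <- "latin1"
before <- Encoding(x)               # "latin1"

# Re-mark the same bytes as UTF-8. No conversion takes place.
Encoding(x) <- "UTF-8"
after <- Encoding(x)                # "UTF-8"

n_bytes <- nchar(x, type = "bytes") # 16 bytes are stored
n_chars <- nchar(x, type = "chars") # 13 characters once read as UTF-8
```

Because `Encoding<-` only changes the declared encoding, it is safe to apply to a whole column; pure ASCII entries are unaffected.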

Hong Ooi
  • 56,353
  • 13
  • 134
  • 187
7

You should use this:

df$colname <- iconv(df$colname, from="UTF-8", to="LATIN1")
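Note that š (U+0161) is not part of ISO-8859-1, so for this particular name a conversion to Latin-1 may yield NA or a substituted character, depending on the platform. If a plain ASCII result such as "Tomas Rosicky" is acceptable, transliteration is one option. The following is only a sketch: it assumes the platform's iconv supports the `//TRANSLIT` suffix, and the exact output differs between iconv implementations. The byte vector is illustrative data, not taken from the original post.

```r
# UTF-8 bytes for "Tomáš Rosický", spelled out explicitly (illustrative data).
bytes <- as.raw(c(0x54, 0x6f, 0x6d, 0xc3, 0xa1, 0xc5, 0xa1, 0x20,
                  0x52, 0x6f, 0x73, 0x69, 0x63, 0x6b, 0xc3, 0xbd))
x <- rawToChar(bytes)
Encoding(x) <- "UTF-8"

# Ask iconv to approximate characters that have no ASCII equivalent.
# The result is platform-dependent: some implementations drop the
# accents cleanly, others may insert "?" or extra punctuation.
ascii <- iconv(x, from = "UTF-8", to = "ASCII//TRANSLIT")
print(ascii)
```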
rink.attendant.6
Roadkill
5

To read the file correctly, use the scan function:

namb <- scan(file='g:/testcodering.txt', fileEncoding='UTF-8',
             what=character(), sep='\n', allowEscapes=TRUE)
cat(namb)

This also works:

con <- file('g:/testcodering.txt', "r", encoding='UTF-8')
namc <- readLines(con)
close(con)
cat(namc)

This will read the file with the correct accents.
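For a self-contained check of the same idea, here is a sketch that writes a small UTF-8 file to a temporary location and reads it back. The temporary file, the variable name `namd`, and the byte vector are made up for illustration. It uses the `encoding` argument of `readLines` itself, which only marks the strings as UTF-8 rather than re-encoding them through the connection.

```r
# UTF-8 bytes for "Tomáš Rosický", spelled out explicitly (illustrative data).
bytes <- as.raw(c(0x54, 0x6f, 0x6d, 0xc3, 0xa1, 0xc5, 0xa1, 0x20,
                  0x52, 0x6f, 0x73, 0x69, 0x63, 0x6b, 0xc3, 0xbd))

# Write the bytes plus a trailing newline to a temporary file.
path <- tempfile(fileext = ".txt")
writeBin(c(bytes, as.raw(0x0a)), path)

# encoding = "UTF-8" declares how the bytes should be interpreted;
# the bytes themselves are left unchanged.
namd <- readLines(path, encoding = "UTF-8")
unlink(path)

Encoding(namd)               # "UTF-8"
nchar(namd, type = "chars")  # 13
```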

dpel
Mischa Vreeburg
3

A way to export accents correctly:

enc2utf8(as(dataframe$columnname, "character"))
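One way this could fit into an actual export, sketched under assumptions: the data frame, the temporary file, and the byte vector below are illustrative and not from the original answer. The sketch pairs `enc2utf8` with `writeLines(..., useBytes = TRUE)` so that the UTF-8 bytes are written as they are instead of being re-encoded into the session's native encoding.

```r
# UTF-8 bytes for "Tomáš Rosický", spelled out explicitly (illustrative data).
bytes <- as.raw(c(0x54, 0x6f, 0x6d, 0xc3, 0xa1, 0xc5, 0xa1, 0x20,
                  0x52, 0x6f, 0x73, 0x69, 0x63, 0x6b, 0xc3, 0xbd))
x <- rawToChar(bytes)
Encoding(x) <- "UTF-8"
dataframe <- data.frame(columnname = x, stringsAsFactors = FALSE)

out <- tempfile(fileext = ".txt")
# enc2utf8() ensures the strings are UTF-8 (a no-op for this input);
# useBytes = TRUE writes those bytes without further translation.
writeLines(enc2utf8(as.character(dataframe$columnname)), out, useBytes = TRUE)

# Read the raw bytes back to confirm what ended up in the file.
written <- readBin(out, what = "raw", n = 100)
unlink(out)
```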
iulilia