2

When I try to read a comma-separated values file containing Scandinavian letters into a data frame in R with the read.table() command, it doesn't come out right. That is, I want letters such as "å", "æ", "ø", "ä" and "ö" to be included correctly. Right now, they are represented by non-alphabetic symbols, and they often cause other operations, such as plotting, to complain.

I'm saving my csv-files in the ordinary text editor in OS X, but I've also tried using TextWrangler, saving my file in a specific encoding, such as UTF-8 or UTF-16, and then specifying that encoding in the read.table() command with the "encoding=" option.

What does a minimal example, where Scandinavian letters are imported from a csv-file into a data frame, look like?

smci
Speldosa

2 Answers

4

You need to include more detail regarding your locale and you need to put a sample in a location where people can get it. At the moment my Mac seems to be reading the characters correctly (and I'm not in a locale where it's even needed):

> read.table(text='"å", "æ", "ø", "ä"', sep=",")
  V1 V2 V3 V4
1  å  æ  ø  ä
> Sys.getlocale()
[1] "en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8"

(I also made a file with TextEdit.app and it also reads in properly. And they show up correctly in plotting.) You could try to specify an input encoding with the fileEncoding parameter:

> read.table(text='"å", "æ", "ø", "ä"', sep=",", fileEncoding="UTF-8")
  V1 V2 V3 V4
1  å  æ  ø  ä

... which does nothing for me, but it might help if your locale were set to "C", which seems to happen for no good reason to some people with Macs. Note that the 'encoding' parameter to read.table does not re-encode anything at the input stage; it only marks the strings in the result as being in that encoding.
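To make the fileEncoding route concrete, here is a minimal round-trip sketch. It assumes the R session runs in a UTF-8 locale; the temporary file, the column names and the sample values are made up for illustration. The non-ASCII letters are written as \u escapes only so the script itself does not depend on how it was saved.

```r
# Sample rows with Scandinavian letters, written as \u escapes:
# \u00c5 = Å, \u00f6 = ö, \u00f8 = ø, \u00c6 = Æ
rows <- c("name,city",
          "\u00c5sa,Malm\u00f6",
          "S\u00f8ren,\u00c6r\u00f8")

# Write the UTF-8 bytes unchanged to a temporary file (hypothetical path).
csv_path <- tempfile(fileext = ".csv")
con <- file(csv_path, open = "wb")
writeLines(rows, con, useBytes = TRUE)
close(con)

# fileEncoding declares how the bytes in the file are to be decoded on input.
df <- read.table(csv_path, header = TRUE, sep = ",",
                 fileEncoding = "UTF-8", stringsAsFactors = FALSE)
print(df)
```

If the letters still come out wrong with a file like this, the output of Sys.getlocale() is the next thing to check.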

IRTFM
0

I had this problem too, and a good person showed me what to do:

Using read_delim from 'readr' this worked:

metadata2 <- read_delim(filename, locale = locale(encoding = "latin1"))

where the file named 'filename' has Scandinavian characters. The characters seen in metadata2 no longer had any blinky question marks!
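A self-contained sketch of the same idea, assuming the readr package is installed and that the file really is Latin-1 encoded. The temporary file and its contents are invented for the example, and the explicit delim = "," is an assumption about the file format.

```r
library(readr)

# Build a small Latin-1 encoded CSV (hypothetical data) to read back in.
rows <- c("name,city",
          "\u00c5sa,Malm\u00f6",
          "S\u00f8ren,\u00c6r\u00f8")
latin1_path <- tempfile(fileext = ".csv")
con <- file(latin1_path, open = "wb")
writeLines(iconv(rows, from = "UTF-8", to = "latin1"), con, useBytes = TRUE)
close(con)

# locale(encoding = ...) tells readr which encoding the file is stored in.
metadata2 <- read_delim(latin1_path, delim = ",",
                        locale = locale(encoding = "latin1"),
                        show_col_types = FALSE)
print(metadata2)
```

If the file was saved as UTF-8 instead, the encoding argument would need to say so; declaring the wrong encoding is what typically produces garbled letters or question marks.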

Vinícius Félix