17

I'm trying to read a .csv file into R where all the columns are numeric. However, they get converted to factor every time I import them.

Here's a sample of what my CSV looks like:

[image: sample of the CSV file]

This is my code:

options(StringsAsFactors=F)
data<-read.csv("in.csv", dec = ",", sep = ";")

As you can see, I set dec to , and sep to ;. Still, all the vectors that should be numerics are factors!

Can someone give me some advice? Thanks!

Henrik
intael
  • Have you tried `data <- read.csv("in.csv",dec=",",sep=";", stringsAsFactors=FALSE)`? – ialm Nov 19 '13 at 00:20
  • What ialm suggested should work. Alternatively, you could try `read.csv("in.csv",dec=",",sep=";",colClasses=rep("numeric", numberofcolumns))` where you need to supply the number of columns. – Jota Nov 19 '13 at 00:22
  • I suspect the problem is caused by those `N/A` cells, so `data<-read.csv("in.csv",dec=",",sep=";", na.strings="N/A")` might fix it. – Marius Nov 19 '13 at 00:36

3 Answers

12

The missing-value marker in your CSV file, N/A, is not recognised as NA, so it is read as character and the whole column becomes character. If stringsAsFactors = TRUE is set in options or in read.csv (the default before R 4.0.0), the column is then converted to factor. You can use the na.strings argument to tell read.csv which strings should be interpreted as NA.

A small example:

df <- read.csv(text = "x;y
                 N/A;2,2
                 3,3;4,4", dec = ",", sep = ";")
str(df)

df <- read.csv(text = "x;y
                 N/A;2,2
                 3,3;4,4", dec = ",", sep = ";", na.strings = "N/A")
str(df)
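
If it is not obvious which cells are blocking the conversion, one way to find them is to read everything as character and list the values that do not parse as numbers. This is only a minimal sketch, assuming a semicolon-separated file with decimal commas like the one in the question; the helper name `find_non_numeric` and the inline data are made up for illustration.

```r
# Hypothetical helper (not part of base R): return the distinct values of a
# character vector that cannot be parsed as numbers once the decimal comma
# has been replaced by a decimal point.
find_non_numeric <- function(x, dec = ",") {
  x <- trimws(x)
  parsed <- suppressWarnings(as.numeric(sub(dec, ".", x, fixed = TRUE)))
  unique(x[is.na(parsed) & !is.na(x)])
}

# Read every column as character so that no automatic conversion happens.
raw <- read.csv(text = "x;y\nN/A;2,2\n3,3;4,4", sep = ";",
                colClasses = "character")

# One entry per column, listing the values that block numeric conversion.
bad <- lapply(raw, find_non_numeric)
bad
```

Anything reported here is a candidate either for `na.strings` or for cleaning before the column is converted.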

Update following comment

Although it is not apparent from the sample data provided, some of the numbers also have a '$' attached, e.g. '$3,3'. Such values are read as character, and dec = "," no longer helps for those columns. Both the '$' and the ',' have to be replaced before the column is converted to numeric.

df <- read.csv(text = "x;y;z
               N/A;1,1;2,2$
               $3,3;5,5;4,4", dec = ",", sep = ";", na.strings = "N/A")
df
str(df)

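df[] <- lapply(df, function(x) {
  # remove the literal "$" (fixed = TRUE, so it is not treated as a regex anchor)
  x2 <- gsub(pattern = "$", replacement = "", x = x, fixed = TRUE)
  # replace the decimal comma with a decimal point
  x3 <- gsub(pattern = ",", replacement = ".", x = x2, fixed = TRUE)
  as.numeric(x3)
})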
df
str(df)
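
The same cleanup can also be wrapped in a small function and applied only to the columns that did not already come through as numeric. This is a sketch rather than a general currency parser: `clean_numeric` and `df2` are hypothetical names, and it assumes that `$` is the only stray symbol and that `,` is the decimal mark.

```r
# Hypothetical helper: strip "$", turn the decimal comma into a point,
# then convert to numeric.
clean_numeric <- function(x) {
  x <- as.character(x)                  # also copes with factor columns
  x <- gsub("$", "", x, fixed = TRUE)   # drop the currency symbol
  x <- gsub(",", ".", x, fixed = TRUE)  # decimal comma -> decimal point
  as.numeric(x)
}

df2 <- read.csv(text = "x;y;z\nN/A;1,1;2,2$\n$3,3;5,5;4,4",
                dec = ",", sep = ";", na.strings = "N/A")

# Leave columns that read.csv already parsed as numeric untouched.
not_numeric <- !vapply(df2, is.numeric, logical(1))
df2[not_numeric] <- lapply(df2[not_numeric], clean_numeric)
str(df2)
```

Restricting the conversion to the non-numeric columns avoids a round trip through character for columns that were read correctly in the first place.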
Henrik
  • Okay, now the problem has been solved for most vectors. However there are still some vectors that get converted to factors due to the presence of the character "$" attached to some numbers. Any idea? – intael Nov 19 '13 at 08:07
  • This is... interesting. Both calls in the "small example" lead to the same result for me. In both cases, the N/A value – user3283722 Feb 15 '19 at 16:44
5

You could actually have gotten your original code to work: there's a tiny typo (it is 'stringsAsFactors', not 'StringsAsFactors'). The options command won't complain about the misspelled name, but the setting simply has no effect. With the correct spelling, the columns are read as character instead of factor. You can then convert the columns to whatever type you want.
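
To see why the misspelled call fails silently: `options()` accepts any name, so `StringsAsFactors` is stored as a new option that `read.csv` never looks at. The sketch below shows this and then passes `stringsAsFactors` directly to `read.csv` instead. The inline data is made up to mirror the question, and `typo_set`, `cls_before` and `df3` are illustrative names. Note that from R 4.0.0 onward `stringsAsFactors = FALSE` is already the default.

```r
# The misspelled name is accepted without complaint and stored as an option ...
options(StringsAsFactors = FALSE)
typo_set <- "StringsAsFactors" %in% names(options())
typo_set

# ... but it has no effect on read.csv. Passing the argument directly
# avoids depending on global options at all.
df3 <- read.csv(text = "x;y\nN/A;2,2\n3,3;4,4",
                dec = ",", sep = ";", stringsAsFactors = FALSE)
cls_before <- class(df3$x)   # character, because of the "N/A" cell
cls_before

# Convert the character column by hand: mark "N/A" as missing, then
# swap the decimal comma for a point before calling as.numeric().
x <- df3$x
x[x == "N/A"] <- NA
df3$x <- as.numeric(sub(",", ".", x, fixed = TRUE))
str(df3)
```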

aifille
0

I just had this same issue and tried all the fixes in this and other duplicate posts. None worked all that well. I ended up fixing it on the Excel side: if you select all the columns in your source file (in Excel), right-click, choose Format Cells and then select 'Number', the file imports perfectly fine (as long as there are no non-numeric characters below the header).

Jesse001