7

I imported data from a .csv file, and attached the dataset.
My problem: one variable is in integer form and has 295 levels. I need to use this variable to create others, but I don't know how to deal with the levels.

What are these, and how do I deal with them?

Linus Fernandes
Thomas
  • This could mean a few things, depending on what you mean by 'levels'. It sounds a bit like your numbers have been converted to factor variables because somewhere is a badly-formed number, so R converts it all to categorical 'factor' variables. Could you cut and paste the exact code and error messages here please? – Spacedman Dec 01 '10 at 22:18
  • Have you read `?factor`? Or `?levels` – Marek Dec 01 '10 at 22:24
  • setwd("D:/users/me/Desktop") data <- read.csv("Rdata.csv") attach(data) ctr <- for (i in 1:4722) {as.integer(a[i]/b[i])} – Thomas Dec 01 '10 at 22:24
  • 1: In Ops.factor(a[i], b[i]) : / not meaningful for factors – Thomas Dec 01 '10 at 22:25

4 Answers

8

When you read in the data with read.table (or read.csv? - you didn't specify), add the argument stringsAsFactors = FALSE. Then you will get character data instead.

If you are expecting integers for the column then you must have data that is not interpretable as integers, so convert to numeric after you've read it.

txt <- c("x,y,z", "1,2,3", "a,b,c")

d <- read.csv(textConnection(txt))  # before R 4.0.0, strings became factors by default
sapply(d, class)

#       x        y        z 
#"factor" "factor" "factor" 

## we don't want factors, but characters
d <- read.csv(textConnection(txt), stringsAsFactors = FALSE)
sapply(d, class)

#          x           y           z 
#"character" "character" "character" 

## convert x to numeric, and accept NAs for non-numeric data
as.numeric(d$x)

#[1]  1 NA
#Warning message:
#NAs introduced by coercion 

Finally, if you want to ignore these input details and extract the integer levels from the factor use e.g. as.numeric(levels(d$x))[d$x], as per "Warning" in ?factor.
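To see why the indexing through levels matters, here is a minimal sketch using a made-up factor `f` (hypothetical, not taken from the question's data). Calling as.numeric directly on a factor returns the internal integer codes, not the numbers shown in the labels.

```r
# Hypothetical factor whose labels look like numbers
f <- factor(c("10", "20", "10", "35"))

# Direct conversion returns the internal level codes, not the labels
codes <- as.numeric(f)

# Converting the levels first, then indexing by the factor, recovers the values
values <- as.numeric(levels(f))[f]

print(codes)   # 1 2 1 3
print(values)  # 10 20 10 35
```

The 295 levels in the question are simply the 295 distinct strings found in the column, so at least one of them is presumably not a clean number.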

mdsumner
5

Or you can simply use:

d$x2 = as.numeric(as.character(d$x))
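After a conversion like this, it can help to find which entries failed to convert, since those are probably what turned the column into a factor in the first place. A rough sketch, assuming a made-up factor `x` with one malformed entry (the names `x_num`, `bad_idx` and `bad_values` are illustrative only):

```r
# Hypothetical column read in as a factor because of one malformed entry
x <- factor(c("12", "7", "3o", "45"))

# Convert via character; malformed entries become NA (warning suppressed here)
x_num <- suppressWarnings(as.numeric(as.character(x)))

# Positions and original text of the entries that failed to convert
bad_idx <- which(is.na(x_num) & !is.na(x))
bad_values <- as.character(x)[bad_idx]

print(bad_idx)     # 3
print(bad_values)  # "3o"
```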

Tonio Liebrand
Arthur
4

Working from your clarification, I suggest you redo your read statement with read.table, using header=TRUE, stringsAsFactors=FALSE, as.is=TRUE and sep=",":

# as.is = TRUE keeps character columns as character instead of converting them to factors
datinp <- read.table("Rdata.csv", header=TRUE, stringsAsFactors=FALSE, 
                       as.is=TRUE, sep=",") 
datinp$a <- as.numeric(datinp$a)  # non-numeric entries become NA, with a warning
datinp$b <- as.numeric(datinp$b)
datinp$ctr <- with(datinp, as.integer(a/b) ) # no loop needed when using vector arithmetic
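Apart from the factor issue, the loop in the question would not have stored anything, because a for loop in R evaluates to NULL. A small sketch with made-up vectors `a` and `b` standing in for the question's columns:

```r
# Made-up stand-ins for the question's columns a and b
a <- c(10, 25, 7)
b <- c(3, 5, 2)

# A for loop evaluates to NULL, so assigning it captures nothing
ctr_loop <- for (i in seq_along(a)) as.integer(a[i] / b[i])

# Vector arithmetic computes every element in one expression
ctr_vec <- as.integer(a / b)

print(is.null(ctr_loop))  # TRUE
print(ctr_vec)            # 3 5 3
```

Note that as.integer truncates toward zero, so 7/2 gives 3 rather than 4.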
IRTFM
1

Run summary(data) to check that things were read in properly. If columns that should be numeric aren't, look at the colClasses argument to read.csv to force the type, which will probably also result in NA values for poorly-formed numbers.

help(read.csv) will help.
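As the comment from mdsumner below reports, forcing colClasses = "numeric" on a column with a malformed entry appears to stop with an error from scan rather than produce NA. The sketch below, using made-up inline data, captures that error and shows one alternative: read everything as character and convert afterwards.

```r
# Made-up data: column x has one non-numeric entry
txt <- c("x,y", "1,2", "a,3")

# Forcing a numeric class; the error message is captured instead of stopping
result <- tryCatch(
  read.csv(text = txt, colClasses = "numeric"),
  error = function(e) conditionMessage(e)
)

# Reading as character and converting afterwards gives NA for the bad entry
d <- read.csv(text = txt, colClasses = "character")
x_num <- suppressWarnings(as.numeric(d$x))

print(result)
print(x_num)
```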

Spacedman
  • Spacedman: can you use colClasses to force NAs? I thought so at first but txt <- c("x,y,z", "1,2,3", "a,b,c", "1,2,3"); d <- read.table(textConnection(txt), sep = ",", header = TRUE, colClasses = rep("numeric", 3)) errors on scan. Is there something I'm missing in read.table? – mdsumner Dec 01 '10 at 22:39
  • @mdsummer: take out those double-quotes. They are bundling your characters together in a way you aren't intending. – IRTFM Dec 01 '10 at 22:50
  • @mdsummer: But the problem persisted, anyway. as.is=TRUE is needed. – IRTFM Dec 01 '10 at 22:57