Error using "in", "if", as a column name in R

Question

I just ran into this problem. I was using a data frame with several thousand columns created from words and word splits. One of my columns ended up with the name "in" and another with "if". Trying something like data$in produces an error. See the example:

require(tm)
text <- data.frame(colText = c("namein", "Inmortal"))
corpus <- Corpus(DataframeSource(text))
corpus[[1]]
<<PlainTextDocument (metadata: 7)>>
  namein
ctrl <- list(tokenize = strsplit_character_tokenizer,wordLengths=c(1, Inf))

dtm <- DocumentTermMatrix(corpus, control = ctrl)
str(dtm)
dtm$dimnames$Terms
[1] "a"    "al"   "e"    "ein"  "i"    "in"   "inm"  "inmo" "l"    "m"    "me"   "mo"   "n"    "na"   "nam"  "name" "o"    "ort" 
[19] "r"    "rt"   "rtal" "t"   

dtmF <- as.data.frame(inspect(dtm))

dtmF$inm
[1] 0 1
dtmF$in
Error: unexpected 'in' in "dtmF$in"

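# Custom tokenizer: splits each string into chunks of 1 to 4 alphanumeric characters.
# Note: this has to be defined before `ctrl` above is created.
strsplit_character_tokenizer <- function(x) {
  r <- list()
  max_len <- 4
  for (i in 1:max_len) {
    # Insert a space after every run of i alphanumeric characters, then split on spaces
    reg <- paste("([[:alnum:]]{", i, "})", sep = "")
    tmp <- unlist(strsplit(gsub(reg, "\\1 ", x), " "))
    r <- c(r, tmp)
  }
  return(unlist(r))
}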

As a result, when I train an SVM for classification, it crashes. How can one overcome this issue? I could rename some of those columns, but I would like a more generic solution. Thanks.

I do not yet see in either of the offered answers the reason for the error. It is because `in` is a reserved R word. The parser sees it as a partial `for`-loop call. See `?Reserved` — IRTFM, Jan 15 '15 at 22:08
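
The parser behaviour described in this comment can be checked directly. The sketch below uses a hypothetical helper, `parses()`, introduced here only for illustration; it reports whether a string is syntactically valid R without evaluating it, so no data frame is needed.

```r
# Hypothetical helper (not from the thread): TRUE if `code` is syntactically valid R.
# parse() only builds the expression; it never evaluates it.
parses <- function(code) {
  !inherits(try(parse(text = code), silent = TRUE), "try-error")
}

parses("dtmF$inm")     # ordinary name: parses
parses("dtmF$in")      # reserved word after `$`: parse error
parses("dtmF$`in`")    # backtick-quoted: parses
parses('dtmF$"in"')    # string after `$`: parses
```

Because the failure happens at parse time, it does not matter whether the column actually exists.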

score 2 · Answer 1 · answered Jan 15 '15 at 17:49

You need to use the backtick (`) mark.

> my$in
Error: unexpected 'in' in "my$in"
> my$`in`
[1] 1 2 3 4 5
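
For anyone wanting to reproduce this, here is a minimal self-contained sketch. The data frame `my` and its values are made up for illustration. Note that `data.frame()` would rename reserved-word columns by default, so `check.names = FALSE` is needed to get a column literally called "in".

```r
# Illustrative data only; check.names = FALSE keeps "in" and "if" as column names
my <- data.frame(`in` = 1:5, `if` = 6:10, check.names = FALSE)

names(my)

# Backticks make the reserved word usable as a name after `$`
my$`in`
my$`if`

# They also work on the left-hand side of an assignment
my$`in` <- my$`in` * 2
```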

Andrew Taylor


score 1 · Answer 2 · answered Jan 15 '15 at 17:47

Rather than using $, you could access the columns as

dtmF[["in"]] and dtmF[["if"]]

or

dtmF[, "in"] and dtmF[, "if"]
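
A short sketch of why string indexing is convenient when the column name is held in a variable, for instance when looping over all terms. The small `dtmF` here is a made-up stand-in for the real document-term data frame.

```r
# Illustrative stand-in for the real dtmF; check.names = FALSE keeps "in" and "if"
dtmF <- data.frame(`in` = c(0, 1), `if` = c(1, 0), inm = c(0, 1),
                   check.names = FALSE)

dtmF[["in"]]
dtmF[, "if"]

# The name is just a string, so reserved words need no special handling in a loop
totals <- vapply(names(dtmF), function(term) sum(dtmF[[term]]), numeric(1))
totals
```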

David Robinson


Thanks David. Sorry maybe for not being precise enough, even dtmF$"in" works, I know how to access the column values, but what I need is a generic solution, if I want to use it for classification for example, svm, etc – Dr VComas Jan 15 '15 at 17:54
@DrVComas Perhaps I don't understand the full problem. Rereading, I see you note `when I train a svm for classification it crashes`. That's the actual problem that needs a general solution: you need to show the code that does that training and show how it crashes. As it is, the problem you show is just accessing it with `$`. – David Robinson Jan 15 '15 at 17:56
Let's say that after I have the data frame, I run several classification algorithms. Internally they access the data this way and return errors. – Dr VComas Jan 15 '15 at 17:57
Make a new question where you show us the code for the classification algorithms and where you are trying to access the data. The comments are not an appropriate place to pose new questions. – Andrew Taylor Jan 15 '15 at 18:01
OK, I'll post another question. – Dr VComas Jan 15 '15 at 18:03
@DrVComas I'd suggest editing this question to include the details: I don't think it's an entirely new question, it's just that this question was unclear about what you wanted (because it didn't contain the code for the SVM or show how it crashed) – David Robinson Jan 15 '15 at 18:04
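
On the request for a generic solution raised in these comments: one option that none of the answers propose is to rename every column to a syntactically valid name with base R's `make.names()`, which appends a dot to reserved words. This is only a sketch under assumptions. The small `dtmF` below is a made-up stand-in, `term_lookup` is a hypothetical name, and whether renaming fixes the SVM crash is untested, since the thread never shows the training code.

```r
# Illustrative stand-in for the real dtmF; check.names = FALSE keeps "in" and "if"
dtmF <- data.frame(`in` = c(0, 1), `if` = c(1, 0), inm = c(0, 1),
                   check.names = FALSE)

original_terms <- names(dtmF)

# make.names() turns reserved words into valid names by appending a dot;
# unique = TRUE guards against collisions among the new names
names(dtmF) <- make.names(original_terms, unique = TRUE)
names(dtmF)   # "in." "if." "inm"

# Hypothetical lookup table: maps each new column name back to its original term
term_lookup <- setNames(original_terms, names(dtmF))

# The renamed columns can now be used with `$` without quoting
dtmF$in.
```

Names that were already valid, such as "inm", are left unchanged, so only the problem columns are affected.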
