I have a large number of huge data frames. These data frames often contain groups of sequentially placed columns with similar names. The following is a simplified version of such a data frame:

> tmp <- data.frame(ID = 1:25,
    Item1 = sample(x = 1:4, size = 25, replace = TRUE),
    Item2 = sample(x = 1:4, size = 25, replace = TRUE),
    Item3 = sample(x = 1:4, size = 25, replace = TRUE),
    Item4 = sample(x = 1:4, size = 25, replace = TRUE),
    Item5 = sample(x = 1:4, size = 25, replace = TRUE),
    Item6 = sample(x = 1:4, size = 25, replace = TRUE),
    Item7 = sample(x = 1:4, size = 25, replace = TRUE),
    Quest = rep(x = 20, times = 25))

I need to find a way to index these columns by ranges of their names, not by their positions. Say I need to index columns from Item4 to Item7. I could do the following:

> tmp[ , c("Item4", "Item5", "Item6", "Item7")]

This gets unwieldy when there are hundreds of columns with similar names. I would like to write something like:

> tmp[ , c("Item4":"Item7")]

But it throws an error:

Error in "Item4":"Item7" : NA/NaN argument
In addition: Warning messages:
1: In `[.data.frame`(tmp, , c("Item4":"Item7")) :
  NAs introduced by coercion
2: In `[.data.frame`(tmp, , c("Item4":"Item7")) :
  NAs introduced by coercion

Further, I'd like to use this kind of indexing to manipulate, say, the columns' attributes, as in the following (which uses the former approach of listing all column names):

> labels.Item4to7 <- c("Disagree", "Somewhat disagree",
  "Somewhat agree", "Agree")
> tmp[ , c("Item4", "Item5", "Item6", "Item7")] <- lapply(tmp[ , c("Item4",
  "Item5", "Item6", "Item7")], factor, labels = labels.Item4to7)

I would like to do the same, but with the range of column names defined as Item4:Item7.

Thank you in advance.

panman
  • What does your 2nd question mean - do you want to rename those columns? You can store the column names that you wish to have in your subset in `cols <- paste0("Item", 4:7)` and use `tmp[, cols]` as a shortcut. – lukeA Feb 16 '15 at 22:22
  • @lukeA: No, not to rename them, but to alter their attributes. The type of indexing you suggest is also useful for other situations I was thinking about. Thanks! – panman Feb 17 '15 at 11:14

2 Answers

3

Use the `which` function to look up the positions of the first and last column by name:

tmp[, which(names(tmp) == "Item4"):which(names(tmp) == "Item7")]

Changing the values of items 4 to 7 could be achieved with the following:

labels.Item4to7 <- c("Disagree", "Somewhat disagree",
  "Somewhat agree", "Agree")
# Compute the positional range once and reuse it
cols <- which(names(tmp) == "Item4"):which(names(tmp) == "Item7")
# levels = 1:4 keeps the labels aligned even if a value never occurs in a column
tmp[cols] <- lapply(tmp[cols], factor, levels = 1:4, labels = labels.Item4to7)
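
The name lookup above can be wrapped in a small helper so that the range is written only once. This is a minimal sketch, not part of the original answer: `name_range` is a hypothetical helper name (not a base R function), and the small `tmp` below is a stand-in for the data frame in the question.

```r
# Hypothetical helper: names of all columns from `from` to `to`, inclusive.
name_range <- function(df, from, to) {
  idx <- match(c(from, to), names(df))
  if (anyNA(idx)) {
    stop("column not found: ", paste(c(from, to)[is.na(idx)], collapse = ", "))
  }
  names(df)[idx[1]:idx[2]]
}

# Stand-in data frame (assumption: same layout as in the question, but smaller)
tmp <- data.frame(ID = 1:8,
                  Item1 = rep(1:4, times = 2),
                  Item2 = rep(4:1, times = 2),
                  Item3 = rep(1:4, times = 2),
                  Item4 = rep(4:1, times = 2),
                  Quest = rep(20, times = 8))

cols <- name_range(tmp, "Item2", "Item4")
item_labels <- c("Disagree", "Somewhat disagree", "Somewhat agree", "Agree")
tmp[cols] <- lapply(tmp[cols], factor, levels = 1:4, labels = item_labels)
```

Using `match` instead of `which` makes it easy to fail with a clear message when one of the boundary names is misspelled.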
2

You could use paste:

tmp[, paste0("Item", 4:7)]
lukeA
  • For the second part of the question, something like `cols <- paste0("Item", 4:7); tmp[cols] <- lapply(tmp[cols], factor, labels=labels.Item4to7)` I think. – thelatemail Feb 16 '15 at 22:36
  • @lukeA & thelatemail: Yes, these work, although they rely on numerical indexing, which is something I wanted to avoid. They are still very useful for other situations I often run into, so thank you very much! – panman Feb 17 '15 at 11:19
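
For a range written purely in terms of column names, base R's `subset` may be closer to what the question asks for: its `select` argument accepts an expression such as `Item4:Item7`, which is evaluated with each column name standing for its position. The sketch below is an illustration rather than part of either answer, and the small `tmp` is a stand-in for the data frame in the question.

```r
# Stand-in data frame (assumption: same column layout as in the question)
tmp <- data.frame(ID = 1:8)
for (i in 1:7) tmp[[paste0("Item", i)]] <- rep(1:4, times = 2)  # setup only
tmp$Quest <- rep(20, times = 8)

# Select a range of columns by name
sub <- subset(tmp, select = Item4:Item7)

# Reuse the resulting names to modify the same columns in place
cols <- names(sub)
labels.Item4to7 <- c("Disagree", "Somewhat disagree",
                     "Somewhat agree", "Agree")
tmp[cols] <- lapply(tmp[cols], factor, levels = 1:4, labels = labels.Item4to7)
```

Note that `subset` relies on non-standard evaluation, and its help page recommends it mainly for interactive use; inside functions, looking up the positions explicitly with `match` or `which` is usually the safer choice.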