0

I have a CSV with a column format similar to this:

Section | ID | Totaltime | Item1/Word | Item1/Cat | Item1/Time...Item235/Time  

I would like to reshape this so that instead of all 235 entries per ID on a single row, there is a row per item, sorted/chunked by ID, so it looks similar to this-

Section | ID0 | Totaltime | Item1/Word | Item1/Cat | Item1/Time 
                            Item2/Word | Item2/Cat | Item2/Time
                            Item3/Word | Item3/Cat | Item3/Time
                           ...Item235/Word | Item235/Cat | Item235/Time
Section | ID1 | Totaltime | Item1/Word | Item1/Cat | Item1/Time...

I've tried to melt it using the ID as the vars.id argument, and the various Items pulled together with a grepl into the measures.vars argument, but this results in something like this-

Section | ID0 | Totaltime
Section | ID0 | Item1/Word 
Section | ID0 | Item1/Cat 
Section | ID0 | Item1/Time 
             ...
Section | ID0 | Item235/Word 
Section | ID0 | Item235/Cat 
Section | ID0 | Item235/Time

I've also tried recasting this, but without much luck.

I'm new to R as of this week, so I'm sure there is probably something super obvious I'm missing, but I've hit a wall on this.

  • 1
    Are you sure you want to do that? It seems like it might be harder to work with in the format you desire. – Rich Scriven Apr 07 '15 at 17:59
  • It's not my data. This is just how they requested it. Long instead of wide, in this particular format. – user4759802 Apr 07 '15 at 18:02
  • I would go back to the client and ask why they wish to have it in this specified format and enumerate the problems with doing so. As @RichardScriven noted, it seems unwieldy. – alexwhitworth Apr 07 '15 at 18:12

3 Answers3

1

melt from data.table v1.9.5+ can operate on multiple columns. (using @rawr's data)

require(data.table) # v1.9.5+
vals = unique(gsub("Item[0-9]+/", "", tail(names(dd), -3L)))
melt(setDT(dd), id=1:3, measure=lapply(vals, grep, names(dd)), value.name=vals)
#     Section   ID0 Totaltime variable   Word   Cat   Time
#  1:       1 10001       100        1 1/word 1/cat 1/time
#  2:       2 10002       200        1 1/word 1/cat 1/time
#  3:       3 10003       300        1 1/word 1/cat 1/time
#  4:       4 10004       400        1 1/word 1/cat 1/time
#  5:       5 10005       500        1 1/word 1/cat 1/time
#  6:       1 10001       100        2 2/word 2/cat 2/time
#  7:       2 10002       200        2 2/word 2/cat 2/time
#  8:       3 10003       300        2 2/word 2/cat 2/time
#  9:       4 10004       400        2 2/word 2/cat 2/time
# 10:       5 10005       500        2 2/word 2/cat 2/time
# 11:       1 10001       100        3 3/word 3/cat 3/time
# 12:       2 10002       200        3 3/word 3/cat 3/time
# 13:       3 10003       300        3 3/word 3/cat 3/time
# 14:       4 10004       400        3 3/word 3/cat 3/time
# 15:       5 10005       500        3 3/word 3/cat 3/time
Arun
  • 116,683
  • 26
  • 284
  • 387
  • I'm getting this error: `Argument 'value.name' must be a character vector of length 1` – user4759802 Apr 09 '15 at 18:43
  • You're using an older version of `data.table`. Please follow the link in the post to upgrade to 1.9.5. – Arun Apr 09 '15 at 19:05
  • Got v.1.9.5 now, but the melt is giving me a table exactly like the first one in wide format. The vals variable seems to be selecting the correct values though when I view it. – user4759802 Apr 09 '15 at 23:52
  • 1. are you working on the exact data I've shown? 2. if not, you'll have to provide a link to your data for me to test, sorry can't help otherwise. – Arun Apr 09 '15 at 23:56
0

Try this

library(reshape2)
library(plyr)
df.melt <- melt(df, id.vars=c("Section", "ID0", "Totaltime"), variable.name="item.type", value.name="item.value")
df.mutate <- mutate(df.melt, item.no=gsub("(Item[0-9]+).*", "\\1", item.type), item.type=gsub("Item[0-9]+/", "", item.type)
df.final <- ddply(df.mutate, .(Section, ID0, Totaltime, item.no), function(d) df.final <- ddply(df.mutate, .(Section, ID0, Totaltime, item.no), function(d) dcast(d, Section + ID0 + Totaltime ~ item.type, value.var="item.value", fun.aggregate=function(x) x[1]))
jwilley44
  • 173
  • 6
  • Seems to be getting there, but throws this error on the last part- `Error: unexpected string constant in "new.final <- ddply(new.mutate, .(section", participant, total_time, card.no), function(d){dcast(d, section + participant + total_time ~ card.type, value.var = ""` – user4759802 Apr 07 '15 at 19:15
  • It looks like you have a quote in the second argument of ddply – jwilley44 Apr 07 '15 at 19:18
  • Yeah, I caught it afterward. It still didn't work correctly... Gave me a big matrix, filled mostly with NA, but with actual values stepping down diagonally from the top as the item numbers in the columns increase, with a row for each column, if that makes sense. It's huge- took my computer like 10 minutes to spit it out, but not what I need, unfortunately. I could have messed up the code- I had to replace the value.var="value" to get it to do anything. – user4759802 Apr 07 '15 at 20:01
0

I think this gets the format you need:

dd <- data.frame(Section = 1:5, ID0 = 10001:10005, Totaltime = 1:5 * 100,
                 'Item1/Word' = '1/word', 'Item1/Cat' = '1/cat',
                 'Item1/Time' = '1/time',
                 'Item2/Word' = '2/word', 'Item2/Cat' = '2/cat',
                 'Item2/Time' = '2/time',
                 'Item3/Word' = '3/word', 'Item3/Cat' = '3/cat',
                 'Item3/Time' = '3/time', stringsAsFactors = FALSE,
                 check.names = FALSE)


#   Section   ID0 Totaltime Item1/Word Item1/Cat Item1/Time Item2/Word Item2/Cat Item2/Time Item3/Word Item3/Cat Item3/Time
# 1       1 10001       100     1/word     1/cat     1/time     2/word     2/cat     2/time     3/word     3/cat     3/time
# 2       2 10002       200     1/word     1/cat     1/time     2/word     2/cat     2/time     3/word     3/cat     3/time
# 3       3 10003       300     1/word     1/cat     1/time     2/word     2/cat     2/time     3/word     3/cat     3/time
# 4       4 10004       400     1/word     1/cat     1/time     2/word     2/cat     2/time     3/word     3/cat     3/time
# 5       5 10005       500     1/word     1/cat     1/time     2/word     2/cat     2/time     3/word     3/cat     3/time

## define the varying columns:
keys <- paste0('Item', 1:3)
keys <- c('Word','Cat','Time')
l <- lapply(keys, function(x) grep(x, names(dd)))

rr <- reshape(dd, direction = 'long', varying = l)
rr <- rr[with(rr, order(Section, ID0, Totaltime)),
         ## `reshape` makes two extra variabes, time and id, we dont want
         -which(names(rr) %in% c('id','time'))]
rr[, 1:3] <- lapply(rr[, 1:3], function(x) ifelse(duplicated(x), '', x))
`rownames<-`(rr, NULL)

#    Section   ID0 Totaltime Item1/Word Item1/Cat Item1/Time
# 1        1 10001       100     1/word     1/cat     1/time
# 2                              2/word     2/cat     2/time
# 3                              3/word     3/cat     3/time
# 4        2 10002       200     1/word     1/cat     1/time
# 5                              2/word     2/cat     2/time
# 6                              3/word     3/cat     3/time
# 7        3 10003       300     1/word     1/cat     1/time
# 8                              2/word     2/cat     2/time
# 9                              3/word     3/cat     3/time
# 10       4 10004       400     1/word     1/cat     1/time
# 11                             2/word     2/cat     2/time
# 12                             3/word     3/cat     3/time
# 13       5 10005       500     1/word     1/cat     1/time
# 14                             2/word     2/cat     2/time
# 15                             3/word     3/cat     3/time
rawr
  • 20,481
  • 4
  • 44
  • 78