0

I am dealing with a dataset that is in wide format, as in

> data=read.csv("http://www.kuleuven.be/bio/ento/temp/data.csv")
> data
  factor1 factor2 count_1 count_2 count_3
1       a       a       1       2       0
2       a       b       3       0       0
3       b       a       1       2       3
4       b       b       2       2       0
5       c       a       3       4       0
6       c       b       1       1       0

where factor1 and factor2 are different factors which I would like to take along (in fact I have more than 2, but that shouldn't matter), and count_1 to count_3 are counts of aggressive interactions on an ordinal scale (3>2>1). I would now like to convert this dataset to long format, to get something like

   factor1 factor2 aggression
1        a       a          1
2        a       a          2
3        a       a          2
4        a       b          1
5        a       b          1
6        a       b          1
7        b       a          1
8        b       a          2
9        b       a          2
10       b       a          3
11       b       a          3
12       b       a          3
13       b       b          1
14       b       b          1
15       b       b          2
16       b       b          2
17       c       a          1
18       c       a          1
19       c       a          1
20       c       a          2
21       c       a          2
22       c       a          2
23       c       a          2
24       c       b          1
25       c       b          2

Would anyone happen to know how to do this without using for...to loops, e.g. using package reshape2? (I realize it should work using melt, but I just haven't been able to figure out the right syntax yet)

Edit: For those of you that would also happen to need this kind of functionality, here is Ananda's answer below wrapped into a little function:

    widetolong.ordinal<-function(data,factors,responses,responsename) {
    library(reshape2)
    data$ID=1:nrow(data) # add an ID to preserve row order
    dL=melt(data, id.vars=c("ID", factors)) # `melt` the data
    dL=dL[order(dL$ID), ] # sort the molten data
    dL[,responsename]=match(dL$variable,responses) # convert reponses to ordinal scores
    dL[,responsename]=factor(dL[,responsename],ordered=T)
    dL=dL[dL$value != 0, ] # drop rows where `value == 0`
    out=dL[rep(rownames(dL), dL$value), c(factors, responsename)] # use `rep` to "expand" `data.frame` & drop unwanted columns
    rownames(out) <- NULL
    return(out)
    }

    # example
    data <- read.csv("http://www.kuleuven.be/bio/ento/temp/data.csv")
    widetolong.ordinal(data,c("factor1","factor2"),c("count_1","count_2","count_3"),"aggression")
Tom Wenseleers
  • 7,535
  • 7
  • 63
  • 103
  • 1
    Have a look at `melt` in `reshape2` package. Work through the simple examples [here](http://www.cookbook-r.com/Manipulating_data/Converting_data_between_wide_and_long_format/), and I am sure you will be able to identify corresponding `id.vars` and `measure.vars` in your own data. – Henrik Sep 30 '13 at 17:16
  • I think the only thing I use SPSS for, is to do exactly this. – PascalVKooten Sep 30 '13 at 18:00
  • @AnandaMahto: sorry for my somewhat long example - I have now edited it to give a more abstract and simple example - and thanks Henrik for the pointer - I realized that melt was probably the way to go, but still haven't figured out the right syntax though for my purpose... – Tom Wenseleers Sep 30 '13 at 22:15
  • You are much more likely to receive help if you show us what you have tried. Again, please look at [the example I suggested](http://www.cookbook-r.com/Manipulating_data/Converting_data_between_wide_and_long_format/#from-wide-to-long). The data is very similar to your data: two factors (subject sex vs factor1 factor2) and three measurements (control cond1 cond2 vs. count_1 count_2 count_3). – Henrik Sep 30 '13 at 23:01
  • @Henrik, it is a *similar* problem, but this question needs a little bit more than `melt`. – A5C1D2H2I1M1N2O1R2T1 Oct 01 '13 at 01:47
  • @Dualinity, Does SPSS have a special function for doing something like this? Just curious. :) – A5C1D2H2I1M1N2O1R2T1 Oct 01 '13 at 01:47
  • @AnandaMahto, I entirely agree. I was just trying to encourage OP to get started and show us the code he had tried. Great answer, +1! – Henrik Oct 01 '13 at 07:11
  • @AnandaMahto Yes, SPSS does have a simple GUI solution to it. – PascalVKooten Oct 01 '13 at 09:42
  • @Dualinity, can you share where it is. I'm curious to see how it is implemented and the corresponding code that would be required if being done manually. – A5C1D2H2I1M1N2O1R2T1 Oct 01 '13 at 09:45
  • @AnandaMahto Due to changing to Ubuntu, I currently do not have it installed. I remember it is an easy to spot menu option, though. – PascalVKooten Oct 01 '13 at 09:49

1 Answers1

2

melt from "reshape2" will only get you part of the way through this problem. To go the rest of the way, you just need to use rep from base R:

data <- read.csv("http://www.kuleuven.be/bio/ento/temp/data.csv")
library(reshape2)

## Add an ID if the row order is importantt o you
data$ID <- 1:nrow(data)

## `melt` the data
dL <- melt(data, id.vars=c("ID", "factor1", "factor2"))

## Sort the molten data, if necessary
dL <- dL[order(dL$ID), ]

## Extract the numeric portion of the "variable" variable
dL$aggression <- gsub("count_", "", dL$variable)

## Drop rows where `value == 0`
dL <- dL[dL$value != 0, ]

## Use `rep` to "expand" your `data.frame`.
## Drop any unwanted columns at this point.
out <- dL[rep(rownames(dL), dL$value), c("factor1", "factor2", "aggression")]

This is what the output finally looks like. If you want to remove the funny row names, just use rownames(out) <- NULL.

out
#      factor1 factor2 aggression
# 1          a       a          1
# 7          a       a          2
# 7.1        a       a          2
# 2          a       b          1
# 2.1        a       b          1
# 2.2        a       b          1
# 3          b       a          1
# 9          b       a          2
# 9.1        b       a          2
# 15         b       a          3
# 15.1       b       a          3
# 15.2       b       a          3
# 4          b       b          1
# 4.1        b       b          1
# 10         b       b          2
# 10.1       b       b          2
# 5          c       a          1
# 5.1        c       a          1
# 5.2        c       a          1
# 11         c       a          2
# 11.1       c       a          2
# 11.2       c       a          2
# 11.3       c       a          2
# 6          c       b          1
# 12         c       b          2
A5C1D2H2I1M1N2O1R2T1
  • 190,393
  • 28
  • 405
  • 485