
I have this dataframe:

df <- data.frame(subject = rep(c("one", "two"), each = 20),
                 score1 = sample(1:3, 40, replace = TRUE),
                 score2 = sample(1:6, 40, replace = TRUE),
                 score3 = sample(1:3, 40, replace = TRUE),
                 score4 = sample(1:4, 40, replace = TRUE))

   subject score1 score2 score3 score4
1      one      2      4      2      2
2      one      3      3      1      2
3      one      1      2      1      3
4      one      3      4      1      2
5      one      1      2      2      3
6      one      1      5      2      4
7      one      2      5      3      2
8      one      1      5      1      3
9      one      3      5      2      2
10     one      2      3      3      4
11     one      3      2      1      3
12     one      2      5      2      1
13     one      2      4      1      4
14     one      2      2      1      3
15     one      1      3      1      4
16     one      1      6      1      3
17     one      3      4      2      2
18     one      3      2      1      3
19     one      2      5      3      1
20     one      3      6      2      1
21     two      1      6      3      4
22     two      1      2      1      2
23     two      3      2      1      2
24     two      1      2      2      1
25     two      2      3      1      3
26     two      1      5      3      3
27     two      2      4      1      4
28     two      2      6      2      4
29     two      1      6      2      2
30     two      1      5      1      4
31     two      2      1      2      4
32     two      3      6      1      1
33     two      1      1      3      1
34     two      2      4      2      3
35     two      2      1      3      2
36     two      2      3      1      3
37     two      1      2      3      4
38     two      3      5      2      2
39     two      2      1      3      4
40     two      2      1      1      3

Note that the scores have different ranges of values: score1 ranges from 1 to 3, score2 from 1 to 6, score3 from 1 to 3 and score4 from 1 to 4.

I'm trying to reshape data like this:

library(reshape2)
dfMelt <- melt(df, id.vars="subject")

acast(dfMelt, subject ~ value ~ variable)

Aggregation function missing: defaulting to length
, , score1

    1 2 3 4 5 6
one 6 7 7 0 0 0
two 8 9 3 0 0 0

, , score2

    1 2 3 4 5 6
one 0 5 3 4 6 2
two 5 4 2 2 3 4

, , score3

     1 2 3 4 5 6
one 10 7 3 0 0 0
two  8 6 6 0 0 0

, , score4

    1 2 3 4 5 6
one 3 6 7 4 0 0
two 3 5 5 7 0 0

Note that the output array includes a count of 0 for score values that are outside a score's range (for example, 4, 5 and 6 for score1). Is there any way to stop acast from outputting these missing scores?

luciano

2 Answers


In this case, you might do better sticking with base R's table() function. I don't think you can have the irregular array you are looking for, since every slice of an R array must have the same dimensions.

For example:

> lapply(df[-1], function(x) table(df[[1]], x))
$score1
     x
       1  2  3
  one  9  6  5
  two 11  4  5

$score2
     x
      1 2 3 4 5 6
  one 2 5 4 3 3 3
  two 4 2 2 3 4 5

$score3
     x
       1  2  3
  one  9  5  6
  two  4 11  5

$score4
     x
      1 2 3 4
  one 4 4 8 4
  two 2 6 5 7

Or, using your "long" data:

with(dfMelt, by(dfMelt, variable, 
                FUN = function(x) table(x[["subject"]], x[["value"]])))
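If you also want to keep a 0 for values that are inside a score's range but never observed (something the table() calls above would drop), one option is to turn each subset's values into a factor whose levels are that score's own range. This is a minimal sketch that assumes the ranges stated in the question are known in advance; the ranges list and the set.seed() call are hypothetical additions for illustration, not part of the original data.

```r
library(reshape2)

# Recreate the question's data (set.seed is a hypothetical addition for reproducibility)
set.seed(1)
df <- data.frame(subject = rep(c("one", "two"), each = 20),
                 score1 = sample(1:3, 40, replace = TRUE),
                 score2 = sample(1:6, 40, replace = TRUE),
                 score3 = sample(1:3, 40, replace = TRUE),
                 score4 = sample(1:4, 40, replace = TRUE))
dfMelt <- melt(df, id.vars = "subject")

# Hypothetical lookup of each score's valid values, taken from the ranges in the question
ranges <- list(score1 = 1:3, score2 = 1:6, score3 = 1:3, score4 = 1:4)

# One table per score: the factor levels fix the columns to that score's own range,
# so in-range values that never occur still appear as 0, and nothing else appears
tabs <- lapply(split(dfMelt, dfMelt$variable), function(d) {
  v <- as.character(d$variable[1])
  table(d$subject, factor(d$value, levels = ranges[[v]]))
})
tabs
```

The result is a named list (score1 to score4) of two-way tables, each with exactly as many columns as that score has possible values.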
A5C1D2H2I1M1N2O1R2T1

Since each "score" subset is going to have a different shape, you will not be able to preserve the array structure. One option is to use a list of two-dimensional arrays or data.frames, e.g.:

# your original acast call
res  <-  acast(dfMelt, subject ~ value ~ variable)

# remove any columns that are all zero
# (drop = FALSE keeps a slice with only one non-zero column as a matrix)
apply(res, 3, function(x) x[, colSums(x) != 0, drop = FALSE])

Which gives:

$score1
    1 2 3
one 7 8 5
two 6 8 6

$score2
    1 2 3 4 5 6
one 4 2 6 4 1 3
two 2 5 3 4 3 3

$score3
    1  2 3
one 5 10 5
two 5 11 4

$score4
    1 2 3 4
one 5 4 4 7
two 4 6 6 4
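Another way to get the same kind of list while still using acast is to call it once per score rather than once for the whole data set, so each call only ever sees that score's own values. A minimal sketch, assuming the question's data (the set.seed() call is a hypothetical addition). Unlike filtering the full array, it never builds the all-zero columns in the first place, but, like the table() approach, it also omits an in-range value that neither subject ever scored.

```r
library(reshape2)

# Recreate the question's data (set.seed is a hypothetical addition for reproducibility)
set.seed(1)
df <- data.frame(subject = rep(c("one", "two"), each = 20),
                 score1 = sample(1:3, 40, replace = TRUE),
                 score2 = sample(1:6, 40, replace = TRUE),
                 score3 = sample(1:3, 40, replace = TRUE),
                 score4 = sample(1:4, 40, replace = TRUE))
dfMelt <- melt(df, id.vars = "subject")

# Cast each score's subset separately: the columns are only the values observed
# for that score; fun.aggregate = length counts rows and silences the acast message
res_list <- lapply(split(dfMelt, dfMelt$variable),
                   function(d) acast(d, subject ~ value, fun.aggregate = length))
res_list
```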
Ricardo Saporta