Hadley Wickham's Tidy Data paper, and the tidyr
package that is his (latest) implementation of its principles, is a great place to start.
The rough answer to the question is that data, during processing, should always be long, and should only be widened for display purposes. Be cautious with this, though, as here "long" refers more to "tidy", rather than the pure long form.
Examples
Take, for example, the mtcars
dataset. This is already in tidy form, in that each row represents a single observation. So "lengthening" it, to get something like this
model type value
1 AMC Javelin mpg 15.200
2 AMC Javelin cyl 8.000
3 AMC Javelin disp 304.000
4 AMC Javelin hp 150.000
5 AMC Javelin drat 3.150
6 AMC Javelin wt 3.435
is counterproductive; mpg
and cyl
are not comparable in any meaningful way.
Taking the ChickWeight
dataset (which is in long form) and transforming it to wide by time
require(tidyr)
ChickWeight %>% spread(Time, weight)
Chick Diet 0 2 4 6 8 10 12 14 16 18 20 21
1 18 1 39 35 NA NA NA NA NA NA NA NA NA NA
2 16 1 41 45 49 51 57 51 54 NA NA NA NA NA
3 15 1 41 49 56 64 68 68 67 68 NA NA NA NA
4 13 1 41 48 53 60 65 67 71 70 71 81 91 96
5 9 1 42 51 59 68 85 96 90 92 93 100 100 98
6 20 1 41 47 54 58 65 73 77 89 98 107 115 117
7 10 1 41 44 52 63 74 81 89 96 101 112 120 124
8 8 1 42 50 61 71 84 93 110 116 126 134 125 NA
9 17 1 42 51 61 72 83 89 98 103 113 123 133 142
10 19 1 43 48 55 62 65 71 82 88 106 120 144 157
11 4 1 42 49 56 67 74 87 102 108 136 154 160 157
12 6 1 41 49 59 74 97 124 141 148 155 160 160 157
13 11 1 43 51 63 84 112 139 168 177 182 184 181 175
...
gives a visualization that may be useful, but for data analysis purposes, is very inconvenient, as computing things like growth rate become cumbersome.