0

I have a dataset with two sets of observations for 830 and 930. My objective is to reshape my data frame so that there is one originID column, one 830 column, and one 930 column, keeping the values under walking in their respective hour column. It's basically a reverse reshape. Is there a quick method to do this in R and what package would be most appropriate?

> df
      originId   walking hour
1    359727104  3.440248  830
2    359931904  8.065233  830
3    229873828  3.519326  830
4    359931908 20.758961  830
5    359931909 15.050358  830
6    359727113  3.178191  830
1063 359727104  3.029167  930
1064 359931904  8.093116  930
1065 229873828  3.523732  930
1066 359931908 21.234964  930
1067 359931909 15.701993  930
1068 359727113  2.768297  930

I have tried this formula from reshape2 but it does not produce the correct result.

> dcast(df, formula = originId + walking ~ hour)
Using hour as value column: use value.var to override.
    originId   walking 830 930
1  229873828  3.519326 830  NA
2  229873828  3.523732  NA 930
3  359727104  3.029167  NA 930
4  359727104  3.440248 830  NA
5  359727113  2.768297  NA 930
6  359727113  3.178191 830  NA
7  359931904  8.065233 830  NA
8  359931904  8.093116  NA 930
9  359931908 20.758961 830  NA
10 359931908 21.234964  NA 930
11 359931909 15.050358 830  NA
12 359931909 15.701993  NA 930

Here is a sample of the data:

> dput(df)
structure(list(originId = c(359727104, 359931904, 229873828, 
359931908, 359931909, 359727113, 359727104, 359931904, 229873828, 
359931908, 359931909, 359727113), walking = c(3.44024822695035, 
8.06523297491039, 3.51932624113475, 20.7589605734767, 15.0503584229391, 
3.1781914893617, 3.02916666666667, 8.09311594202899, 3.52373188405797, 
21.2349637681159, 15.7019927536232, 2.76829710144928), hour = c(830L, 
830L, 830L, 830L, 830L, 830L, 930L, 930L, 930L, 930L, 930L, 930L
)), .Names = c("originId", "walking", "hour"), row.names = c(1L, 
2L, 3L, 4L, 5L, 6L, 1063L, 1064L, 1065L, 1066L, 1067L, 1068L), class = "data.frame")
iskandarblue
  • 7,208
  • 15
  • 60
  • 130
  • You need to use the `walking`variable as your `value.var`: `dcast(df, formula = originId ~ hour, value.var = 'walking')` – Jaap Jun 21 '16 at 15:11

2 Answers2

1

Try tidyr:

df %>% spread(hour, walking)

I would recommend changing the numbers like this using dplyr so you don't have to deal with column names starting with numbers:

df %>%
  mutate(hour = paste0('hour_', hour)) %>%
  spread(hour, walking)
Felipe Gerard
  • 1,552
  • 13
  • 23
  • I get the following error when trying this : Error in utils::getS3method("as_data_frame", "matrix", envir = asNamespace("tibble")) : unused argument (envir = asNamespace("tibble")) – – iskandarblue Jun 21 '16 at 15:17
  • Try restarting your R session and only loading the required packages. If it works, then there is a conflict. – Felipe Gerard Jun 21 '16 at 15:19
0

You need to use the walking-variable as your value.var:

dcast(df, originId ~ hour, value.var = 'walking')

which gives:

   originId       830       930
1 229873828  3.519326  3.523732
2 359727104  3.440248  3.029167
3 359727113  3.178191  2.768297
4 359931904  8.065233  8.093116
5 359931908 20.758961 21.234964
6 359931909 15.050358 15.701993

And probably even better:

dcast(df, originId ~ paste0('hr_',hour), value.var = 'walking')

which gives:

   originId    hr_830    hr_930
1 229873828  3.519326  3.523732
2 359727104  3.440248  3.029167
3 359727113  3.178191  2.768297
4 359931904  8.065233  8.093116
5 359931908 20.758961 21.234964
6 359931909 15.050358 15.701993
Jaap
  • 81,064
  • 34
  • 182
  • 193