How to take variables in a column and make them into numerous columns

Question

So I have this dataset which I have been cleaning for someone else, but they want a specific column split into several columns by type of observation. For example, this is a column of diagnoses, and she wants it expanded so that there is one column for one diagnosis and another column for a different diagnosis. Thus a column containing Depression, ADHD, Asthma, Cancer, etc. would be expanded into one column called Depression, one called ADHD, and so on.

I'm pretty sure this violates the principles of tidy data, but the person I am doing this for is adamant that this is how they want it done. I have tried the tidyr and dplyr packages, but so far I have had no luck and could use some advice.

Thanks for your help in advance

   Order Diagnosis
1   1   Synaesthesia
2   1   Synaesthesia
3   1   Synaesthesia
4   1   Synaesthesia
5   1   Synaesthesia
6   1   Synaesthesia
7   1   ADHD
8   1   ADHD
9   1   ADHD
10  1   ADHD
11  1   ADHD
12  1   ADHD
13  1   ADHD
14  1   ADHD
15  1   ADHD
16  1   ADHD
17  1   ADHD
18  1   ADHD
19  1   ADHD
20  1   ADHD
21  1   ADHD
22  1   ADHD
23  1   ADHD
24  1   ADHD
25  1   ADHD
26  1   ADHD
27  1   ADHD
28  1   ADHD
29  1   ADHD
30  1   ADHD
31  1   ADHD
32  1   ADHD
33  1   ADHD
34  1   ADHD
35  1   ADHD
36  1   ADHD
37  1   ADHD
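For anyone who wants to try the answers, the rows shown above can be rebuilt as a small data frame. This is only a sketch: it reproduces the 37 rows and two columns displayed (6 Synaesthesia rows followed by 31 ADHD rows), not the full data set, and the name `dat` is an assumption chosen to match the `model.matrix()` answer.

```r
# Rebuild the 37 rows printed above (assumed name: dat).
# Only the Order and Diagnosis columns appear in the question.
dat <- data.frame(
  Order = 1,
  Diagnosis = rep(c("Synaesthesia", "ADHD"), times = c(6, 31)),
  stringsAsFactors = FALSE
)
str(dat)
```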

You may want to look at the `reshape2` package and convert the data from long to wide form. — Metrics, Feb 14 '15 at 14:28
Thanks for your comment. Can you expand a little on which specific functions in reshape2 I should use please? — googleplex101, Feb 14 '15 at 14:49
See here: http://www.cookbook-r.com/Manipulating_data/Converting_data_between_wide_and_long_format/. Alternatively, you can use the `tidyr` package: http://blog.rstudio.org/2014/07/22/introducing-tidyr/ — Metrics, Feb 14 '15 at 14:57
Oh dear, I can't seem to get the hang of this. Can you give me an example, please? It just doesn't seem to be working for me, sorry. — googleplex101, Feb 14 '15 at 15:21
So far what I have is this: data3<-dcast(data2, .~ diagnosis), which returns a data frame of counts of how often each diagnosis appears in the data. What I want instead is a series of columns, one per diagnosis, holding the diagnosis string on the rows where the subject has that diagnosis. — googleplex101, Feb 14 '15 at 15:27
Can you label the columns properly? Also, do you have only two columns? — Metrics, Feb 14 '15 at 15:55
I have named the columns correctly now. No, I have 63 columns, but I am wary of revealing any more as it is not my data. — googleplex101, Feb 14 '15 at 16:00
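The comments point at converting from long to wide form. A likely reason `dcast(data2, .~ diagnosis)` returned counts is that the `.` on the left-hand side leaves no row identifier, so all rows collapse into one and the default aggregation counts them. Below is a minimal sketch of the long-to-wide idea that keeps one row per subject. It swaps in base R's `stats::reshape()` instead of reshape2, uses made-up toy data, and adds hypothetical helper columns `id` (the row number) and `value` (a copy of the diagnosis).

```r
# Toy data shaped like the question (5 rows instead of 37).
dat <- data.frame(
  Order = 1,
  Diagnosis = c("Synaesthesia", "Synaesthesia", "ADHD", "ADHD", "ADHD"),
  stringsAsFactors = FALSE
)

# Hypothetical helper columns: a row identifier so rows are not collapsed,
# and a copy of the diagnosis to spread into the new columns.
long <- dat
long$id <- seq_len(nrow(long))
long$value <- long$Diagnosis

# One column per diagnosis; rows without that diagnosis get NA.
wide <- reshape(long, idvar = "id", timevar = "Diagnosis",
                v.names = "value", direction = "wide")

# reshape() names the new columns "value.<diagnosis>"; strip the prefix.
names(wide) <- sub("^value\\.", "", names(wide))
wide
```

The `id` column is what makes the difference: with it, each original row stays a row, and the diagnosis string lands only in its own column.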

score 1 · Answer 1 · answered Feb 14 '15 at 16:05

It's not entirely clear what your expected results are, but one interpretation is that you want to recode your data, e.g. using dummy coding.

A simple way to do this is to use model.matrix(). Try this:

model.matrix(~ Diagnosis - 1, dat)

   DiagnosisADHD DiagnosisSynaesthesia
1              0                     1
2              0                     1
3              0                     1
4              0                     1
5              0                     1
6              0                     1
7              1                     0
8              1                     0
9              1                     0
10             1                     0
...
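The `- 1` in the formula drops the intercept, which is what gives every diagnosis its own indicator column. A small sketch on made-up toy data showing the difference; a character column is converted to a factor with alphabetically sorted levels, so ADHD becomes the baseline when the intercept is kept.

```r
dat <- data.frame(Diagnosis = c("Synaesthesia", "ADHD", "ADHD"),
                  stringsAsFactors = FALSE)

# With an intercept, the first level (ADHD) is absorbed into "(Intercept)".
with_int <- model.matrix(~ Diagnosis, dat)
colnames(with_int)

# Without an intercept, each level gets its own 0/1 column.
no_int <- model.matrix(~ Diagnosis - 1, dat)
colnames(no_int)
```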

This looks closest to what I need, thank you. Is there any way to change the 1s to the diagnosis name, like ADHD? — googleplex101, Feb 14 '15 at 16:23
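One possible way to handle this follow-up, sketched on toy data and assuming the column is called `Diagnosis` as in the question: build the 0/1 matrix with `model.matrix()`, strip the `Diagnosis` prefix from the column names, then put each column's name where the matrix has a 1 and `NA` where it has a 0. The object names `mm` and `labelled` are illustrative only.

```r
dat <- data.frame(Diagnosis = c("Synaesthesia", "ADHD", "ADHD"),
                  stringsAsFactors = FALSE)

mm <- model.matrix(~ Diagnosis - 1, dat)
# "DiagnosisADHD" -> "ADHD", etc.
colnames(mm) <- sub("^Diagnosis", "", colnames(mm))

# col(mm) gives each cell's column index, so colnames(mm)[col(mm)] is the
# diagnosis name for every cell; keep it where mm is 1, otherwise NA.
labelled <- ifelse(mm == 1, colnames(mm)[col(mm)], NA)
labelled <- as.data.frame(labelled, stringsAsFactors = FALSE)
labelled
```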

score 0 · Accepted Answer · answered Feb 15 '15 at 20:30

You could split your "vector" (or column, in your case), pad each piece with NAs, and cbind the pieces into a full-fledged data.frame or matrix.

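# 100 random letters A-E as example data (no seed, so counts differ per run)
x <- sample(LETTERS[1:5], size = 100, replace = TRUE)
# split into one vector per distinct value: sx$A, sx$B, ...
sx <- split(x, x)

# length of the longest group
ml <- max(unlist(lapply(sx, length)))

# pad each group with NAs to length ml, then bind the groups as columns
do.call("cbind", lapply(sx, FUN = function(m) c(m, rep(NA, ml - length(m)))))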

      A   B   C   D   E  
 [1,] "A" "B" "C" "D" "E"
 [2,] "A" "B" "C" "D" "E"
 [3,] "A" "B" "C" "D" "E"
 [4,] "A" "B" "C" "D" "E"
 [5,] "A" "B" "C" "D" "E"
 [6,] "A" "B" "C" "D" "E"
 [7,] "A" "B" "C" "D" "E"
 [8,] "A" "B" "C" "D" "E"
 [9,] "A" "B" "C" "D" "E"
[10,] "A" "B" "C" "D" "E"
[11,] "A" "B" "C" "D" "E"
[12,] "A" "B" "C" "D" "E"
[13,] "A" "B" "C" "D" "E"
[14,] "A" "B" "C" "D" "E"
[15,] NA  "B" "C" "D" "E"
[16,] NA  "B" "C" "D" "E"
[17,] NA  "B" "C" "D" "E"
[18,] NA  "B" "C" "D" "E"
[19,] NA  "B" "C" "D" "E"
[20,] NA  "B" "C" "D" "E"
[21,] NA  "B" "C" "D" NA 
[22,] NA  NA  "C" "D" NA 
[23,] NA  NA  NA  "D" NA
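The padding step can also be written with `length<-`, which extends a vector with `NA`s when given a larger length. A sketch on a small fixed vector rather than the random sample above; note that, as in this answer, each column's values are stacked at the top, so the rows no longer line up with the rows of the original data.

```r
x <- c("A", "B", "A", "C")
sx <- split(x, x)

# length of the longest group
ml <- max(lengths(sx))

# `length<-` pads shorter vectors with NA up to ml
padded <- do.call(cbind, lapply(sx, function(v) {
  length(v) <- ml
  v
}))
padded
```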
