Convert long dataset to wide dataset using R

Question

I'd appreciate some assistance with what R code to use in the following situation.

These are the top 11 rows of the dataset:

Sa1_main11  Sa1_main11_2
20401106101 20401106101
20401106101 21105128609
20401106101 21105128653
20601110501 20601110501
20601110501 20601110530
20601110501 20601110531
20601110501 20601110532
20601110501 20601110533
20601110501 20601110534
20601110501 20601110614
20601110502 20601110502

SA1s are a geographical unit used by the Australian Bureau of Statistics.

This file lists which SA1s are contiguous: column 1 is the base SA1, and column 2 is an SA1 that adjoins it.

For example, take the first 3 rows:

20401106101 adjoins itself
21105128609 adjoins 20401106101
21105128653 adjoins 20401106101

What I need to do is to produce a dataset where the first line is of the format

20401106101  21105128609  21105128653

I've tried the reshape2 package, but the lack of row labels (which would all be identical) makes that not possible for me.

Edit - here is a link to what the data looks like

https://www.dropbox.com/s/tigqdevybskm1bs/Original.JPG

here is a link to what the top 3 rows should look like

https://www.dropbox.com/s/b2l36mry9ibfnfq/Destination.JPG

If you could provide the complete desired output based on the sample data, this would be very helpful. — talat, Aug 22 '14 at 12:03
Have now included links to pictures of larger original and output example. Also tried table() - that produces a very large matrix with each SA1 as a row and column, with what looks like a zero in each cell. — Graham, Aug 22 '14 at 12:25
Please don't include screenshots of data in questions. Copy the data into the question and format it. — Roland, Aug 22 '14 at 12:30

Roland · Answer 1 · 2014-08-22T12:37:38.793

It looks like split might help you:

split(DF[,2], DF[,1]) 

#$`20401106101`
#[1] 20401106101 21105128609 21105128653
#
#$`20601110501`
#[1] 20601110501 20601110530 20601110531 20601110532 20601110533 20601110534 20601110614
#
#$`20601110502`
#[1] 20601110502
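
To try the split call on the question's sample, the 11 rows can be entered by hand. This is only a sketch: it assumes the IDs are stored as numeric (they are too large for R's integer type), and the name adj is made up here for illustration.

```r
# Sample rows from the question, typed in by hand.
# The IDs exceed the 32-bit integer range, so they are kept as doubles.
DF <- data.frame(
  Sa1_main11 = rep(c(20401106101, 20601110501, 20601110502),
                   times = c(3, 7, 1)),
  Sa1_main11_2 = c(20401106101, 21105128609, 21105128653,
                   20601110501, 20601110530, 20601110531, 20601110532,
                   20601110533, 20601110534, 20601110614,
                   20601110502)
)

# One list element per base SA1, holding the SA1s that adjoin it
adj <- split(DF[, 2], DF[, 1])

# Number of adjoining SA1s per base SA1 (each base counts itself)
lengths(adj)
```

The list elements have different lengths, which is why the result cannot be stored directly as a rectangular data.frame or matrix.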

It's unclear what you intend to do with the data. Neither data.frames nor matrices can hold rows of different length. So replicating the exact result is a bit complicated (and not very useful). Anyway, this would come close:

res <- split(DF[,2], DF[,1])
# pad every group with NA up to the length of the longest group
n <- max(sapply(res, length))
res <- lapply(res, function(x) {
  length(x) <- n
  x
  })

do.call(rbind, res)
#                   [,1]        [,2]        [,3]        [,4]        [,5]        [,6]        [,7]
#20401106101 20401106101 21105128609 21105128653          NA          NA          NA          NA
#20601110501 20601110501 20601110530 20601110531 20601110532 20601110533 20601110534 20601110614
#20601110502 20601110502          NA          NA          NA          NA          NA          NA

Regarding the comment about rows of different length: the output file will be used with HLM software. I'll be fitting a hierarchical linear model with two levels, where contiguity constraints are included. HLM software requires the file with the contiguity data to be in this format. — Graham, Aug 23 '14 at 06:42

score 0 · Accepted Answer · answered Aug 22 '14 at 12:34

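Check if this works (dat is the dataset):

 library(reshape2)
 # number the rows within each base SA1 (1, 2, 3, ...) to use as the column index
 dat$indx <- with(dat, ave(seq_along(Sa1_main11), Sa1_main11, FUN=seq_along))
 # one row per base SA1, one column per adjoining SA1
 dcast(dat, Sa1_main11~indx, value.var="Sa1_main11_2")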
 #     Sa1_main11           1           2           3           4           5
 #1 20401106101 20401106101 21105128609 21105128653          NA          NA
 #2 20601110501 20601110501 20601110530 20601110531 20601110532 20601110533
 #3 20601110502 20601110502          NA          NA          NA          NA
 #           6           7
 #1          NA          NA
 #2 20601110534 20601110614
 #3          NA          NA

Thanks Roland and akrun for your assistance. Both solutions resulted in identical output; akrun's solution was quite a bit quicker to run (although with only 69k rows in the input dataset, that wasn't much of an issue for me). I had to modify the output of both solutions in Excel to remove the NAs, and I haven't yet tested whether that is going to be an issue for me. — Graham, Aug 23 '14 at 06:36
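
The manual NA clean-up in Excel can probably be avoided by writing the ragged rows straight from the split list. This is a minimal sketch, not a tested recipe for HLM. It assumes the IDs are read as character, that a space-separated text file is acceptable, and that each base SA1's own row comes first within its group, as it does in the sample. The names groups, out_lines and out_file are hypothetical.

```r
# Sample rows from the question, kept as character so the IDs are written verbatim
DF <- data.frame(
  Sa1_main11 = rep(c("20401106101", "20601110501", "20601110502"),
                   times = c(3, 7, 1)),
  Sa1_main11_2 = c("20401106101", "21105128609", "21105128653",
                   "20601110501", "20601110530", "20601110531", "20601110532",
                   "20601110533", "20601110534", "20601110614",
                   "20601110502"),
  stringsAsFactors = FALSE
)

# One character vector per base SA1; order within each group is preserved
groups <- split(DF$Sa1_main11_2, DF$Sa1_main11)

# Collapse each group into a single space-separated line (no NA padding)
out_lines <- vapply(groups, paste, character(1), collapse = " ")

# Hypothetical output location; replace with the real file path
out_file <- tempfile(fileext = ".txt")
writeLines(out_lines, out_file)
```

If the padded matrix or the dcast result is preferred, write.table also accepts an na argument (for example na = ""), though that leaves empty trailing fields on the shorter rows.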
