R melting/gathering file with 2 heading rows into attributes

Question

I need to read in a datafile with 2 levels of headers, the data looks like this:

|          | Jone Doe |      |      |      |      |      |      | Jane Doe |      |      |      |      |      |      | 
|----------|----------|------|------|------|------|------|------|----------|------|------|------|------|------|------| 
| Date     | Col1     | Col2 | Col3 | Col4 | Col5 | Col6 | Col7 | Col1     | Col2 | Col3 | Col4 | Col5 | Col6 | Col7 | 
| 1-Jul-13 | 49       | 42   | 20   | 18   | 23   | 16   | 29   | 48       | 33   | 24   | 10   | 43   | 13   | 43   | 
| 2-Jul-13 | 17       | 16   | 43   | 33   | 37   | 37   | 10   | 7        | 45   | 19   | 4    | 41   | 41   | 20   | 
| 3-Jul-13 | 35       | 39   | 42   | 35   | 5    | 12   | 22   | 3        | 28   | 23   | 10   | 12   | 5    | 8    |

I need it to look like this:

| Date     | Name     | Col1 | Col2 | Col3 | Col4 | Col5 | Col6 | Col7 | 
|----------|----------|------|------|------|------|------|------|------| 
| 1-Jul-13 | Jone Doe | 49   | 42   | 20   | 18   | 23   | 16   | 29   | 
| 2-Jul-13 | Jone Doe | 17   | 16   | 43   | 33   | 37   | 37   | 10   | 
| 3-Jul-13 | Jone Doe | 35   | 39   | 42   | 35   | 5    | 12   | 22   | 
| 1-Jul-13 | Jane Doe | 48   | 33   | 24   | 10   | 43   | 13   | 43   | 
| 2-Jul-13 | Jane Doe | 7    | 45   | 19   | 4    | 41   | 41   | 20   | 
| 3-Jul-13 | Jane Doe | 3    | 28   | 23   | 10   | 12   | 5    | 8    |

Any idea on how to do this without hard-coding? I've been trying using melt() and gather() without any luck

Edit:

Sample data : https://drive.google.com/open?id=1T4KkAk5D55_nXsHlr1Aozed6d49qFM_8

Output of lst1:

output of nm1:

 [1] "John Doe"  "John Doe"  "John Doe"  "John Doe"  "John Doe"  "John Doe"  "John Doe"  "Jane Doe" 
 [9] "Jane Doe"  "Jane Doe"  "Jane Doe"  "Jane Doe"  "Jane Doe"  "Jane Doe"  "Jose Doe"  "Jose Doe" 
[17] "Jose Doe"  "Jose Doe"  "Jose Doe"  "Jose Doe"  "Jose Doe"  "Jacob Doe" "Jacob Doe" "Jacob Doe"
[25] "Jacob Doe" "Jacob Doe" "Jacob Doe" "Jacob Doe"

Looks like your 'nm1' is right. Let me check with your data – akrun Jul 26 '19 at 04:02 — akrun, Jul 26 '19 at 04:02

akrun · Accepted Answer · 2019-07-26T04:13:40.493

2

An option would be to read the dataset with skip argument to skip the 1st row, then, we can split the data into a list based on the duplicate column names and then create the 'Name' column for each list element based on the first row and rbind the list elements to create a single data.frame

dat1 <- read.csv("file.csv", header = TRUE, skip = 1, 
          stringsAsFactors = FALSE, na.strings = "N/A")
nm1 <- c("John Doe", "Jane Doe")[cumsum(grepl("Col1", names(dat1)[-1]))]
 nm2 <- unique( sub("\\.\\d+$", "", names(dat1)[-1]))
lst1 <- split.default(dat1[-1], nm1)
dat2 <- cbind(dat1['Date'], do.call(rbind, Map(cbind, Name = nm1,  lapply(lst1, setNames, nm2))))
row.names(dat2) <- NULL
head(dat2, 5)
#      Date     Name Col1 Col2 Col3 Col4 Col5 Col6 Col7
#1 1-Jul-13 John Doe   52    6   NA   NA    7   20   25
#2 2-Jul-13 John Doe   43    7   NA   NA   NA   25   17
#3 3-Jul-13 John Doe   55    5   NA   NA    4   23   28
#4 4-Jul-13 John Doe   42    6   NA   NA    7   21   14
#5 5-Jul-13 John Doe   64    3   NA   NA    5   36   22

dim(dat2)
#[1] 140   9

Note that if the number of blocks of columns are large, an option is to read the first line with readLines

v1 <- readLines("file.csv", n = 1)
v2 <- scan(text = gsub(",{2,}", ",", trimws(v1)), sep=",", what = "", quiet = TRUE)
v3 <- v2[nzchar(v2)]

and feed it to the cumsum step

nm1 <- v3[cumsum(grepl("Col1", names(dat1)[-1]))]

edited Jul 26 '19 at 04:13

answered Jul 26 '19 at 03:22

akrun

874,273
37
540
662

Thanks for your reply, i tried this and get the following error ```Error in match.names(clabs, names(xi)) : names do not match previous names``` on this line ```dat2 <- cbind(dat1['Date'], do.call(rbind, Map(cbind, Name = nm1, lst1)))```. Inspecting lst1 i think it is nearly there – zaza Jul 26 '19 at 03:44
It looks like the names in ```lst1``` have a '.1', '.2','.3', etc... appended onto it – zaza Jul 26 '19 at 03:48
@zaza Can you update your post with the `lst1` as `dput` for the small example. As I don't have the correct structure, it may be some minor issue – akrun Jul 26 '19 at 03:48
@zaza What is the output for 'nm1' – akrun Jul 26 '19 at 03:48
I've updated the question with sample data and requested info – zaza Jul 26 '19 at 04:00
@zaza Okay, I understand the issue – akrun Jul 26 '19 at 04:11

R melting/gathering file with 2 heading rows into attributes

1 Answers1