I am trying to pivot a table that has headings and sub-headings, so that the headings go into a column "date", and the subheadings are two columns instead of repeating.
Here is an example of my data.
This was produced using dput()
, so while in the original excel file, each date spanned over both sub-headings ("blue" and "green"), once in R, these blank cells were re-named X.1, X.2, etc.
table <- " X X.1 X02.Jul.12 X.2 X03.Jul.12 X.3 X04.Jul.12 X.4
1 category number blue green blue green blue green
2 G 1 1 0 1 0 1 0
3 G 2 2 99 2 99 1 99
4 G 3 1 1 1 99 1 99
5 G 4 1 1 1 1 2 99
6 G 5 1 0 1 0 1 99
7 G 6 1 99 1 1 1 99
8 G 7 1 0 1 0 1 0
9 G 8 1 1 1 1 1 99
10 G 9 1 1 1 1 1 1
11 H 1 1 1 1 1 1 1
12 H 2 1 99 1 0 1 0
13 H 3 1 1 1 1 1 99
14 H 4 1 99 1 2 1 99
15 H 5 1 1 1 1 1 1
16 H 6 1 0 1 0 1 99
17 H 7 1 1 2 1 1 99
18 H 8 2 0 2 0 1 1
19 H 9 2 0 2 0 1 1"
#Create a dataframe with the above table
df <- read.table(text=table, header = TRUE)
df
Here is an example of what it looks like in Excel:
This is the desired output I am trying to achieve:
While this can be done manually in Excel, I have multiple files with over 100 dates/columns, so would prefer to find a way to clean it in R.
Any help would be appreciated!
Excel Reprex
Here is a reprex of the dataset, as if it were read from Excel without name correction:
# Define the dataset.
df_excel <- structure(
list(
c("category", "G", "G", "G", "G", "G", "G", "G", "G", "G", "H", "H", "H", "H", "H", "H", "H", "H", "H"),
c("number", "1", "2", "3", "4", "5", "6", "7", "8", "9", "1", "2", "3", "4", "5", "6", "7", "8", "9"),
`02.Jul.12` = c("blue", "1", "2", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "2", "2"),
c("green", "0", "99", "1", "1", "0", "99", "0", "1", "1", "1", "99", "1", "99", "1", "0", "1", "0", "0"),
`03.Jul.12` = c("blue", "1", "2", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "2", "2", "2"),
c("green", "0", "99", "99", "1", "0", "1", "0", "1", "1", "1", "0", "1", "2", "1", "0", "1", "0", "0"),
`04.Jul.12` = c("blue", "1", "1", "1", "2", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1"),
c("green", "0", "99", "99", "99", "99", "99", "0", "99", "1", "1", "0", "99", "99", "1", "99", "99", "1", "1")
),
class = "data.frame",
row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", "16", "17", "18", "19")
)
# Save dataset in Excel file ('reprex.xlsx') for reproducibility.
openxlsx::write.xlsx(x = df_excel, file = "./reprex.xlsx")