6

I am trying to pivot a table that has headings and sub-headings, so that the headings go into a column "date", and the subheadings are two columns instead of repeating.

Here is an example of my data.

This was produced using dput(), so while in the original excel file, each date spanned over both sub-headings ("blue" and "green"), once in R, these blank cells were re-named X.1, X.2, etc.

table <- "          X    X.1 X02.Jul.12   X.2 X03.Jul.12   X.3 X04.Jul.12   X.4
1  category number       blue green       blue green       blue green
2         G      1          1     0          1     0          1     0
3         G      2          2    99          2    99          1    99
4         G      3          1     1          1    99          1    99
5         G      4          1     1          1     1          2    99
6         G      5          1     0          1     0          1    99
7         G      6          1    99          1     1          1    99
8         G      7          1     0          1     0          1     0
9         G      8          1     1          1     1          1    99
10        G      9          1     1          1     1          1     1
11        H      1          1     1          1     1          1     1
12        H      2          1    99          1     0          1     0
13        H      3          1     1          1     1          1    99
14        H      4          1    99          1     2          1    99
15        H      5          1     1          1     1          1     1
16        H      6          1     0          1     0          1    99
17        H      7          1     1          2     1          1    99
18        H      8          2     0          2     0          1     1
19        H      9          2     0          2     0          1     1"

#Create a dataframe with the above table
df <- read.table(text=table, header = TRUE)
df

Here is an example of what it looks like in Excel:

Current data

This is the desired output I am trying to achieve:

Desired output

While this can be done manually in Excel, I have multiple files with over 100 dates/columns, so would prefer to find a way to clean it in R.

Any help would be appreciated!

Excel Reprex

Here is a reprex of the dataset, as if it were read from Excel without name correction:

# Define the dataset.
df_excel <- structure(
  list(
    c("category", "G", "G", "G", "G", "G", "G", "G", "G", "G", "H", "H", "H", "H", "H", "H", "H", "H", "H"),
    c("number", "1", "2", "3", "4", "5", "6", "7", "8", "9", "1", "2", "3", "4", "5", "6", "7", "8", "9"),
    `02.Jul.12` = c("blue", "1", "2", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "2", "2"),
    c("green", "0", "99", "1", "1", "0", "99", "0", "1", "1", "1", "99", "1", "99", "1", "0", "1", "0", "0"),
    `03.Jul.12` = c("blue", "1", "2", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "2", "2", "2"),
    c("green", "0", "99", "99", "1", "0", "1", "0", "1", "1", "1", "0", "1", "2", "1", "0", "1", "0", "0"),
    `04.Jul.12` = c("blue", "1", "1", "1", "2", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1"),
    c("green", "0", "99", "99", "99", "99", "99", "0", "99", "1", "1", "0", "99", "99", "1", "99", "99", "1", "1")
  ),
  class = "data.frame",
  row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", "16", "17", "18", "19")
)

# Save dataset in Excel file ('reprex.xlsx') for reproducibility.
openxlsx::write.xlsx(x = df_excel, file = "./reprex.xlsx")
Greg
  • 3,054
  • 6
  • 27
cgxytf
  • 421
  • 4
  • 11
  • Hi, I deleted it when it was removed. Someone reported it as a duplicate and it was removed - I wouldn't have deleted it otherwise! Now that it has been re-opened, I can un-delete it! – cgxytf Jan 06 '22 at 19:56
  • Duplicates aren't removed - they serve as pointers to make the "originals" easier to find. No need to delete duplicates - they are useful! – Gregor Thomas Jan 06 '22 at 20:12
  • @jl748795 I updated [my solution](https://stackoverflow.com/a/70600230) to handle Excel dates. Any chance you could [supply a `.xlsx` *file*](https://meta.stackexchange.com/q/47689) as a reprex? – Greg Jan 10 '22 at 21:36

6 Answers6

4
df %>%
  set_names(enframe(unlist(df[1,])) %>%
  mutate(name = na_if(name, ''))%>%
  fill(name)%>%
  transmute(nms = coalesce(str_c(name, value, sep='_'), value)) %>%
  pull(nms)) %>%
  slice(-1)%>%
  type.convert(as.is = TRUE)%>%
  pivot_longer(-c(category, number), names_to = c('Date', '.value'), 
               names_sep = '_', names_transform = list(Date = dmy)) %>%
  arrange(category, Date, number)

# A tibble: 54 x 5
   category number Date        blue green
   <chr>     <int> <date>     <int> <int>
 1 G             1 2012-07-02     1     0
 2 G             2 2012-07-02     2    99
 3 G             3 2012-07-02     1     1
 4 G             4 2012-07-02     1     1
 5 G             5 2012-07-02     1     0
 6 G             6 2012-07-02     1    99
 7 G             7 2012-07-02     1     0
 8 G             8 2012-07-02     1     1
 9 G             9 2012-07-02     1     1
10 G             1 2012-07-03     1     0
# ... with 44 more rows
Onyambu
  • 67,392
  • 3
  • 24
  • 53
3

Here is another option using a combination of base R and tidyverse. Here, I first clean up the column names by including the date in the column names for the column to the left (i.e., "green), so that each column has a date. Then, I concatenate the header with the sub-heading, except for the first 2 columns (i.e., category and number). Then, I remove the first row and pivot to a long format with the date in one column and the colors stay in their own column.

library(tidyverse)

colnames(df)[seq(2, ncol(df), 2)] <- colnames(df)[seq(1, ncol(df), 2)]

colnames(df) <-
  c(df[1, 1], df[1, 2], paste(sep = '_', colnames(df)[3:ncol(df)], as.character(unlist(df[1, 3:ncol(df)]))))

df %>%
  slice(-1) %>%
  pivot_longer(-c(category, number),
               names_to = c("Date", ".value"),
               names_sep = "_") %>%
  arrange(Date, category, number) %>%
  mutate(Date = dmy(Date))

Output

# A tibble: 54 × 5
   category number Date       blue  green
   <chr>    <chr>  <date>     <chr> <chr>
 1 G        1      2012-07-02 1     0    
 2 G        2      2012-07-02 2     99   
 3 G        3      2012-07-02 1     1    
 4 G        4      2012-07-02 1     1    
 5 G        5      2012-07-02 1     0    
 6 G        6      2012-07-02 1     99   
 7 G        7      2012-07-02 1     0    
 8 G        8      2012-07-02 1     1    
 9 G        9      2012-07-02 1     1    
10 H        1      2012-07-02 1     1    
# … with 44 more rows

Data

df <- structure(
  list(
    c("category", "G", "G", "G", "G", "G", "G", "G", "G", "G", "H", "H", "H", "H", "H", "H", "H", "H", "H"),
    c("number", "1", "2", "3", "4", "5", "6", "7", "8", "9", "1", "2", "3", "4", "5", "6", "7", "8", "9"),
    `02.Jul.12` = c("blue", "1", "2", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "2", "2"),
    c("green", "0", "99", "1", "1", "0", "99", "0", "1", "1", "1", "99", "1", "99", "1", "0", "1", "0", "0"),
    `03.Jul.12` = c("blue", "1", "2", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "2", "2", "2"),
    c("green", "0", "99", "99", "1", "0", "1", "0", "1", "1", "1", "0", "1", "2", "1", "0", "1", "0", "0"),
    `04.Jul.12` = c("blue", "1", "1", "1", "2", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1"),
    c("green", "0", "99", "99", "99", "99", "99", "0", "99", "1", "1", "0", "99", "99", "1", "99", "99", "1", "1")
  ),
  class = "data.frame",
  row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", "16", "17", "18", "19")
)

If you have additional columns (e.g., colors), then you can adjust how colnames is replaced. So, I first create a sequence (a) that starts with the first date column (i.e., 3) to the last column. Then, I create 2 sequences from a, where b has the the index for the empty column names (excluding the first 2 columns) and c has the date column names. Then, I replicate the dates twice so that they can replace the 2 empty column names (green and red) for each date. Then, in the next step, I just copy the first two column names (i.e., category and number), then paste the remaining headings (i.e., dates) to the sub-headings. Then, the process is the same as above.

a <- seq(3, ncol(df2))
b <- a[!(a%%3==0)]
c <- a[(a%%3==0)]

colnames(df2)[b] <- colnames(df2)[sort(rep(c, 2))]

colnames(df2) <-
  c(df2[1, 1], df2[1, 2], paste(sep = '_', colnames(df2)[3:ncol(df2)], as.character(unlist(df2[1, 3:ncol(df2)]))))

df2 %>%
  slice(-1) %>%
  pivot_longer(-c(category, number),
               names_to = c("Date", ".value"),
               names_sep = "_") %>%
  arrange(Date, category, number) %>%
  mutate(Date = lubridate::dmy(Date))

Output

# A tibble: 54 × 6
   category number Date       blue  green red  
   <chr>    <chr>  <date>     <chr> <chr> <chr>
 1 G        1      2012-07-02 1     0     1    
 2 G        2      2012-07-02 2     99    2    
 3 G        3      2012-07-02 1     1     1    
 4 G        4      2012-07-02 1     1     1    
 5 G        5      2012-07-02 1     0     1    
 6 G        6      2012-07-02 1     99    1    
 7 G        7      2012-07-02 1     0     1    
 8 G        8      2012-07-02 1     1     1    
 9 G        9      2012-07-02 1     1     1    
10 H        1      2012-07-02 1     1     1    
# … with 44 more rows

Data

df2 <- structure(
    list(
      c("category", "G", "G", "G", "G", "G", "G", "G", "G", "G", "H", "H", "H", "H", "H", "H", "H", "H", "H"),
      c("number", "1", "2", "3", "4", "5", "6", "7", "8", "9", "1", "2", "3", "4", "5", "6", "7", "8", "9"),
      `02.Jul.12` = c("blue", "1", "2", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "2", "2"),
      c("green", "0", "99", "1", "1", "0", "99", "0", "1", "1", "1", "99", "1", "99", "1", "0", "1", "0", "0"),
      c("red", "1", "2", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "2", "2"),
      `03.Jul.12` = c("blue", "1", "2", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "2", "2", "2"),
      c("green", "0", "99", "99", "1", "0", "1", "0", "1", "1", "1", "0", "1", "2", "1", "0", "1", "0", "0"),
      c("red", "1", "2", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "2", "2"),
      `04.Jul.12` = c("blue", "1", "1", "1", "2", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1"),
      c("green", "0", "99", "99", "99", "99", "99", "0", "99", "1", "1", "0", "99", "99", "1", "99", "99", "1", "1"),
      c("red", "1", "2", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "2", "2")
    ),
    class = "data.frame",
    row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", "16", "17", "18", "19")
  )
AndrewGB
  • 16,126
  • 5
  • 18
  • 49
  • 1
    Thank you! If I were to adapt this for more colours (e.g. red, blue and green), how would I achieve this? I've tried understanding the process that you followed to adjust it (e.g. changing ncol(df), 2 to ncol(df), 3 but did not have success. – cgxytf Jan 06 '22 at 15:04
  • @jl748795 No problem! I've added an update to show how to do it for additional columns (and included a `red` column in the example). – AndrewGB Jan 06 '22 at 23:29
  • 1
    Thank you so much! I assume this will work once I figure out the error it's giving me with my large dataset (Error: Can't transform a data frame with duplicate names). Thanks for taking the time to reply! – cgxytf Jan 07 '22 at 21:14
2

The following code should produce your desired output, though others may have more elegant solutions:

#Create a dataframe with the above table
df <- read.table(text=table, header = TRUE)
df

names(df) <- df[1,]
library(lubridate); library(tidyr)

startdate <- dmy("02-Jul-12")
for (i in seq(3, ncol(df), by = 2)){
  names(df)[i:(i+1)] <- paste0(startdate, ":", names(df)[i:(i+1)])
  startdate <- startdate+1
}

df.tdy <- df[-1,] %>% pivot_longer(-c("category","number"), names_to = "datecol", values_to = "value") %>% 
  separate(datecol, c("date","color"), sep = ":") %>%
  pivot_wider(names_from = "color") %>%
    arrange(date,category,number)

# category number date       blue  green
# <chr>    <chr>  <chr>      <chr> <chr>
#   1 G        1      2012-07-02 1     0    
# 2 G        2      2012-07-02 2     99   
# 3 G        3      2012-07-02 1     1    
# 4 G        4      2012-07-02 1     1    
# 5 G        5      2012-07-02 1     0    
# 6 G        6      2012-07-02 1     99   
# 7 G        7      2012-07-02 1     0    
# 8 G        8      2012-07-02 1     1    
# 9 G        9      2012-07-02 1     1    
# 10 H        1      2012-07-02 1     1   

jpsmith
  • 11,023
  • 5
  • 15
  • 36
2

Here's a tidyverse solution that can handle duplicate column names (like blue) yet doesn't rely on splicing those names:

Solution

First import the tidyverse and locate the Excel file:

# Load the tidyverse.
library(tidyverse)


# Filepath to the Excel file.
filepath <- "reprex.xlsx"

Then read the Excel file in three relevant pieces: the date row (topmost), the header (with duplicate names), and the dataset.

# Extract the date row and fill in the blanks.
dates <- readxl::read_excel(path = filepath, col_names = FALSE, skip = 0, n_max = 1) %>%
  # Convert everything to dates where possible; leave blanks (NAs) elsewhere.
  mutate(across(.cols = everything(), .fns = lubridate::as_datetime)) %>%
  # Treat date row as a column.
  as.double() %>% lubridate::as_datetime() %>% as_tibble() %>%
  # Fill in the blanks with the preceding dates.
  fill(1, .direction = "down") %>%
  # Treat the result as a vector of dates.
  .[[1]]


# Extract the header...
names <- readxl::read_excel(path = filepath, col_names = FALSE, skip = 1, n_max = 1) %>%
  # ...as a vector of column names (with duplicates).
  as.character()


# Extract the (unnamed) dataset.
df <- readxl::read_excel(path = filepath, col_names = FALSE, skip = 2, n_max = Inf)

Finally, use this workflow to properly name and pivot the data.

# Cut out the headers from the data.
df <- df %>%
  # Properly name the dataset.
  set_names(nm = names) %>%
  
  # Pivot the color columns.
  pivot_longer(cols = !c(category, number), names_to = "color") %>%

  # Convert to the proper datatypes.
  mutate(
    category = as.character(category),
    number = as.integer(number),
    value = as.numeric(value)
  ) %>%
  
  # Identify each "clump" of colors by the one row from which it originated;
  # where {'category', 'number'} uniquely identify each such row.
  group_by(category, number) %>%
  # Map the date names to each clump.
  mutate(
    # Index the entries in each clump.
    date = row_number(),
    # Map each date to its corresponding entry.
    date = dates[!is.na(dates)][date],
    # Ensure homogeneity as date objects.
    date = lubridate::as_datetime(date)
  ) %>% ungroup() %>%
  
  # Pivot the colors into consolidated columns: one for each color.
  pivot_wider(names_from = color, values_from = value) %>%
  
  # Sort as desired.
  arrange(date, category, number)

Results

Given a reprex.xlsx like the one you describe here

when I import my excel .xlsx file instead of a .csv file, the dates become numbers (e.g. 41092)

this solution should yield the following result for df:

# A tibble: 54 x 5
   category number date                 blue green
   <chr>     <int> <dttm>              <dbl> <dbl>
 1 G             1 2012-07-02 00:00:00     1     0
 2 G             2 2012-07-02 00:00:00     2    99
 3 G             3 2012-07-02 00:00:00     1     1
 4 G             4 2012-07-02 00:00:00     1     1
 5 G             5 2012-07-02 00:00:00     1     0
 6 G             6 2012-07-02 00:00:00     1    99
 7 G             7 2012-07-02 00:00:00     1     0
 8 G             8 2012-07-02 00:00:00     1     1
 9 G             9 2012-07-02 00:00:00     1     1
10 H             1 2012-07-02 00:00:00     1     1
11 H             2 2012-07-02 00:00:00     1    99
12 H             3 2012-07-02 00:00:00     1     1
13 H             4 2012-07-02 00:00:00     1    99
14 H             5 2012-07-02 00:00:00     1     1
15 H             6 2012-07-02 00:00:00     1     0
16 H             7 2012-07-02 00:00:00     1     1
17 H             8 2012-07-02 00:00:00     2     0
18 H             9 2012-07-02 00:00:00     2     0
19 G             1 2012-07-03 00:00:00     1     0
20 G             2 2012-07-03 00:00:00     2    99
21 G             3 2012-07-03 00:00:00     1    99
22 G             4 2012-07-03 00:00:00     1     1
23 G             5 2012-07-03 00:00:00     1     0
24 G             6 2012-07-03 00:00:00     1     1
25 G             7 2012-07-03 00:00:00     1     0
26 G             8 2012-07-03 00:00:00     1     1
27 G             9 2012-07-03 00:00:00     1     1
28 H             1 2012-07-03 00:00:00     1     1
29 H             2 2012-07-03 00:00:00     1     0
30 H             3 2012-07-03 00:00:00     1     1
31 H             4 2012-07-03 00:00:00     1     2
32 H             5 2012-07-03 00:00:00     1     1
33 H             6 2012-07-03 00:00:00     1     0
34 H             7 2012-07-03 00:00:00     2     1
35 H             8 2012-07-03 00:00:00     2     0
36 H             9 2012-07-03 00:00:00     2     0
37 G             1 2012-07-04 00:00:00     1     0
38 G             2 2012-07-04 00:00:00     1    99
39 G             3 2012-07-04 00:00:00     1    99
40 G             4 2012-07-04 00:00:00     2    99
41 G             5 2012-07-04 00:00:00     1    99
42 G             6 2012-07-04 00:00:00     1    99
43 G             7 2012-07-04 00:00:00     1     0
44 G             8 2012-07-04 00:00:00     1    99
45 G             9 2012-07-04 00:00:00     1     1
46 H             1 2012-07-04 00:00:00     1     1
47 H             2 2012-07-04 00:00:00     1     0
48 H             3 2012-07-04 00:00:00     1    99
49 H             4 2012-07-04 00:00:00     1    99
50 H             5 2012-07-04 00:00:00     1     1
51 H             6 2012-07-04 00:00:00     1    99
52 H             7 2012-07-04 00:00:00     1    99
53 H             8 2012-07-04 00:00:00     1     1
54 H             9 2012-07-04 00:00:00     1     1

Note

Much like openxlsx::convertToDate(), the readxl functions here automatically convert Excel date numbers into the proper R Dates.

Greg
  • 3,054
  • 6
  • 27
  • I think this solution may work, but my issue is that when I import my excel .xlsx file instead of a .csv file, the dates become numbers (e.g. 41092). Normally you could use as.Date() or some similar function to transform them, but because each date is in a different column, I can't find a way to do this without having to name list every column (which is not possible based on the size of the data) – cgxytf Jan 06 '22 at 14:54
  • @jl748795 I think I could tailor the solution to your needs, using `openxlsx::read.xlsx(detectDates = TRUE)` or `openxlsx::convertToDate()`. But I'll need an actual reprex of your Excel file to do so. – Greg Jan 10 '22 at 15:44
  • @jl748795 I updated my solution to handle Excel dates. – Greg Jan 10 '22 at 17:12
  • Thank you! To modify this for more colours, for example, 3, would I just change "col_names = FALSE, skip = 2, n_max = Inf)" to "col_names = FALSE, skip = 3, n_max = Inf)", or should I adjust somewhere else also? – cgxytf Jan 11 '22 at 21:34
  • @jl748795 You don't need to modify the code; it should *already* work for an arbitrary number of colors, as long as you preserve the existing format. The `skip = 2` is for skipping *rows*, and in this case, it helps us extract the unnamed dataset by skipping the dates (the `1`st row) and the header (the `2`nd row), which are each read separately. – Greg Jan 11 '22 at 22:16
  • Great, this worked perfectly! Thanks for taking the time to comment and explain everything! – cgxytf Jan 12 '22 at 18:29
  • @jl748795 Happy to help! – Greg Jan 12 '22 at 18:35
2

A base R option using resahpe

u <- type.convert(setNames(df[-1, ], df[1, ]), as.is = TRUE)
transform(
  reshape(
    cbind(
      u[1:2],
      setNames(
        u[-c(1:2)],
        paste0(
          names(u)[-c(1:2)],
          ".",
          ave(seq(length(u) - 2), names(u)[-c(1:2)], FUN = seq_along)
        )
      )
    ),
    direction = "long",
    idvar = c("category", "number"),
    varying = -c(1:2),
    timevar = "date"
  ),
  date = Filter(nchar, names(df))[date]
)

gives

      category number      date blue green
G.1.1        G      1 02.Jul.12    1     0
G.2.1        G      2 02.Jul.12    2    99
G.3.1        G      3 02.Jul.12    1     1
G.4.1        G      4 02.Jul.12    1     1
G.5.1        G      5 02.Jul.12    1     0
G.6.1        G      6 02.Jul.12    1    99
G.7.1        G      7 02.Jul.12    1     0
G.8.1        G      8 02.Jul.12    1     1
G.9.1        G      9 02.Jul.12    1     1
H.1.1        H      1 02.Jul.12    1     1
H.2.1        H      2 02.Jul.12    1    99
H.3.1        H      3 02.Jul.12    1     1
H.4.1        H      4 02.Jul.12    1    99
H.5.1        H      5 02.Jul.12    1     1
H.6.1        H      6 02.Jul.12    1     0
H.7.1        H      7 02.Jul.12    1     1
H.8.1        H      8 02.Jul.12    2     0
H.9.1        H      9 02.Jul.12    2     0
G.1.2        G      1 03.Jul.12    1     0
G.2.2        G      2 03.Jul.12    2    99
G.3.2        G      3 03.Jul.12    1    99
G.4.2        G      4 03.Jul.12    1     1
G.5.2        G      5 03.Jul.12    1     0
G.6.2        G      6 03.Jul.12    1     1
G.7.2        G      7 03.Jul.12    1     0
G.8.2        G      8 03.Jul.12    1     1
G.9.2        G      9 03.Jul.12    1     1
H.1.2        H      1 03.Jul.12    1     1
H.2.2        H      2 03.Jul.12    1     0
H.3.2        H      3 03.Jul.12    1     1
H.4.2        H      4 03.Jul.12    1     2
H.5.2        H      5 03.Jul.12    1     1
H.6.2        H      6 03.Jul.12    1     0
H.7.2        H      7 03.Jul.12    2     1
H.8.2        H      8 03.Jul.12    2     0
H.9.2        H      9 03.Jul.12    2     0
G.1.3        G      1 04.Jul.12    1     0
G.2.3        G      2 04.Jul.12    1    99
G.3.3        G      3 04.Jul.12    1    99
G.4.3        G      4 04.Jul.12    2    99
G.5.3        G      5 04.Jul.12    1    99
G.6.3        G      6 04.Jul.12    1    99
G.7.3        G      7 04.Jul.12    1     0
G.8.3        G      8 04.Jul.12    1    99
G.9.3        G      9 04.Jul.12    1     1
H.1.3        H      1 04.Jul.12    1     1
H.2.3        H      2 04.Jul.12    1     0
H.3.3        H      3 04.Jul.12    1    99
H.4.3        H      4 04.Jul.12    1    99
H.5.3        H      5 04.Jul.12    1     1
H.6.3        H      6 04.Jul.12    1    99
H.7.3        H      7 04.Jul.12    1    99
H.8.3        H      8 04.Jul.12    1     1
H.9.3        H      9 04.Jul.12    1     1
ThomasIsCoding
  • 96,636
  • 9
  • 24
  • 81
2

Here is another solution using dplyr and tidyr. We would first combine the headers and subheaders and then pivot the data frame. We will do two pivoting operations: first collect everything into date, name (consisting of either "blue" or "green") and value (consisting of the corresponding values for "blue" and "green"); then, pivot_wider the name and value columns. df comes directly from your excel reprex.

library(dplyr)
library(tidyr)

nms1 <- tidyr:::fillDown(na_if(names(df), ""))
nms2 <- unlist(df[1L, ])
df[-1L, ] %>% 
  setNames(if_else(is.na(nms1), nms2, paste(nms1, nms2, sep = "_"))) %>% 
  pivot_longer(-c(category, number), c("date", "name"), names_sep = "_") %>% 
  pivot_wider()

Output

# A tibble: 54 x 5
   category number date      blue  green
   <chr>    <chr>  <chr>     <chr> <chr>
 1 G        1      02.Jul.12 1     0    
 2 G        1      03.Jul.12 1     0    
 3 G        1      04.Jul.12 1     0    
 4 G        2      02.Jul.12 2     99   
 5 G        2      03.Jul.12 2     99   
 6 G        2      04.Jul.12 1     99   
 7 G        3      02.Jul.12 1     1    
 8 G        3      03.Jul.12 1     99   
 9 G        3      04.Jul.12 1     99   
10 G        4      02.Jul.12 1     1    
# ... with 44 more rows
ekoam
  • 8,744
  • 1
  • 9
  • 22