
I have a dataframe that looks like this:

df <- data.frame(ID = c(1,2,3,4,5,6), Type = c("A","A","B","B","C","C"), `2019` = c(1,2,3,4,5,6),`2020` = c(2,3,4,5,6,7), `2021` = c(3,4,5,6,7,8))

  ID Type X2019 X2020 X2021
1  1    A     1     2     3
2  2    A     2     3     4
3  3    B     3     4     5
4  4    B     4     5     6
5  5    C     5     6     7
6  6    C     6     7     8

Now I'm looking for code that does the following:

1. Creates a new data.frame for every row in df.
2. Names each new data.frame with a combination of "Type" and "ID" (A_1, A_2, ..., C_6).

The resulting new dataframes should look like this (example for A_1, A_2 and C_6):

  Year Values
1 2019      1
2 2020      2
3 2021      3

  Year Values
1 2019      2
2 2020      3
3 2021      4

  Year Values
1 2019      6
2 2020      7
3 2021      8

Two things complicate this:

1. The code should keep working in the coming years without any changes: next year the data.frame df will no longer contain the years 2019-2021, but rather 2020-2022.
2. The data.frame df is only a minimal reproducible example, so I need some kind of loop. The real data has many more rows and therefore many more data frames to create.

Unfortunately, I can't give you any code, as I have absolutely no idea how I could manage that. While researching, I found the following code that may help address the first problem with the changing years:

year <- as.numeric(format(Sys.Date(), "%Y"))

I also read about lists, and that it may help to build a list in a for loop and then transform the list back into a data frame. Sorry for my limited approach; I hope someone can give me a hint or even a solution to my problem. If you need any further information, please let me know. Thanks in advance!

A kind of similar question to mine: Populating a data frame in R in a loop

2 Answers


Try this:

library(stringr)
library(dplyr)
library(tidyr)
library(magrittr)

df %>%
  gather(Year, Values, 3:5) %>%
  mutate(Year = str_sub(Year, 2)) %>%
  select(ID, Year, Values) %>%
  group_split(ID) # split(.$ID) 

# [[1]]
# # A tibble: 3 x 3
#     ID Year  Values
#   <dbl> <chr>  <dbl>
# 1     1 2019       1
# 2     1 2020       2
# 3     1 2021       3
# 
# [[2]]
# # A tibble: 3 x 3
#     ID Year  Values
#   <dbl> <chr>  <dbl>
# 1     2 2019       2
# 2     2 2020       3
# 3     2 2021       4
# 
# [[3]]
# # A tibble: 3 x 3
#     ID Year  Values
#   <dbl> <chr>  <dbl>
# 1     3 2019       3
# 2     3 2020       4
# 3     3 2021       5
# 
# [[4]]
# # A tibble: 3 x 3
#     ID Year  Values
#   <dbl> <chr>  <dbl>
# 1     4 2019       4
# 2     4 2020       5
# 3     4 2021       6
# 
# [[5]]
# # A tibble: 3 x 3
#     ID Year  Values
#   <dbl> <chr>  <dbl>
# 1     5 2019       5
# 2     5 2020       6
# 3     5 2021       7
# 
# [[6]]
# # A tibble: 3 x 3
#     ID Year  Values
#   <dbl> <chr>  <dbl>
# 1     6 2019       6
# 2     6 2020       7
# 3     6 2021       8


Data

df <- data.frame(ID = c(1,2,3,4,5,6), Type = c("A","A","B","B","C","C"), `2019` = c(1,2,3,4,5,6),`2020` = c(2,3,4,5,6,7), `2021` = c(3,4,5,6,7,8))
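
The pipeline above returns an unnamed list that still carries the ID column, while the question asks for elements named A_1, A_2, ... with only Year and Values. A minimal sketch of one way to get there, assuming tidyr 1.0.0 or later (for `pivot_longer()`) and assuming the year columns are exactly those named "X" followed by four digits; the names `long` and `result` are made up for illustration:

```r
library(dplyr)
library(tidyr)

df <- data.frame(ID = c(1,2,3,4,5,6), Type = c("A","A","B","B","C","C"),
                 `2019` = c(1,2,3,4,5,6), `2020` = c(2,3,4,5,6,7),
                 `2021` = c(3,4,5,6,7,8))

long <- df %>%
  # pick the year columns by name pattern instead of by position,
  # so the code does not depend on which years are present
  pivot_longer(cols = matches("^X\\d{4}$"),
               names_to = "Year", names_prefix = "X",
               values_to = "Values") %>%
  mutate(Year = as.integer(Year),
         Name = paste(Type, ID, sep = "_"))

# split into one data frame per original row; the factor levels keep
# the original row order instead of sorting the names alphabetically
result <- split(long[c("Year", "Values")],
                factor(long$Name, levels = unique(long$Name)))

names(result)
result[["A_1"]]
```

Keeping the pieces in a named list, rather than creating separate objects in the workspace, means they can be looked up with `result[["C_6"]]` or processed together with `lapply()`.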
deepseefan
  • select is from dplyr. If you add ```library(dplyr)```, this works. – markhogue Oct 15 '19 at 14:11
  • select is from dplyr, but dplyr is from tidyverse as well... therefore the suggestion from @deepseefan should work just fine. Unfortunately, I receive the following error: `Error in group_split(., ID) : could not find function "group_split"` I double checked the 2 packages and if they are activated. – TheEconomist Oct 15 '19 at 14:24
  • @TheEconomist, if `group_split` is troubling you, replace it with `split(.$ID)` and it should work. – deepseefan Oct 15 '19 at 14:35
  • After updating R as well as the dplyr package, I can confirm that both of your variants work perfectly fine. Thank you for your time. – TheEconomist Oct 15 '19 at 14:45
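
For anyone who cannot update their packages, the same result can be sketched in base R alone, which avoids the `group_split` problem entirely. This is only an illustration under one assumption: the year columns are the ones named "X" followed by four digits. The names `year_cols`, `years` and `result` are made up here.

```r
df <- data.frame(ID = c(1,2,3,4,5,6), Type = c("A","A","B","B","C","C"),
                 `2019` = c(1,2,3,4,5,6), `2020` = c(2,3,4,5,6,7),
                 `2021` = c(3,4,5,6,7,8))

# find the year columns by name, so nothing is hard-coded to 2019-2021
year_cols <- grep("^X[0-9]{4}$", names(df), value = TRUE)
years <- as.integer(sub("^X", "", year_cols))

# build one two-column data frame per row of df
result <- lapply(seq_len(nrow(df)), function(i) {
  data.frame(Year = years,
             Values = unlist(df[i, year_cols], use.names = FALSE))
})

# name each element Type_ID, e.g. "A_1"
names(result) <- paste(df$Type, df$ID, sep = "_")

result[["A_1"]]
#   Year Values
# 1 2019      1
# 2 2020      2
# 3 2021      3
```

If separate objects in the workspace are really needed, `list2env(result, envir = .GlobalEnv)` should create one variable per list element, although keeping everything in the list is usually easier to work with.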
1
library(magrittr)
library(tidyr)
library(dplyr)
library(stringr)

names(df) <- str_replace_all(names(df), "^X", "") # remove the leading X from the year names

df %>%
  gather(Year, Values, 3:5) %>%
  select(ID, Year, Values) %>%
  group_split(ID)
markhogue
  • Thank you very much for your answer. You were really observant about the X in the year names. Unfortunately, I receive the following error: `Error in group_split(., ID) : could not find function "group_split"`. I already double-checked the 2 packages and whether they are really activated. – TheEconomist Oct 15 '19 at 14:28
  • If it's not in your version of `dplyr`, you might not have the latest. You might need to update your packages and even R, if you don't have 3.6.1. – markhogue Oct 15 '19 at 14:30
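
A quick way to check whether the installed dplyr provides `group_split` before deciding to update. This is a rough sketch; to my knowledge the function was added in dplyr 0.8.0, but treat that version number as an assumption. The variable names are made up for illustration.

```r
# is dplyr installed at all?
has_dplyr <- requireNamespace("dplyr", quietly = TRUE)

# does the installed version export group_split()?
has_group_split <- has_dplyr &&
  "group_split" %in% getNamespaceExports("dplyr")

if (has_dplyr) {
  message("dplyr ", as.character(packageVersion("dplyr")),
          "; group_split available: ", has_group_split)
} else {
  message("dplyr is not installed")
}
```

If the check reports that the function is missing, `install.packages("dplyr")` fetches the current release, or `split(.$ID)` can be used in its place as suggested in the comments on the other answer.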