1

I want to manipulate the columns of two data tables to create a Sankey diagram. I have one data table for the year 2018 and one for 2019 (both are having the same column names/order):

dt.2018 <- structure(list(Year = c(2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L), 
                          id = c(100236L, 101554L, 111328L, 124213L, 127434L, 128509L, 130058L, 130192L, 130224L, 130309L),
                          EV = c(-33498.61, 0, -31117.17, 70950.43, -37984.99, 0, 0, -93820830.01, -7852622.13, -245712.8), 
                          group = structure(c(1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L), .Label = c("0", "1", "2", "3"), class = "factor")), 
                         class = c("data.table", "data.frame"), row.names = c(NA, -10L))

dt.2019 <- structure(list(Year = c(2019L, 2019L, 2019L, 2019L, 2019L, 2019L, 2019L, 2019L, 2019L, 2019L), 
                          id = c(100236L, 101554L, 111328L, 124213L, 127434L, 128003L, 128509L, 130058L, 130192L, 130351L), 
                          EV = c(-130944.64, 21951918.61, 15335.68, 101518.72, 36552.26, 0, 78507.63, 0, 0, 276820967.02), 
                          group = structure(c(1L, 4L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("0", "1", "2", "3"), class = "factor")), 
                         class = c("data.table", "data.frame"), row.names = c(NA, -10L))

First of all I want to compare the id columns.

  1. ids of dt.2019, which are not in dt.2018 can be deleted (whole row).
  2. For ids of dt.2018, which are not in dt.2019 we have to copy/rbind the whole row into dt.2019 and replace then the number of group by 8 in dt.2019.
  3. In the end I want to sort the id columns of both (same id has to be at same row-index).
  4. Replace group integers in dt.2019: replace c(0,1,2,3) by c(4,5,6,7) and 8 should stay the same.

How can I do this in a fast and efficient way?

ismirsehregal
  • 30,045
  • 5
  • 31
  • 78
MikiK
  • 398
  • 6
  • 19
  • For future readers: [here](https://stackoverflow.com/questions/68906156/how-to-create-a-sankey-diagram-when-certain-values-are-omitted) you can find a related / previous question. – ismirsehregal Aug 25 '21 at 09:11

2 Answers2

3

Conditions were pretty complicated for me to understand, so please tell me if this code does not match with your purpose.

    dt.2019 %>%
      filter(id %in% dt.2018$id) %>%
      mutate(group = recode_factor(group, "0" = "4", "1" = "5", "2" ="6", "3" ="7")) %>%
      rbind(dt.2018 %>%
              filter(! id %in% dt.2019$id) %>%
              mutate(group = 8)) %>%
      arrange(id)
Park
  • 14,771
  • 6
  • 10
  • 29
  • 1
    This is precisely how I was doing it too. Just one difference, instead of ```rbind``` I was using ```bind_rows``` – Shibaprasadb Aug 25 '21 at 07:54
2

We could do this with a combination of semi_join and anti_join and bind_rows and reocde_factor all in dplyr

library(dplyr)

dt.2019 %>% 
  semi_join(dt.2018, by="id") %>% 
  mutate(group = recode_factor(group, "0" = "4", "1" = "5", "2" ="6", "3" ="7")) %>% 
  bind_rows(dt.2018 %>% 
              anti_join(dt.2019, by="id") %>% 
              mutate(group=8, group=as.factor(group))
            )

output:

    Year     id          EV group
 1: 2019 100236  -130944.64     4
 2: 2019 101554 21951918.61     7
 3: 2019 111328    15335.68     4
 4: 2019 124213   101518.72     5
 5: 2019 127434    36552.26     5
 6: 2019 128509    78507.63     5
 7: 2019 130058        0.00     5
 8: 2019 130192        0.00     5
 9: 2018 130224 -7852622.13     8
10: 2018 130309  -245712.80     8
TarJae
  • 72,363
  • 6
  • 19
  • 66