I want to manipulate the columns of two data tables to create a Sankey diagram.
I have one data table for the year 2018
and one for 2019
(both are having the same column names/order):
dt.2018 <- structure(list(Year = c(2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L),
id = c(100236L, 101554L, 111328L, 124213L, 127434L, 128509L, 130058L, 130192L, 130224L, 130309L),
EV = c(-33498.61, 0, -31117.17, 70950.43, -37984.99, 0, 0, -93820830.01, -7852622.13, -245712.8),
group = structure(c(1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L), .Label = c("0", "1", "2", "3"), class = "factor")),
class = c("data.table", "data.frame"), row.names = c(NA, -10L))
dt.2019 <- structure(list(Year = c(2019L, 2019L, 2019L, 2019L, 2019L, 2019L, 2019L, 2019L, 2019L, 2019L),
id = c(100236L, 101554L, 111328L, 124213L, 127434L, 128003L, 128509L, 130058L, 130192L, 130351L),
EV = c(-130944.64, 21951918.61, 15335.68, 101518.72, 36552.26, 0, 78507.63, 0, 0, 276820967.02),
group = structure(c(1L, 4L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("0", "1", "2", "3"), class = "factor")),
class = c("data.table", "data.frame"), row.names = c(NA, -10L))
First of all I want to compare the id
columns.
id
s ofdt.2019
, which are not indt.2018
can be deleted (whole row).- For
id
s ofdt.2018
, which are not indt.2019
we have to copy/rbind the whole row intodt.2019
and replace then the number ofgroup
by8
indt.2019
. - In the end I want to sort the
id
columns of both (sameid
has to be at same row-index). - Replace
group
integers indt.2019
: replacec(0,1,2,3)
byc(4,5,6,7)
and8
should stay the same.
How can I do this in a fast and efficient way?