1

I have a problem with a piece of code that was suggested in the post: How to create a data set that, for each hour, matches the individual elements entering the dataset until a condition has been reached in R

I have two different data tables that are not identical in size. Tab_consolidated has a nrow of 3612, while Tab_missing_load is much larger and has over 240k nrows. The form of both data table:

> head(Tab_consolidated)
   Year Units         Power Marginal_cost Cumulated_Power
  <dbl> <chr>         <dbl>         <dbl>           <dbl>
1  2023 El_WK_Mol_B1    200           150             200
2  2023 El_WK_Mol_B2    150           200             350
3  2023 El_WK_Rab_B1    300           220             650
4  2024 El_WK_Mol_B2    150           200             200
5  2024 El_WK_Bag_B6    200           250             400
6  2024 El_WK_Bag_B7    330           270             730

Below is a table that highlights the Missing_load needed to be filled in by the Units from Tab_consolidated in each hour of the year.

> head(Tab_missing_load)
    Year Hour_day Hour_year Missing_load
   <dbl>    <dbl>     <dbl>        <dbl>
1:  2023        1         1      442.017
2:  2023        2         2      401.548
3:  2023        3         3      375.848
4:  2023        4         4      319.509
5:  2023        5         5      318.953
6:  2023        6         6      141.473

The proposed solution is as follows:

library(data.table)

f <- function(ml,y, cons) {
        cons[Year==y, .SD[1:min(which(Cumulated_Power>ml)), .(Year,Units)]]
}

setDT(Tab_missing_load)
setDT(Tab_consolidated)

dcast(
  Tab_missing_load[, f(Missing_load,Year,Tab_consolidated), .I][,id:=paste0(rowid(I), "_Unit")],
  Year+I~id, value.var="Units",
)

Output:

    Year     I       1_Unit       2_Unit       3_Unit
   <int> <int>       <char>       <char>       <char>
1:  2023     1 El_WK_Mol_B1 El_WK_Mol_B2 El_WK_Rab_B1
2:  2023     2 El_WK_Mol_B1 El_WK_Mol_B2 El_WK_Rab_B1
3:  2023     3 El_WK_Mol_B1 El_WK_Mol_B2 El_WK_Rab_B1
4:  2023     4 El_WK_Mol_B1 El_WK_Mol_B2         <NA>
5:  2023     5 El_WK_Mol_B1 El_WK_Mol_B2         <NA>
6:  2023     6 El_WK_Mol_B1         <NA>         <NA>

Unfortunately, when applying the code I encounter a warning message:

Error in .prepareFastSubset(isub = isub, x = x, enclos = parent.frame(), : RHS of == is length 245280 which is not 1 or nrow (3612). For robustness, no recycling is allowed (other than of length 1 RHS). Consider %in% instead.

I am assuming that this is an error related to the fact that I am not dealing with identical data tables here. Unfortunately, I have tried various ways, but to no avail. I will post my attempts and the warning messages in the comments section.

Many thanks for any suggestions and solutions.

zx8754
  • 52,746
  • 12
  • 114
  • 209
bLanton70
  • 17
  • 4
  • ```f <- function(ml, y, cons){ cons[Year == y, .SD[1:min(which(Cumulated_Power>ml)), .(Year, Units)]] } dcast( Tab_missing_load[, f(Missing_load, Year, Tab_consolidated), .(Year, Hour_year)], Year + Hour_year ~ make.unique(paste0("Unit_", rowid(Year, Hour_year))), value.var = "Units") Error: result would be too long a vector In addition: Warning message: In min(which(Cumulated_power > ml)) :``` – bLanton70 Mar 21 '23 at 11:23
  • I tried to write the code differently ie above in the comment because a slight modification threw an error: ```Error in check_formula(formula, names(data), valnames) : data.table to cast must have unique column names``` – bLanton70 Mar 21 '23 at 11:27

0 Answers0