Calculate new values in dataframe from values from a separate dataframe with same field prefix (dplyover/dyplyr)

Question

Hi I had asked a question earlier about calculating new sums for a column based on field name patterns in R and Tim TeaFan had gave me a solution with his dylpyover package.

Now I need to do some more calculations using a similar method but have an added complication that some of those calculations need to grab a cell value from a different dataframe but has a matching prefix to the cut_names from the dataframe being altered.

Here is the basic equation I need done, but the "{.x}_LB_SUM" and "{.x}_UB_SUM" don't live exist in the ClusterFinal dataframe but within AOIsummary, a single row dataframe with a different field structure than the ClusterFinal dataframe. I've included the first couple rows & columns of both dataframes below.

FinalALL <- ClusterFinal %>%
  mutate(
    over(cut_names("_EST$"),
         list(LB_LQ = ~ round((.("{.x}_LB")/(.("{.x}_LB_SUM"))))/(ClusterFinal$TOTPOPRACE_LB/SUM_TOTPOP_LB), digits = 2),
              UB_LQ = ~ round((.("{.x}_UB")/(.("{.x}_UB_SUM"))))/(ClusterFinal$TOTPOPRACE_UB/SUM_TOTPOP_UB), digits = 2),

    )
  )

> dput(head(st_drop_geometry(AOIsummary[1:20])))
structure(list(TOTPOPRACE_LB_SUM = 151945, TOTPOPRACE_EST_SUM = 155886, 
    TOTPOPRACE_UB_SUM = 159827, TOTPOPRACE_LB_PCT_SUM = 100, 
    TOTPOPRACE_UB_PCT_SUM = 100, HSPBIPOC_LB_SUM = 25871, HSPBIPOC_EST_SUM = 28037, 
    HSPBIPOC_UB_SUM = 30203, HSPBIPOC_LB_PCT_SUM = 17.0265556615881, 
    HSPBIPOC_UB_PCT_SUM = 18.8973077139657, NHSPWHTALN_LB_SUM = 124238, 
    NHSPWHTALN_EST_SUM = 127849, NHSPWHTALN_UB_SUM = 131460, 
    NHSPWHTALN_LB_PCT_SUM = 81.7651123761888, NHSPWHTALN_UB_PCT_SUM = 82.2514343634054, 
    NHSPBLKALN_LB_SUM = 805, NHSPBLKALN_EST_SUM = 1113, NHSPBLKALN_UB_SUM = 1421, 
    NHSPBLKALN_LB_PCT_SUM = 0.529796966007437, NHSPBLKALN_UB_PCT_SUM = 0.889086324588461), row.names = 1L, class = "data.frame")

> dput(head(st_drop_geometry(ClusterFinal[5:32])))
structure(list(TOTPOPRACE_EST = c(1134, 13396, 35275, 20924, 
14351), TOTPOPRACE_CV = c(0.283044659944356, 0.0549543862201148, 
0.0307605467404691, 0.041019248793624, 0.0462602994322089), TOTPOPRACE_SE = c(320.9726443769, 
736.168957804657, 1085.07828627005, 858.286761757789, 663.88155715163
), TOTPOPRACE_SESQD = c(103023.4384383, 541944.734435195, 1177394.88733474, 
736656.165408671, 440738.721926072), TOTPOPRACE_MOE = c(528, 
1211, 1785, 1412, 1092), TOTPOPRACE_LB = c(606, 12185, 33490, 
19512, 13259), TOTPOPRACE_UB = c(1662, 14607, 37060, 22336, 15443
), HSPBIPOC_EST = c(468, 3744, 10016, 5907, 4109), HSPBIPOC_CV = c(0.385961515530025, 
0.118166392157128, 0.0833061826764167, 0.109830626279308, 0.127695904080848
), HSPBIPOC_SE = c(180.629989268052, 442.414972236288, 834.394725686989, 
648.769509431874, 524.702469868202), HSPBIPOC_SESQD = c(32627.1930229765, 
195731.007658835, 696214.558254266, 420901.876368474, 275312.681885792
), HSPBIPOC_MOE = c(297, 728, 1373, 1067, 863), HSPBIPOC_LB = c(171, 
3016, 8643, 4840, 3246), HSPBIPOC_UB = c(765, 4472, 11389, 6974, 
4972), NHSPWHTALN_EST = c(666, 9652, 25259, 15017, 10242), NHSPWHTALN_CV = c(0.411657858466369, 
0.0689992419410154, 0.0363261839688132, 0.0491849956570817, 0.0531540683299724
), NHSPWHTALN_SE = c(274.164133738602, 665.98068321468, 917.563080868252, 
738.611079782396, 544.403967835577), NHSPWHTALN_SESQD = c(75165.972228638, 
443530.270415092, 841922.007372437, 545546.327177317, 296375.68019512
), NHSPWHTALN_MOE = c(451, 1096, 1509, 1215, 896), NHSPWHTALN_LB = c(215, 
8556, 23750, 13802, 9346), NHSPWHTALN_UB = c(1117, 10748, 26768, 
16232, 11138), NHSPBLKALN_EST = c(170, 180, 200, 118, 82), NHSPBLKALN_CV = c(0.557840157339532, 
0.438442946192193, 0.334277417565133, 0.402329064204202, 0.574051766108857
), NHSPBLKALN_SE = c(94.8328267477204, 78.9197303145948, 66.8554835130266, 
47.4748295760959, 47.0722448209263), NHSPBLKALN_SESQD = c(8993.26502896315, 
6228.32383292837, 4469.65567576057, 2253.85944327935, 2215.79623248122
), NHSPBLKALN_MOE = c(156, 130, 110, 78, 77), NHSPBLKALN_LB = c(14, 
50, 90, 40, 5), NHSPBLKALN_UB = c(326, 310, 310, 196, 159)), row.names = c(NA, 
5L), class = "data.frame")

So far I've tried piping to both dataframes and then doing the cut_names and I've also tried creating a new dataframe with both datasets in it as a tibble and then passing that into my FinalAll code (see below). That didn't work. I'm wondering now about finding a way to grab the cut_name and use that with grep somehow to grab the specific column I need for the calculation. Any help would be much appreciated. Thanks!

CombDF <- tibble(d = paste0("d", 1:2), data = list(as_tibble(ClusterFinal),as_tibble(AOIsummary)))

So the way I ended up working around this issue was to add the columns I needed from AOIsummary to the ClusterFinal dataframe in a new dataframe. I applied the mutate/over code and then removed the columns I added from the new data frame. It may not be elegant, but it solved my problem. — Keren Crum, Dec 08 '22 at 21:58

Calculate new values in dataframe from values from a separate dataframe with same field prefix (dplyover/dyplyr)

0 Answers0