Using a function dictionary and mutate to assign groupings

Question

Background for context to my question:

I have data on metabolic, immune, and neuroendocrine analytes. I am creating a cumulative deficit index, which is a cumulative measure of analytes that are in dysregulation.

To create the index: All analytes are z-scored. For analytes where ‘too high’ means dysregulation, dysregulation is classified as the upper quartile. For analytes where ‘too low’ means dysregulation, dysregulation is classified as the lower quartile. For analytes where too high OR too low means dysregulation, dysregulation is classified as the upper or lower quintile. If an analyte is in dysregulation, it gets a ‘1’. The dysregulation index is the # of analytes in dysregulation, divided by the number of analytes available for that individual. ‘Available’ meaning, if an individual has NA for an analyte, that analyte is not counted towards the index.

The dysregulation index, called dysreg_index, includes ALL analytes (metabolic, immune, neuroendocrine). Further, I create sub-indices that are specific to the physiologic system, which comprise only metabolic analytes, immune analytes, or neuroendocrine analytes (named metab_index, imm_index, and neuro_index, respectively).

My actual question:

I have one analyte, globulin (named glob_zscore in my data), where too high of a value means it is in immune dysregulation, and too low of a value means it is in metabolic dysregulation. So, if glob_zscore is in the upper quartile, I want to add it to the immu_index, and if it is in the lower quartile, add it to the metab_index. If either of those conditions are true, I want it to be added to the dysreg_index score.

I have not been successful in partitioning this in my code. I have pasted my code below as it is currently. Any help in this manner would be greatly appreciated. Thank you for your time!

funcs <- list(
  alt_zscore = function(z) !is.na(z) & !between(z, quantile(z, 0.20, na.rm = TRUE), quantile(z, 0.80, na.rm = TRUE)),
  alb_zscore = function(z) !is.na(z) & !between(z, quantile(z, 0.20, na.rm = TRUE), quantile(z, 0.80, na.rm = TRUE)),
  alp_zscore = function(z) !is.na(z) & !between(z, quantile(z, 0.20, na.rm = TRUE), quantile(z, 0.80, na.rm = TRUE)),
  calcium_zscore = function(z) !is.na(z) & !between(z, quantile(z, 0.20, na.rm = TRUE), quantile(z, 0.80, na.rm = TRUE)),
  uc_ratio_zscore = function(z) !is.na(z) & z < quantile(z, 0.25, na.rm = TRUE),
  sodium_zscore = function(z) !is.na(z) & !between(z, quantile(z, 0.20, na.rm = TRUE), quantile(z, 0.80, na.rm = TRUE)),
  phos_zscore = function(z) !is.na(z) & !between(z, quantile(z, 0.20, na.rm = TRUE), quantile(z, 0.80, na.rm = TRUE)),
  pot_zscore = function(z) !is.na(z) & !between(z, quantile(z, 0.20, na.rm = TRUE), quantile(z, 0.80, na.rm = TRUE)),
  pcv_zscore = function(z) !is.na(z) & z > quantile(z, 0.75, na.rm = TRUE),
  baso_per_ul_zscore = function(z) !is.na(z) & z > quantile(z, 0.75, na.rm = TRUE),
  esino_per_ul_zscore = function(z) !is.na(z) & z > quantile(z, 0.75, na.rm = TRUE),
  lympho_per_ul_zscore = function(z) !is.na(z) & z > quantile(z, 0.75, na.rm = TRUE),
  mono_per_ul_zscore = function(z) !is.na(z) & z > quantile(z, 0.75, na.rm = TRUE),
  neutro_per_ul_zscore = function(z) !is.na(z) & z > quantile(z, 0.75, na.rm = TRUE),
  cortisol_zscore = function(z) !is.na(z) & !between(z, quantile(z, 0.20, na.rm = TRUE), quantile(z, 0.80, na.rm = TRUE))
)

mapply(function(fn, x) fn(x), funcs, data[names(funcs)])

data <- data %>%
  mutate(
    metab_index = {            ## METABOLIC INDEX
    numerator <- mapply(function(fn, x) fn(x), funcs[1:9], pick(all_of(names(funcs[1:9])))) # order matters
      denominator <- (!is.na(pick(all_of(names(funcs[1:9]))))) # denominator is the number of non-NA elements in a row
      rowSums(numerator) / rowSums(denominator)
    }
  ) %>% 
  mutate(
    imm_index = {             ## IMMUNE INDEX
    numerator <- mapply(function(fn, x) fn(x), funcs[10:14], pick(all_of(names(funcs[10:14]))))
    denominator <- (!is.na(pick(all_of(names(funcs[10:14])))))
      rowSums(numerator) / rowSums(denominator)
    }
  ) %>% 
  mutate(
    neuro_index = {           ## NEUROENDOCRINE INDEX
      numerator <- mapply(function(fn, x) fn(x), funcs[15], pick(all_of(names(funcs[15]))))
      denominator <- (!is.na(pick(all_of(names(funcs[15])))))
      rowSums(numerator) / rowSums(denominator)
    }
  ) %>% 
  mutate(
    dysreg_index = {          ## TOTAL DYSREGULATION INDEX (includes metabolic, immune, and neuroendocrine)
      numerator <- mapply(function(fn, x) fn(x), funcs, pick(all_of(names(funcs))))
      denominator <- (!is.na(pick(all_of(names(funcs)))))
      rowSums(numerator) / rowSums(denominator)
    }
  )

The analytes that currently go into metab_index are: alt_zscore, alb_zscore, alp_zscore, calcium_zscore, uc_ratio_zscore, sodium_zscore, phos_zscore, pot_zscore, pcv_zscore. If glob_zscore is in the lower quartile, I want it to also go into the metab_index.

The analytes that currently go into imm_index are: baso_per_ul_zscore, esino_per_ul_zscore, lympho_per_ul_zscore, mono_per_ul_zscore, neutro_per_ul_zscore. If glob_zscore is in the upper quartile, I want it to also go into the imm_index.

The only analyte that currently goes into the neuro_index is cortisol_zscore.

If glob_zscore is either in the upper quartile or lower quartile, I want it to go into the dysreg_index.

Here is a dput and head of my data currently. Please note that these data have values for metab_index, imm_index, neuro_index, and dysreg_index - these indices were computed using all of the above analytes except for glob_zscore, which I have not yet incorporated into the function dictionary.

structure(list(alt_zscore = c(1.15628571428571, 0.899333333333333, 
0, -0.730708333333333, 0.0963571428571428, -1.06795833333333), 
    alb_zscore = c(1.888599484682, 0.134900515317999, 0, 0.6745, 
    -0.809400515317999, 0.6745), alp_zscore = c(2.99309375, -1.39021528321409, 
    0, -0.64982016264779, -0.274015625, -0.304302439716851), 
    calcium_zscore = c(1.09606450959036, 0.449665953945447, 0, 
    -0.674500004496664, -0.33725, -0.674500004496664), uc_ratio_zscore = c(0.691189771122184, 
    0.00395552487310546, 0, -0.955924044178282, -0.545585328858177, 
    -0.54077986726889), sodium_zscore = c(0.932489252756058, 
    -0.6745, 0, -1.180375, 0.310829750918686, -1.01175), phos_zscore = c(-0.103769544771059, 
    1.21409991456333, 0, 1.39396640945733, -1.93270290026917, 
    -1.30403359054267), pot_zscore = c(1.07919974530892, 0.134899228372, 
    0, -0.404700259007998, 1.21409890946892, -0.269801030635998
    ), pcv_zscore = c(0, 0, -0.243018626530217, 1.2141, -0.959399470734058, 
    -0.20235), glob_zscore = c(-1.079198972062, 0.385428307960541, 
    0, -0.963571690112316, -1.21409948738, -0.963571690112316
    ), baso_per_ul_zscore = c(0, 0, 0, 0, 0, -1.2646875), esino_per_ul_zscore = c(0, 
    0, 2.0877380952381, -0.108792935519412, -0.21912328042328, 
    1.68625), lympho_per_ul_zscore = c(0, 0, -0.173182432432432, 
    -1.11988412698413, -0.525016216216216, 0.565295238095238), 
    mono_per_ul_zscore = c(0, 0, 1.01941477272727, -0.332159846897352, 
    3.01532159090909, 0.844644122370989), neutro_per_ul_zscore = c(0, 
    0, -0.82978274474743, -0.560261015649008, 1.3692018328118, 
    -0.69385232985532), cortisol_zscore = c(9.53660294508432, 
    8.6351129251796, 0, 0, 0, 0), metab_index = c(0.625, 0.25, 
    0, 0.555555555555556, 0.333333333333333, 0.333333333333333
    ), imm_index = c(0, 0, 0.25, 0, 0.5, 0.2), neuro_index = c(1, 
    1, 0, 0, 0, 0), dysreg_index = c(0.666666666666667, 0.333333333333333, 
    0.2, 0.384615384615385, 0.384615384615385, 0.285714285714286
    )), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"
))

EDIT:

I figured out how to get the metab_index and imm_index to do what I want, by adding into the function dictionary:

glob_zscore = function(z) !is.na(z) & z < quantile(z, 0.25, na.rm = TRUE),  # lower quartile (25%) is dysregulation for METABOLIC
  glob_zscore = function(z) !is.na(z) & z > quantile(z, 0.75, na.rm = TRUE) # upper quartile (75%) is dysregulation for IMMUNE

And adjusting the numbers within the pick() functions in mutate() to include which function goes into metabolic vs. immune.

Now I just have to figure out how to adjust the dysreg_index, which is not doing what I want. This output should be a number between 0-1 but I am getting numbers >1 at times. It's something in this argument...

mutate(
    dysreg_index = {          ## TOTAL DYSREGULATION INDEX (includes metabolic, immune, and neuroendocrine)
      numerator <- mapply(function(fn, x) fn(x), funcs, pick(all_of(names(funcs))))
      denominator <- (!is.na(pick(all_of(names(funcs)))))
      rowSums(numerator) / rowSums(denominator)
    }
  )

The aim of a function is to not repeat yourself. You have multiple functions that are exactly the same with different names. Why would you do that? — Onyambu, Aug 17 '23 at 23:55
can you provide a small dataframe to show what your data looks like, and the expected output? It would only have to be a few rows. Thanks! — Mark, Aug 18 '23 at 01:33
@Onyambu You are correct, however for my purposes it is more appropriate to explicitly show each function per analyte because there is continual discussion around the classification of dysregulation for each analyte. It is quite dynamic.. — burphound, Aug 18 '23 at 04:25
Thats the objective of a function. To prevent repetition. You do not need the same function repeated multiple times. You just need to write one function and run it several times — Onyambu, Aug 18 '23 at 04:28
@Onyambu Again you are correct, however for user purposes at this point it is better this way. — burphound, Aug 18 '23 at 04:51
@Mark the expected output is akin to what is in the “_index” columns in the dput - a number between 0-1, but also incorporating the glob_zscore based on its conditional rules — burphound, Aug 18 '23 at 05:39
thanks so much for your responses so far! looking at the code right now, the neuro index is turning every value into 1 (because it's dividing the cortisol_zscore by itself. Is that intended behaviour? — Mark, Aug 18 '23 at 05:43
@Mark thank -you- for your help! Indeed it is intended behavior. Ideally there would be more analytes to go into the neuro index but cortisol is all that is available. Edit: I mean it is intended that the neuro index should be either 1 or 0. But if it’s giving only 1s when there should be 0s too (see dput), then that is not intended… — burphound, Aug 18 '23 at 05:44

Using a function dictionary and mutate to assign groupings

0 Answers0