Background for context to my question:
I have data on metabolic, immune, and neuroendocrine analytes. I am creating a cumulative deficit index, which is a cumulative measure of analytes that are in dysregulation.
To create the index: All analytes are z-scored. For analytes where ‘too high’ means dysregulation, dysregulation is classified as the upper quartile. For analytes where ‘too low’ means dysregulation, dysregulation is classified as the lower quartile. For analytes where too high OR too low means dysregulation, dysregulation is classified as the upper or lower quintile. If an analyte is in dysregulation, it gets a ‘1’. The dysregulation index is the # of analytes in dysregulation, divided by the number of analytes available for that individual. ‘Available’ meaning, if an individual has NA for an analyte, that analyte is not counted towards the index.
The dysregulation index, called dysreg_index, includes ALL analytes (metabolic, immune, neuroendocrine). Further, I create sub-indices that are specific to the physiologic system, which comprise only metabolic analytes, immune analytes, or neuroendocrine analytes (named metab_index, imm_index, and neuro_index, respectively).
My actual question:
I have one analyte, globulin (named glob_zscore in my data), where too high of a value means it is in immune dysregulation, and too low of a value means it is in metabolic dysregulation. So, if glob_zscore is in the upper quartile, I want to add it to the immu_index, and if it is in the lower quartile, add it to the metab_index. If either of those conditions are true, I want it to be added to the dysreg_index score.
I have not been successful in partitioning this in my code. I have pasted my code below as it is currently. Any help in this manner would be greatly appreciated. Thank you for your time!
funcs <- list(
alt_zscore = function(z) !is.na(z) & !between(z, quantile(z, 0.20, na.rm = TRUE), quantile(z, 0.80, na.rm = TRUE)),
alb_zscore = function(z) !is.na(z) & !between(z, quantile(z, 0.20, na.rm = TRUE), quantile(z, 0.80, na.rm = TRUE)),
alp_zscore = function(z) !is.na(z) & !between(z, quantile(z, 0.20, na.rm = TRUE), quantile(z, 0.80, na.rm = TRUE)),
calcium_zscore = function(z) !is.na(z) & !between(z, quantile(z, 0.20, na.rm = TRUE), quantile(z, 0.80, na.rm = TRUE)),
uc_ratio_zscore = function(z) !is.na(z) & z < quantile(z, 0.25, na.rm = TRUE),
sodium_zscore = function(z) !is.na(z) & !between(z, quantile(z, 0.20, na.rm = TRUE), quantile(z, 0.80, na.rm = TRUE)),
phos_zscore = function(z) !is.na(z) & !between(z, quantile(z, 0.20, na.rm = TRUE), quantile(z, 0.80, na.rm = TRUE)),
pot_zscore = function(z) !is.na(z) & !between(z, quantile(z, 0.20, na.rm = TRUE), quantile(z, 0.80, na.rm = TRUE)),
pcv_zscore = function(z) !is.na(z) & z > quantile(z, 0.75, na.rm = TRUE),
baso_per_ul_zscore = function(z) !is.na(z) & z > quantile(z, 0.75, na.rm = TRUE),
esino_per_ul_zscore = function(z) !is.na(z) & z > quantile(z, 0.75, na.rm = TRUE),
lympho_per_ul_zscore = function(z) !is.na(z) & z > quantile(z, 0.75, na.rm = TRUE),
mono_per_ul_zscore = function(z) !is.na(z) & z > quantile(z, 0.75, na.rm = TRUE),
neutro_per_ul_zscore = function(z) !is.na(z) & z > quantile(z, 0.75, na.rm = TRUE),
cortisol_zscore = function(z) !is.na(z) & !between(z, quantile(z, 0.20, na.rm = TRUE), quantile(z, 0.80, na.rm = TRUE))
)
mapply(function(fn, x) fn(x), funcs, data[names(funcs)])
data <- data %>%
mutate(
metab_index = { ## METABOLIC INDEX
numerator <- mapply(function(fn, x) fn(x), funcs[1:9], pick(all_of(names(funcs[1:9])))) # order matters
denominator <- (!is.na(pick(all_of(names(funcs[1:9]))))) # denominator is the number of non-NA elements in a row
rowSums(numerator) / rowSums(denominator)
}
) %>%
mutate(
imm_index = { ## IMMUNE INDEX
numerator <- mapply(function(fn, x) fn(x), funcs[10:14], pick(all_of(names(funcs[10:14]))))
denominator <- (!is.na(pick(all_of(names(funcs[10:14])))))
rowSums(numerator) / rowSums(denominator)
}
) %>%
mutate(
neuro_index = { ## NEUROENDOCRINE INDEX
numerator <- mapply(function(fn, x) fn(x), funcs[15], pick(all_of(names(funcs[15]))))
denominator <- (!is.na(pick(all_of(names(funcs[15])))))
rowSums(numerator) / rowSums(denominator)
}
) %>%
mutate(
dysreg_index = { ## TOTAL DYSREGULATION INDEX (includes metabolic, immune, and neuroendocrine)
numerator <- mapply(function(fn, x) fn(x), funcs, pick(all_of(names(funcs))))
denominator <- (!is.na(pick(all_of(names(funcs)))))
rowSums(numerator) / rowSums(denominator)
}
)
The analytes that currently go into metab_index are: alt_zscore, alb_zscore, alp_zscore, calcium_zscore, uc_ratio_zscore, sodium_zscore, phos_zscore, pot_zscore, pcv_zscore. If glob_zscore is in the lower quartile, I want it to also go into the metab_index.
The analytes that currently go into imm_index are: baso_per_ul_zscore, esino_per_ul_zscore, lympho_per_ul_zscore, mono_per_ul_zscore, neutro_per_ul_zscore. If glob_zscore is in the upper quartile, I want it to also go into the imm_index.
The only analyte that currently goes into the neuro_index is cortisol_zscore.
If glob_zscore is either in the upper quartile or lower quartile, I want it to go into the dysreg_index.
Here is a dput and head of my data currently. Please note that these data have values for metab_index, imm_index, neuro_index, and dysreg_index - these indices were computed using all of the above analytes except for glob_zscore, which I have not yet incorporated into the function dictionary.
structure(list(alt_zscore = c(1.15628571428571, 0.899333333333333,
0, -0.730708333333333, 0.0963571428571428, -1.06795833333333),
alb_zscore = c(1.888599484682, 0.134900515317999, 0, 0.6745,
-0.809400515317999, 0.6745), alp_zscore = c(2.99309375, -1.39021528321409,
0, -0.64982016264779, -0.274015625, -0.304302439716851),
calcium_zscore = c(1.09606450959036, 0.449665953945447, 0,
-0.674500004496664, -0.33725, -0.674500004496664), uc_ratio_zscore = c(0.691189771122184,
0.00395552487310546, 0, -0.955924044178282, -0.545585328858177,
-0.54077986726889), sodium_zscore = c(0.932489252756058,
-0.6745, 0, -1.180375, 0.310829750918686, -1.01175), phos_zscore = c(-0.103769544771059,
1.21409991456333, 0, 1.39396640945733, -1.93270290026917,
-1.30403359054267), pot_zscore = c(1.07919974530892, 0.134899228372,
0, -0.404700259007998, 1.21409890946892, -0.269801030635998
), pcv_zscore = c(0, 0, -0.243018626530217, 1.2141, -0.959399470734058,
-0.20235), glob_zscore = c(-1.079198972062, 0.385428307960541,
0, -0.963571690112316, -1.21409948738, -0.963571690112316
), baso_per_ul_zscore = c(0, 0, 0, 0, 0, -1.2646875), esino_per_ul_zscore = c(0,
0, 2.0877380952381, -0.108792935519412, -0.21912328042328,
1.68625), lympho_per_ul_zscore = c(0, 0, -0.173182432432432,
-1.11988412698413, -0.525016216216216, 0.565295238095238),
mono_per_ul_zscore = c(0, 0, 1.01941477272727, -0.332159846897352,
3.01532159090909, 0.844644122370989), neutro_per_ul_zscore = c(0,
0, -0.82978274474743, -0.560261015649008, 1.3692018328118,
-0.69385232985532), cortisol_zscore = c(9.53660294508432,
8.6351129251796, 0, 0, 0, 0), metab_index = c(0.625, 0.25,
0, 0.555555555555556, 0.333333333333333, 0.333333333333333
), imm_index = c(0, 0, 0.25, 0, 0.5, 0.2), neuro_index = c(1,
1, 0, 0, 0, 0), dysreg_index = c(0.666666666666667, 0.333333333333333,
0.2, 0.384615384615385, 0.384615384615385, 0.285714285714286
)), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"
))
EDIT:
I figured out how to get the metab_index and imm_index to do what I want, by adding into the function dictionary:
glob_zscore = function(z) !is.na(z) & z < quantile(z, 0.25, na.rm = TRUE), # lower quartile (25%) is dysregulation for METABOLIC
glob_zscore = function(z) !is.na(z) & z > quantile(z, 0.75, na.rm = TRUE) # upper quartile (75%) is dysregulation for IMMUNE
And adjusting the numbers within the pick() functions in mutate() to include which function goes into metabolic vs. immune.
Now I just have to figure out how to adjust the dysreg_index, which is not doing what I want. This output should be a number between 0-1 but I am getting numbers >1 at times. It's something in this argument...
mutate(
dysreg_index = { ## TOTAL DYSREGULATION INDEX (includes metabolic, immune, and neuroendocrine)
numerator <- mapply(function(fn, x) fn(x), funcs, pick(all_of(names(funcs))))
denominator <- (!is.na(pick(all_of(names(funcs)))))
rowSums(numerator) / rowSums(denominator)
}
)