I apologize in advance for the length and detail of this question.
I have a dataset that looks like this (there are about 1000 rows in total; this is just the head). Each column is a z-scored blood analyte.
df <- structure(list(alt_zscore = c(1.15628571428571, 0.899333333333333,
NA, -0.730708333333333, 0.0963571428571428, -1.06795833333333
), alb_zscore = c(1.888599484682, 0.134900515317999, NA, 0.6745,
-0.809400515317999, 0.6745), alp_zscore = c(2.99309375, -1.39021528321409,
NA, -0.64982016264779, -0.274015625, -0.304302439716851), calcium_zscore = c(1.09606450959036,
0.449665953945447, NA, -0.674500004496664, -0.33725, -0.674500004496664
), uc_ratio_zscore = c(0.691189771122184, 0.00395552487310546,
NA, -0.955924044178282, -0.545585328858177, -0.54077986726889
), sodium_zscore = c(0.932489252756058, -0.6745, NA, -1.180375,
0.310829750918686, -1.01175), phos_zscore = c(-0.103769544771059,
1.21409991456333, NA, 1.39396640945733, -1.93270290026917, -1.30403359054267
), pot_zscore = c(1.07919974530892, 0.134899228372, NA, -0.404700259007998,
1.21409890946892, -0.269801030635998), pcv_zscore = c(NA, NA,
-0.243018626530217, 1.2141, -0.959399470734058, -0.20235), glob_zscore = c(-1.079198972062,
0.385428307960541, NA, -0.963571690112316, -1.21409948738, -0.963571690112316
), baso_per_ul_zscore = c(NA, NA, NA, NA, NA, -1.2646875), esino_per_ul_zscore = c(NA,
NA, 2.0877380952381, -0.108792935519412, -0.21912328042328, 1.68625
), lympho_per_ul_zscore = c(NA, NA, -0.173182432432432, -1.11988412698413,
-0.525016216216216, 0.565295238095238), mono_per_ul_zscore = c(NA,
NA, 1.01941477272727, -0.332159846897352, 3.01532159090909, 0.844644122370989
), neutro_per_ul_zscore = c(NA, NA, -0.82978274474743, -0.560261015649008,
1.3692018328118, -0.69385232985532), cortisol_zscore = c(9.53660294508432,
8.6351129251796, NA, NA, NA, NA)), row.names = c(NA, -6L), class = c("tbl_df",
"tbl", "data.frame"))
My goal is to create columns called metab_index, imm_index, neuro_index, and dysreg_index. These are indices of 'dysregulation': any analyte that is found to be dysregulated under certain rules (see the code below) is scored as 1. The metab_index, for instance, is the number of metabolic analytes that are dysregulated divided by the number of metabolic analytes available for that individual (available meaning the value is not NA). The same goes for imm_index and neuro_index. The dysreg_index includes ALL of the analytes, not split by physiologic system. My expected output for every "_index" variable is a number between 0 (no analytes are dysregulated) and 1 (all available analytes are dysregulated).
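To make the definition concrete, here is a small worked example for a single individual. The flag values and the short analyte names are made up for illustration and do not come from the real data.

```r
# Hypothetical flags for one individual:
# TRUE = dysregulated, FALSE = within range, NA = analyte not measured.
flags <- c(alt = TRUE, alb = FALSE, alp = NA, calcium = TRUE, sodium = FALSE)

n_dysregulated <- sum(flags, na.rm = TRUE)  # numerator: analytes flagged TRUE
n_available    <- sum(!is.na(flags))        # denominator: analytes actually measured

index <- n_dysregulated / n_available       # 2 flagged out of 4 measured
index
```

Because the numerator only counts analytes that also appear in the denominator, an index built this way cannot exceed 1.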
The analytes that go into each index are:
metab_index: alt_zscore, alb_zscore, alp_zscore, calcium_zscore, uc_ratio_zscore, sodium_zscore, phos_zscore, pot_zscore, pcv_zscore, and glob_zscore if it is too low.
imm_index: glob_zscore if it is too high, baso_per_ul_zscore, esino_per_ul_zscore, lympho_per_ul_zscore, mono_per_ul_zscore, and neutro_per_ul_zscore.
neuro_index: cortisol_zscore only.
dysreg_index: all of the above.
The complication is one analyte, glob_zscore: too high a value should count towards the imm_index, and too low a value should count towards the metab_index. I can assign glob_zscore to metab_index or imm_index just fine; the trouble is the total dysreg_index. If glob_zscore is either too high OR too low, it should count once towards the numerator of the dysreg_index. The dysreg_index should be a number between 0 and 1, but with my code as it is now, I am getting values greater than 1.
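The intended behaviour for glob_zscore can be sketched on a toy vector. The values below are hypothetical, and the 25% and 75% cutoffs are the ones used in the rules further down; the variable names glob_low, glob_high, and glob_any are illustrative only.

```r
# Hypothetical globulin z-scores; NA means the analyte was not measured.
glob <- c(-2, -1, 0, 1, 2, 0, NA)
cutoffs <- quantile(glob, c(0.25, 0.75), na.rm = TRUE)

glob_low  <- !is.na(glob) & glob < cutoffs[[1]]  # would count towards metab_index
glob_high <- !is.na(glob) & glob > cutoffs[[2]]  # would count towards imm_index
glob_any  <- glob_low | glob_high                # would count once towards dysreg_index

# A value cannot sit in both tails, so a single either-tail flag adds
# at most 1 to the total numerator for each individual.
stopifnot(!any(glob_low & glob_high))
```

With one combined flag, glob_zscore contributes one column to the numerator and one column to the denominator of the total index.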
Here is my code to hopefully show what I mean:
library(dplyr) # provides between(), mutate(), pick(), all_of(), and %>%
funcs <- list(
  alt_zscore = function(z) !is.na(z) & !between(z, quantile(z, 0.20, na.rm = TRUE), quantile(z, 0.80, na.rm = TRUE)), # lower (20%) or upper (80%) quintile is dysregulation
  alb_zscore = function(z) !is.na(z) & !between(z, quantile(z, 0.20, na.rm = TRUE), quantile(z, 0.80, na.rm = TRUE)), # lower (20%) or upper (80%) quintile is dysregulation
  alp_zscore = function(z) !is.na(z) & !between(z, quantile(z, 0.20, na.rm = TRUE), quantile(z, 0.80, na.rm = TRUE)), # lower (20%) or upper (80%) quintile is dysregulation
  calcium_zscore = function(z) !is.na(z) & !between(z, quantile(z, 0.20, na.rm = TRUE), quantile(z, 0.80, na.rm = TRUE)), # lower (20%) or upper (80%) quintile is dysregulation
  uc_ratio_zscore = function(z) !is.na(z) & z < quantile(z, 0.25, na.rm = TRUE), # lower quartile (25%) is dysregulation
  sodium_zscore = function(z) !is.na(z) & !between(z, quantile(z, 0.20, na.rm = TRUE), quantile(z, 0.80, na.rm = TRUE)), # lower (20%) or upper (80%) quintile is dysregulation
  phos_zscore = function(z) !is.na(z) & !between(z, quantile(z, 0.20, na.rm = TRUE), quantile(z, 0.80, na.rm = TRUE)), # lower (20%) or upper (80%) quintile is dysregulation
  pot_zscore = function(z) !is.na(z) & !between(z, quantile(z, 0.20, na.rm = TRUE), quantile(z, 0.80, na.rm = TRUE)), # lower (20%) or upper (80%) quintile is dysregulation
  pcv_zscore = function(z) !is.na(z) & z > quantile(z, 0.75, na.rm = TRUE), # upper quartile (75%) is dysregulation
  glob_zscore = function(z) !is.na(z) & z < quantile(z, 0.25, na.rm = TRUE), # GLOB: lower quartile (25%) is dysregulation for METABOLIC
  glob_zscore = function(z) !is.na(z) & z > quantile(z, 0.75, na.rm = TRUE), # GLOB: upper quartile (75%) is dysregulation for IMMUNE
  baso_per_ul_zscore = function(z) !is.na(z) & z > quantile(z, 0.75, na.rm = TRUE), # upper quartile (75%) is dysregulation
  esino_per_ul_zscore = function(z) !is.na(z) & z > quantile(z, 0.75, na.rm = TRUE), # upper quartile (75%) is dysregulation
  lympho_per_ul_zscore = function(z) !is.na(z) & z > quantile(z, 0.75, na.rm = TRUE), # upper quartile (75%) is dysregulation
  mono_per_ul_zscore = function(z) !is.na(z) & z > quantile(z, 0.75, na.rm = TRUE), # upper quartile (75%) is dysregulation
  neutro_per_ul_zscore = function(z) !is.na(z) & z > quantile(z, 0.75, na.rm = TRUE), # upper quartile (75%) is dysregulation
  cortisol_zscore = function(z) !is.na(z) & !between(z, quantile(z, 0.20, na.rm = TRUE), quantile(z, 0.80, na.rm = TRUE)) # lower (20%) or upper (80%) quintile is dysregulation
)
mapply(function(fn, x) fn(x), funcs, df[names(funcs)])
df <- df %>%
  mutate(
    metab_index = { ## METABOLIC INDEX
      numerator <- mapply(function(fn, x) fn(x), funcs[1:10], pick(all_of(names(funcs[1:10])))) # order of funcs matters !
      denominator <- (!is.na(pick(all_of(names(funcs[1:10]))))) # denominator is the number of non-NA elements in a row
      rowSums(numerator) / rowSums(denominator)
    }
  ) %>%
  mutate(
    imm_index = { ## IMMUNE INDEX
      numerator <- mapply(function(fn, x) fn(x), funcs[11:16], pick(all_of(names(funcs[11:16])))) # order of funcs matters !
      denominator <- (!is.na(pick(all_of(names(funcs[11:16]))))) # denominator is the number of non-NA elements in a row
      rowSums(numerator) / rowSums(denominator)
    }
  ) %>%
  mutate(
    neuro_index = { ## NEUROENDOCRINE INDEX
      numerator <- mapply(function(fn, x) fn(x), funcs[17], pick(all_of(names(funcs[17])))) # order of funcs matters !
      denominator <- (!is.na(pick(all_of(names(funcs[17]))))) # denominator is the number of non-NA elements in a row
      rowSums(numerator) / rowSums(denominator)
    }
  ) %>%
  mutate(
    dysreg_index = { ## TOTAL DYSREGULATION INDEX (includes metabolic, immune, and neuroendocrine)
      numerator <- mapply(function(fn, x) fn(x), funcs, pick(all_of(names(funcs))))
      denominator <- (!is.na(pick(all_of(names(funcs))))) # denominator is the number of non-NA elements in a row
      rowSums(numerator) / rowSums(denominator)
    }
  )
The bug is somewhere in the dysreg_index step. My expected output is a number between 0 and 1, but I sometimes get values greater than 1. I believe this is because funcs technically contains two functions named glob_zscore, which likely inflates the number of functions counted in the numerator relative to the columns counted in the denominator. I set it up this way because I want to assign one glob_zscore function to the metab_index and the other to the imm_index.
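A minimal sketch of one possible restructuring, assuming each rule gets a unique name and stores the column it applies to, so the same column can appear under several rules without duplicated list names. The helpers outside(), below(), above(), index_for(), the rule names, and the toy data are all hypothetical and use only three analytes to stay short. The either-tail glob_any rule assumes the same 25% and 75% cutoffs as the two one-sided glob rules.

```r
# Rule builders (hypothetical helper names, not from the original code).
# Each returns a function mapping a z-score vector to TRUE/FALSE flags.
outside <- function(lo, hi) function(z) {
  q <- quantile(z, c(lo, hi), na.rm = TRUE)
  !is.na(z) & (z < q[[1]] | z > q[[2]])
}
below <- function(p) function(z) !is.na(z) & z < quantile(z, p, na.rm = TRUE)
above <- function(p) function(z) !is.na(z) & z > quantile(z, p, na.rm = TRUE)

# Unique rule names; the column each rule reads is stored alongside it.
rules <- list(
  alt       = list(col = "alt_zscore",         fn = outside(0.20, 0.80)),
  glob_low  = list(col = "glob_zscore",        fn = below(0.25)),          # metabolic
  glob_high = list(col = "glob_zscore",        fn = above(0.75)),          # immune
  glob_any  = list(col = "glob_zscore",        fn = outside(0.25, 0.75)),  # total only
  mono      = list(col = "mono_per_ul_zscore", fn = above(0.75))
)

# Index = flagged analytes / measured analytes, row by row.
index_for <- function(data, rule_names) {
  picked <- rules[rule_names]
  flags <- do.call(cbind, lapply(picked, function(r) r$fn(data[[r$col]])))
  available <- do.call(cbind, lapply(picked, function(r) !is.na(data[[r$col]])))
  rowSums(flags) / rowSums(available)  # NaN when nothing in the group was measured
}

# Toy data, made up for illustration.
toy <- data.frame(
  alt_zscore         = c(-2, -1, 0, 1, 2, NA),
  glob_zscore        = c(-2, -1, 0, 1, 2, 0),
  mono_per_ul_zscore = c(NA, 0, 1, 2, 3, 4)
)

toy$metab_index  <- index_for(toy, c("alt", "glob_low"))
toy$imm_index    <- index_for(toy, c("glob_high", "mono"))
toy$dysreg_index <- index_for(toy, c("alt", "glob_any", "mono"))  # glob counted once
```

In this layout every rule contributes exactly one numerator column and one denominator column, so each index stays between 0 and 1 whenever at least one analyte in the group was measured.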
Any help is greatly appreciated. Thank you so much!