I am trying to build a table of regression slopes computed by a custom function based on the mblm package (the function in the example here is a simplified version). The function requires a formula as an argument, and I would like to use dplyr's summarise to apply it to grouped samples from a large data frame with many variables. The output should be a tibble of regression slopes, one per sample group and response variable, that I can pass to a heatmap function.
library (dplyr)
# Example data
test_data <-
  rbind(
    data.frame(ID = paste0("someName", 1:9), Sample_Type = "type1",
               A = seq(1, 17, length.out = 9),
               I = 0.1^seq(1, 1.8, length.out = 9),
               J = 1 - 0.1^seq(1, 1.8, length.out = 9)),
    data.frame(ID = paste0("someName", 10:15), Sample_Type = "type2",
               A = seq(1, 7, length.out = 6),
               I = 0.1^(1 - seq(1, 1.5, length.out = 6)),
               J = 1 - 0.1^(1 - seq(1, 1.5, length.out = 6))))
# Define an independent and the responding variables - I would like to be able to easily test different independent variables
idpVar <- "A"
respVar <- test_data %>% .[!names(.) %in% c("ID", "Sample_Type", idpVar)] %>% names()
# Custom function generating numeric value of median slopes (simplified from mblm)
medianSlope <-
  function (formula, dataframe)
  {
    if (missing(dataframe))
      dataframe <- environment(formula)
    term <- as.character(attr(terms(formula), "variables")[-1])
    x = dataframe[[term[2]]]
    y = dataframe[[term[1]]]
    if (length(term) > 2) {
      stop("Only linear models are accepted")
    }
    xx = sort(x)
    yy = y[order(x)]
    n = length(xx)
    slopes = c()
    smedians = c()
    for (i in 1:n) {
      slopes = c()
      for (j in 1:n) {
        if (xx[j] != xx[i]) {
          slopes = c(slopes, (yy[j] - yy[i])/(xx[j] - xx[i]))
        }
      }
      smedians = c(smedians, median(slopes))
    }
    slope = median(smedians)
    slope
  }
# Custom function works with test dataframe and a single named dependent variable but "group_by" seems to be ignored:
test_data %>% group_by (Sample_Type) %>% medianSlope( formula(paste("J", "~", idpVar)) ,.)
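A possible reason the grouping has no effect here, offered as a sketch rather than a definitive answer: group_by() only attaches grouping metadata that dplyr verbs act on, so a plain function that extracts columns with [[ still sees every row. One way to run such a function once per group is dplyr's group_modify(). In the example below, median_slope() and toy are hypothetical stand-ins (a compact repeated-median helper and a small exactly linear data set), not the original function or data.

```r
library(dplyr)

# Hypothetical compact stand-in for a formula-based median-slope function:
# median over i of the median pairwise slope between point i and all others.
median_slope <- function(formula, dataframe) {
  vars <- all.vars(formula)            # c(response, predictor)
  x <- dataframe[[vars[2]]]
  y <- dataframe[[vars[1]]]
  dx <- outer(x, x, "-")               # dx[j, i] = x[j] - x[i]
  s <- outer(y, y, "-") / dx           # pairwise slopes
  s[dx == 0] <- NA                     # skip pairs with identical x
  median(apply(s, 2, median, na.rm = TRUE))
}

# Toy data (an assumption for illustration): slope 2 in type1, -1 in type2
toy <- data.frame(
  Sample_Type = rep(c("type1", "type2"), each = 4),
  A = c(1, 2, 3, 4, 1, 2, 3, 4),
  J = c(2, 4, 6, 8, 3, 2, 1, 0)
)

idpVar <- "A"
f <- reformulate(idpVar, response = "J")   # builds the formula J ~ A

# group_modify() calls the function once per group; .x is that group's rows
res <- toy %>%
  group_by(Sample_Type) %>%
  group_modify(~ tibble(slope = median_slope(f, .x))) %>%
  ungroup()

print(res)
```

The result has one row per Sample_Type with a single slope column, instead of one slope computed over all rows.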
Leaving the grouping issue aside for the moment, I tried to make "summarise" work by generating a list of multiple formulas:
paste(respVar, "~", idpVar)
[1] "B ~ A" "C ~ A" "D ~ A" "E ~ A" "F ~ A" "G ~ A" "H ~ A" "I ~ A" "J ~ A" "K ~ A" "L ~ A"
However
test_data %>% summarise_at (respVar, medianSlope(paste(respVar, "~", idpVar), .))
Error: $ operator is invalid for atomic vectors
test_data %>% summarise_at (respVar, medianSlope(paste(get(respVar), "~", get(idpVar)), .))
Error in get(idpVar) : object 'A' not found
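A likely reading of these two errors, stated as an assumption: in both calls medianSlope(...) is evaluated immediately, before summarise_at() runs. In the first, paste() returns character strings rather than a formula, so terms() fails. In the second, get() looks for objects named "I", "J" and "A" in the calling environment, not for columns of the data frame. The sketch below sidesteps formulas and loops over the response columns inside group_modify(). The helper median_slope_xy() and the data frame toy are hypothetical stand-ins, not the original function or data.

```r
library(dplyr)

# Hypothetical helper: repeated-median slope computed directly from x and y
median_slope_xy <- function(x, y) {
  dx <- outer(x, x, "-")               # dx[j, i] = x[j] - x[i]
  s <- outer(y, y, "-") / dx           # pairwise slopes
  s[dx == 0] <- NA                     # skip pairs with identical x
  median(apply(s, 2, median, na.rm = TRUE))
}

# Toy data (an assumption for illustration), with J = 1 - I
toy <- data.frame(
  Sample_Type = rep(c("type1", "type2"), each = 4),
  A = c(1, 2, 3, 4, 1, 2, 3, 4),
  I = c(2, 4, 6, 8, 3, 2, 1, 0)
)
toy$J <- 1 - toy$I

idpVar  <- "A"
respVar <- setdiff(names(toy), c("Sample_Type", idpVar))   # "I" "J"

# One row per group, one column per response variable
slopes <- toy %>%
  group_by(Sample_Type) %>%
  group_modify(function(d, key) {
    vals <- sapply(respVar, function(v) median_slope_xy(d[[idpVar]], d[[v]]))
    as_tibble(as.list(vals))           # named vector -> one-row tibble
  }) %>%
  ungroup()

print(slopes)
```

Changing idpVar to another column name reruns the same pipeline with a different independent variable, and the wide result (groups in rows, response variables in columns) can be converted to a matrix for a heatmap function.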
I am relatively new to R and a bit lost. Can you help?
However, the grouping of variables is still ignored. I have edited test_data and example code to make it clearer what I meant to achieve. – ThomasW Aug 19 '20 at 16:05