I have three large matrices: a numeric one, a count one, and an over-dispersed count one:
library(MASS)   # rnegbin() and glm.nb()
library(dplyr)  # %>% and mutate()
set.seed(1)
numeric.mat <- matrix(rnorm(10000*6000),10000,6000)
count.mat <- matrix(rpois(10000*6000,10),10000,6000)
dispersed.count.mat <- matrix(rnegbin(10000*6000,mu = 10,theta = 2),10000,6000)
And one corresponding data.frame of factors (it can be a matrix too):
factors.df <- data.frame(f1 = sample(LETTERS[1:3], 10000, replace = TRUE),
                         f2 = sample(LETTERS[4:5], 10000, replace = TRUE))
The number of factors is small (only 2 here, and never more than 5 in the real data), and the number of levels in each (they're all categorical) is also small (at most 5).
I'd like to obtain the residuals from fitting linear, Poisson, and negative binomial regression models to each column of the corresponding matrix.
So for a single column:
data.df <- factors.df %>%
dplyr::mutate(numeric.y = numeric.mat[,1],
count.y = count.mat[,1],
dispersed.count.y = dispersed.count.mat[,1])
I'd use:
lm(numeric.y ~ f1+f2, data = data.df)$residuals
residuals(object = glm(count.y ~ f1+f2, data = data.df, family = "poisson"), type = 'pearson')
residuals(object = glm.nb(formula = dispersed.count.y ~ f1+f2, data = data.df), type = 'pearson')
These are for the linear, Poisson, and negative binomial models, respectively.
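To make explicit what type = 'pearson' returns, here is a minimal sketch on toy data. The names toy.df, pois.manual, and nb.manual are illustrative only, and the sample size is deliberately tiny. It recomputes the Pearson residuals by hand as (y - mu) / sqrt(V(mu)), where V(mu) = mu for the Poisson model and, for the negative binomial model, is taken from the fitted family's variance function (which has the form mu + mu^2/theta).

```r
library(MASS)  # glm.nb() and rnegbin()

set.seed(1)
n <- 500  # small illustrative size, not the 10000 rows used above
toy.df <- data.frame(f1 = sample(LETTERS[1:3], n, replace = TRUE),
                     f2 = sample(LETTERS[4:5], n, replace = TRUE),
                     count.y = rpois(n, 10),
                     dispersed.count.y = rnegbin(n, mu = 10, theta = 2))

# Poisson: the variance function is V(mu) = mu
pois.fit <- glm(count.y ~ f1 + f2, data = toy.df, family = "poisson")
mu.pois <- fitted(pois.fit)
pois.manual <- (toy.df$count.y - mu.pois) / sqrt(mu.pois)

# Negative binomial: use the variance function stored in the fitted family
nb.fit <- glm.nb(dispersed.count.y ~ f1 + f2, data = toy.df)
mu.nb <- fitted(nb.fit)
nb.manual <- (toy.df$dispersed.count.y - mu.nb) / sqrt(nb.fit$family$variance(mu.nb))

# Compare the hand-computed values with residuals(..., type = "pearson")
all.equal(unname(pois.manual), unname(residuals(pois.fit, type = "pearson")))
all.equal(unname(nb.manual), unname(residuals(nb.fit, type = "pearson")))
```

So once the fitted means are available, the residuals themselves are cheap to compute; the cost is in the model fitting.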
Is there a faster way of obtaining these residuals than fitting one model per column and binding the results with do.call? E.g.:
do.call(cbind,
        lapply(1:ncol(numeric.mat),
               function(i)
                 lm(numeric.y ~ f1+f2,
                    data = dplyr::mutate(factors.df,
                                         numeric.y = numeric.mat[,i])
                 )$residuals
        ))
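One possible direction, sketched on toy data and not benchmarked at the full 10000 x 6000 size: since every column shares the same design, lm() accepts a matrix response and fits all columns in one call, and qr.resid() does the same from a single QR decomposition of the design matrix. For the Poisson case there is still one fit per column, but the design matrix can be built once and passed to glm.fit() directly, which may reduce the per-column formula and data.frame overhead. All toy.* names and the sizes n and p below are illustrative assumptions.

```r
set.seed(1)
n <- 200; p <- 5  # small illustrative sizes
toy.factors.df <- data.frame(f1 = sample(LETTERS[1:3], n, replace = TRUE),
                             f2 = sample(LETTERS[4:5], n, replace = TRUE))
toy.numeric.mat <- matrix(rnorm(n * p), n, p)
toy.count.mat <- matrix(rpois(n * p, 10), n, p)

# Linear case: a matrix response fits every column against the same design
lm.resid.mat <- residuals(lm(toy.numeric.mat ~ f1 + f2, data = toy.factors.df))

# Same residuals from a single QR decomposition of the design matrix
X <- model.matrix(~ f1 + f2, data = toy.factors.df)
qr.resid.mat <- qr.resid(qr(X), toy.numeric.mat)

# Poisson case: one fit per column, reusing X and calling glm.fit() directly;
# the Pearson residuals are (y - mu) / sqrt(mu)
pois.resid.mat <- apply(toy.count.mat, 2, function(y) {
  fit <- glm.fit(x = X, y = y, family = poisson())
  (y - fit$fitted.values) / sqrt(fit$fitted.values)
})

dim(lm.resid.mat)
dim(pois.resid.mat)
```

The negative binomial case would need an additional theta estimate per column (as glm.nb does), so this sketch does not cover it.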