I am trying to run a simple lm() regression on a data frame. Specifically, I want to regress each row of the data frame on the column names. My data frame looks like this:

d = data.frame(replicate(6,rnorm(6)))
colnames(d) = as.character(0:5)

However, my lm() does not work:

lm(d[1,]~colnames(d))
#Error in model.frame.default(formula = d[1, ] ~ colnames(d), drop.unused.levels = TRUE) : 
#invalid type (list) for variable 'd[1, ]'

I would very much appreciate it if someone could help me get this running. I have not used the lm() function much yet.

I know that the lm() function wants something in the format lm(columnA ~ columnB, data = mydata), so I tried to build a data frame for my data before posting the question here:

cbind(d[1,],0:5)

This, however, does not drop the dimensions of d, and I have no clue why. If someone could answer this question too, even though it is more about general R understanding, it would help me big time.

Zheyuan Li
Feliks

1 Answer

I have to make some assumptions about what you plan to do, since you have not clarified it.

I am assuming you want a separate, independent regression line for each row of your data frame. In other words, you have multiple responses (one per row), but a common covariate:

x <- 1:ncol(d) - 1

Thus, you can do

fit <- lm(t(d) ~ x)

#Call:
#lm(formula = t(d) ~ x)

#Coefficients:
#             [,1]      [,2]      [,3]      [,4]      [,5]      [,6]    
#(Intercept)   0.23133   0.48307   0.07867   0.62308   0.71174   0.89866
#x             0.02964  -0.30077  -0.05160   0.06321  -0.17155  -0.43689

fit is not a plain "lm" object but an "mlm" object (multiple linear models). Each column of the coefficient matrix above corresponds to one response, that is, to one row of d.
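To relate this back to the original single-row attempt, here is a minimal sketch of what should be the equivalent one-row fit. It assumes the same d as in the question; the seed and the names y1, fit1, same and slopes are only illustrative and not part of the original post. d[1, ] stays a one-row data frame (a list of columns), which is why lm() complains about type "list"; unlist() is one way to turn it into a plain numeric vector.

```r
set.seed(1)                               # assumption: seed added only for reproducibility
d <- data.frame(replicate(6, rnorm(6)))
colnames(d) <- as.character(0:5)

x <- as.numeric(colnames(d))              # 0:5, the same values as 1:ncol(d) - 1
fit <- lm(t(d) ~ x)                       # one regression per row of d ("mlm")

# d[1, ] is still a data frame (a list), which lm() rejects as a variable;
# unlist() flattens it into a named numeric vector.
y1 <- unlist(d[1, ])
fit1 <- lm(y1 ~ x)                        # ordinary single-response fit for row 1

# Column 1 of the "mlm" coefficient matrix should match the single-row fit.
same <- all.equal(unname(coef(fit)[, 1]), unname(coef(fit1)))

# Slope estimates for all rows at once, taken from the row labelled "x".
slopes <- coef(fit)["x", ]
```

If the goal is only to look at the sign of each row's slope, slopes holds one estimate per row of d.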

Zheyuan Li
  • My R question is solved. Thx @Zheyuan Li. However, let me be a bit more specific about the analysis: I work in genetics and analyse mapping quality per species. In detail, I measure the number of mismatches for each read (given by the column names) when mapping to a given species (the rows). I simply want to use lm() to test for a negative relationship (the number of mapped reads decreases with increasing mismatches). Thus, the observations for each species are not independent. – Feliks Sep 26 '16 at 21:27