
I have a large data set in the following format: the first column is the type, and the subsequent columns are different times at which that 'type' happens. I want to calculate the slope of each row (~7000 rows) for the subset T0-T2 and then for t0-t2, output that information, and then get the average of the two slopes for each row. For example, get the slope of T0-T2 and of t0-t2 for type1, then average the two values for row type1. Some rows are missing all of their data, while others are missing one or two values.

Type    T0   T1   T2   t0   t1   t2  
type1  0.2  0.3  0.4  0.3  0.2  0.1 
type2  1.4  2.5  3.4  1.5  0.5  3.4
type3  0.4  8.1  8.1       2.2
type4        
...

I am a beginner in R, so this has been challenging, even though it seems simple in my head. I am getting errors caused by the missing values (NA), and I would appreciate any ideas or pointers to a similar question on this site. Thanks.

thelatemail
Anita
  • What do you mean by 'slope of each row'? As ~7000 rows is not really big, you can use the `apply` function family in R. Simply type in `?apply`. If your `data.frame` is `x`, you want something like `apply(x[,-1], 1, mean, na.rm=T)`. This means apply the `mean` function with extra argument `na.rm=TRUE` to every row (`1`) of `x` without the first column `Type` etc. – Marco Breitig Feb 26 '15 at 07:51

2 Answers


First, you might want to write a function that can calculate the slope for three consecutive values, like this:

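slope  <-  function(x){
    if(all(is.na(x)))
        # if x is all missing, then lm will throw an error that we want to avoid
        return(NA)
    else
        # regress the values on their time index (1, 2, 3) so that the
        # coefficient is the change in value per time step;
        # lm drops NA values by default
        return(coef(lm(x ~ seq_along(x)))[2])
}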
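
To make the handling of missing values explicit, here is a minimal sketch of the same idea. The name `slope_by_index` is hypothetical (it is not part of the answer), and the sketch assumes the columns are equally spaced time points 1, 2, 3. It regresses the values on their position and returns NA whenever fewer than two values are observed:

```r
# Hypothetical helper: slope of the values against their position (1, 2, 3, ...).
# Assumes equally spaced time points; returns NA if fewer than two values are present.
slope_by_index <- function(x) {
    idx <- seq_along(x)
    ok  <- !is.na(x)
    if (sum(ok) < 2)
        return(NA_real_)
    # lm() would drop the NA rows itself; subsetting explicitly makes that visible
    unname(coef(lm(x[ok] ~ idx[ok]))[2])
}

slope_by_index(c(0.2, 0.3, 0.4))   # complete row
slope_by_index(c(0.4, NA, 8.1))    # one value missing: slope from the two remaining points
slope_by_index(c(NA, 2.2, NA))     # a single value: NA
slope_by_index(c(NA, NA, NA))      # all missing: NA
```

Whether a slope fitted through only two points is meaningful for the analysis is a separate decision; raising the threshold in `sum(ok) < 2` would be one way to require complete rows.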

Then you can use the apply() function to calculate the slope for each row (MARGIN = 1) like so:

df <- read.csv(text = 
"Type,T0,T1,T2,t0,t1,t2
type1,0.2,0.3,0.4,0.3,0.2,0.1 
type2,1.4,2.5,3.4,1.5,0.5,3.4
type3,0.4,8.1,8.1,,2.2,")


df$slope1  <-  
    apply(df[,c('T0','T1','T2')],
          1,
          slope)

df$slope2  <-  
    apply(df[,c('t0','t1','t2')],
          1,
          slope)

Then calculate the average slope:

df$average.slope  <-  (df$slope1 + df$slope2)/2
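
Note that `+` propagates NA, so a row with only one usable slope ends up with an NA average. If the average of whichever slopes are available is wanted instead, one option is `rowMeans()` with `na.rm = TRUE`. This is only a sketch: the data frame `slopes` and its values are made up for illustration.

```r
# Illustrative values only: the second row has one slope missing,
# the third row has both missing
slopes <- data.frame(slope1 = c(0.1, 3.85, NA),
                     slope2 = c(-0.1, NA, NA))

# na.rm = TRUE averages the slopes that are present;
# rows with no slopes at all come out as NaN, so turn those back into NA
avg <- rowMeans(slopes[, c("slope1", "slope2")], na.rm = TRUE)
avg[is.nan(avg)] <- NA
avg
```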
Jthorpe

You can get the slopes for each row by doing something like:

#dat <- read.table(text="Type    T0   T1   T2   t0   t1   t2  
#type1  0.2  0.3  0.4  0.3  0.2  0.1 
#type2  1.4  2.5  3.4  1.5  0.5  3.4
#type3  0.4  8.1  8.1   NA  2.2   NA",header=TRUE)

tapply(
  dat[c("T0","T1","T2")],
  dat["Type"],
  FUN=function(x) 
    coef(lm(unlist(x) ~ seq_along(x)))[-1]
)

#Type
#type1 type2 type3 
# 0.10  1.00  3.85 
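
As a cross-check on these numbers: when the three time points are equally spaced, the least-squares slope of a complete row reduces to (last - first) / 2, so it can be computed for all rows at once without calling `lm()`. This is a sketch under that equal-spacing assumption, and the name `quick_slope` is made up here. Rows missing the first or last value come out as NA, unlike a fit on the remaining points.

```r
dat <- read.table(text = "Type T0 T1 T2 t0 t1 t2
type1 0.2 0.3 0.4 0.3 0.2 0.1
type2 1.4 2.5 3.4 1.5 0.5 3.4
type3 0.4 8.1 8.1 NA 2.2 NA", header = TRUE)

# OLS slope on x = 1, 2, 3:
#   sum((x - 2) * y) / sum((x - 2)^2) = (-T0 + 0 * T1 + T2) / 2
quick_slope <- (dat$T2 - dat$T0) / 2
names(quick_slope) <- as.character(dat$Type)
quick_slope
```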
thelatemail