0

I would like to calculate the area under the curve for a time series for multiple samples. The time variable is of data type POSIXlt.

My data is set up like this:

day = c(rep(1, 4), rep(2,4))
time = c("2016-11-10 11:40:42", 
     "2016-11-10 11:45:42", 
     "2016-11-10 11:50:42", 
     "2016-11-10 11:55:42", 
     "2016-11-11 11:40:42", 
     "2016-11-11 11:45:42", 
     "2016-11-11 11:50:42", 
     "2016-11-11 11:55:42")
time = as.POSIXlt(time)
value = runif(8, min=4, max=20)
combined = data.frame(day, time, value)

  day                time     value
1   1 2016-11-10 11:40:42 10.726758
2   1 2016-11-10 11:45:42 14.123989
3   1 2016-11-10 11:50:42 12.145620
4   1 2016-11-10 11:55:42  7.254183
5   2 2016-11-11 11:40:42  8.385879
6   2 2016-11-11 11:45:42 16.411480
7   2 2016-11-11 11:50:42  4.640858
8   2 2016-11-11 11:55:42 17.300498

I would like to calculate the AUC for each individual day of the series. I have a large data set with many days of data. The times are already in sequential order (it is a continuous measurement over many days).

Ideally I would like the output to be:

day  AUC 
1    x
2    x        
etc....  

Any help is much appreciated.

MLyall
  • Please `dput()` your data. Hover your pointer over the `r` tag for more info. – Hack-R Nov 11 '16 at 20:52
  • @Hack-R OK, thanks. That should be a reproducible example now – MLyall Nov 11 '16 at 21:08
  • It's not clear to me whether you want the area under the time series (like a cumulative sum or a definite integral) or the area under the receiver operating characteristic (ROC) curve. At present you have an answer for each, and they are very different alternatives. – vincentmajor Nov 12 '16 at 01:57

2 Answers

1

I don't know whether you want the mean for each day or the sum, but you can adapt this code to your own needs:

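library(dplyr)

# `combined` is the data frame from the question. Its `day` column already
# identifies each day, so it can be used for grouping directly.
combined %>%
  group_by(day) %>%
  summarise(AUC = mean(value))  # swap mean() for sum() if a total is wanted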
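
If what is wanted is the area under the time series itself, a trapezoidal rule applied per day is one option. The following is only a minimal sketch under stated assumptions: `trapz_auc` is a hypothetical helper (not from any package), the `value` column holds made-up illustrative numbers rather than the question's random ones, and time is measured in seconds, so the result is in value × seconds.

```r
# Hypothetical helper: trapezoidal area under `value` over `time` (in seconds).
trapz_auc <- function(time, value) {
  t <- as.numeric(as.POSIXct(time))  # seconds since the epoch
  ord <- order(t)                    # guard against unsorted input
  t <- t[ord]
  v <- value[ord]
  # Sum of interval width times the mean of the two endpoint values.
  sum(diff(t) * (head(v, -1) + tail(v, -1)) / 2)
}

# Data shaped like the question's; the values are illustrative assumptions.
combined <- data.frame(
  day = rep(1:2, each = 4),
  time = as.POSIXct(c("2016-11-10 11:40:42", "2016-11-10 11:45:42",
                      "2016-11-10 11:50:42", "2016-11-10 11:55:42",
                      "2016-11-11 11:40:42", "2016-11-11 11:45:42",
                      "2016-11-11 11:50:42", "2016-11-11 11:55:42"),
                    tz = "UTC"),
  value = c(10, 14, 12, 8, 8, 16, 4, 18)
)

# One AUC per day, using base R only.
auc_by_day <- sapply(split(combined, combined$day),
                     function(d) trapz_auc(d$time, d$value))
result <- data.frame(day = as.numeric(names(auc_by_day)),
                     AUC = unname(auc_by_day))
print(result)
```

Dividing the AUC by 60 or 3600 would express it per minute or per hour instead. The same helper could presumably be called inside `summarise()` after `group_by(day)` if a dplyr pipeline is preferred.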
J_F
1

Do you have predictions and outcomes? I generated an example assuming that you were missing those columns:

# install.packages("ModelMetrics")
library(ModelMetrics)
library(dplyr)

day = c(rep(1, 4), rep(2, 4))
time = c("2016-11-10 11:40:42", 
     "2016-11-10 11:45:42", 
     "2016-11-10 11:50:42", 
     "2016-11-10 11:55:42", 
     "2016-11-11 11:40:42", 
     "2016-11-11 11:45:42", 
     "2016-11-11 11:50:42", 
     "2016-11-11 11:55:42")
time = as.POSIXlt(time)
outcome = as.numeric(runif(8, min=0, max=1) > .5)
predictions = runif(8, min=0, max=1)
combined = data.frame(day, time, outcome, predictions)

combined %>%
  group_by(day) %>%
  summarise(
    Predictions = n()
    ,AUCs = auc(outcome, predictions)
  )
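
One caveat with so few rows per day: a group may contain only one outcome class, in which case a ROC AUC is undefined. As a rough illustration, the sketch below computes a rank-based (Mann-Whitney) AUC in base R and returns `NA` for that case. `rank_auc` is a hypothetical helper, and this is not a claim about how `ModelMetrics::auc` works internally.

```r
# Hypothetical helper: rank-based ROC AUC for a 0/1 outcome.
rank_auc <- function(outcome, predictions) {
  n_pos <- sum(outcome == 1)
  n_neg <- sum(outcome == 0)
  # AUC is undefined when only one class is present in the group.
  if (n_pos == 0 || n_neg == 0) return(NA_real_)
  r <- rank(predictions)  # average ranks, so ties count as half
  (sum(r[outcome == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Both classes present: an AUC can be computed.
mixed <- rank_auc(c(0, 0, 1, 1), c(0.1, 0.4, 0.35, 0.8))

# Only one class present: the helper returns NA instead of failing.
single <- rank_auc(c(1, 1, 1, 1), c(0.2, 0.5, 0.7, 0.9))

print(c(mixed = mixed, single = single))
```

Calling `set.seed()` before `runif()` in the example above would also make the randomly generated outcomes reproducible.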
JackStat