4

I have the following data:

 df <-
Session    Volume                    StartTime                 EndTime
1         27,75          2016-01-22 17:00:33.707  2016-01-27 06:02:54.900
2         10,78          2016-01-22 14:31:22.127  2016-01-23 15:01:20.997
3         15,88          2016-01-27 12:46:18.660  2016-01-27 15:01:23.250
4         46,10          2016-01-25 16:01:34.613  2016-01-25 21:46:35.477
5         94,60          2016-01-27 05:38:06.597  2016-01-27 06:08:06.027
6         15,93          2016-01-20 16:15:59.350  2016-01-21 06:06:43.933

I want to add the data for each day ( on Starttime) to receive a dataset with the total volume of all sessions for each day, so that I can plot the volume over time. E.g. on the 22th of januari 2016 a total of 38.53 has charged. So that

dfnew <- 
 Day            TotalVolume
2016-01-22    38.53
2016-01-25    46.10
2016-01-27    110.48
 etc. 

What is the most efficient way to do this?

ima
  • 155
  • 12

2 Answers2

1

With data table

library(data.table)
df[,StartTime := as.POSIXct(StartTime)]
df[,sum(Volume), by = as.Date(df$StartTime)]


      as.Date     V1
1: 2016-01-22  38.53
2: 2016-01-27 110.48
3: 2016-01-25  46.10
4: 2016-01-20  15.93

and with dplyr

library(dplyr)
df %>%
  mutate(StartTime = as.POSIXct(StartTime)) %>%
  group_by(as.Date(StartTime)) %>%
  summarise(sum(Volume))

Here are the data:

df <- as.data.table(read.table(text = "
  Session;    Volume;                    StartTime;                 EndTime
1;         27,75;          2016-01-22 17:00:33.707;  2016-01-27 06:02:54.900
2;         10,78;          2016-01-22 14:31:22.127;  2016-01-23 15:01:20.997
3;         15,88;          2016-01-27 12:46:18.660;  2016-01-27 15:01:23.250
4;         46,10;          2016-01-25 16:01:34.613;  2016-01-25 21:46:35.477
5;         94,60;          2016-01-27 05:38:06.597;  2016-01-27 06:08:06.027
6;         15,93;          2016-01-20 16:15:59.350;  2016-01-21 06:06:43.933",header = T,sep = ";",dec = ","))
denis
  • 5,580
  • 1
  • 13
  • 40
  • Is there an easy way to change the code from sum per day to overal cummulative, thus instead of `dfnew <- Day TotalVolume 2016-01-22 38.53 2016-01-25 46.10 2016-01-27 110.48` to a result where `dfnew <- Day TotalVolume 2016-01-20 15.93 2016-01-22 54.46 2016-01-25 100.56 etc.` – ima Mar 07 '18 at 14:23
0

I prepared a small reproducible example. Is this what you would like?

library(lubridate)  
library(dplyr)
df=tibble(Volume=c(27.75,10.78,15.88,46.1,94.60,15.93),
          StartTime=c("2016-01-22 17:00:33.707","2016-01-22 14:31:22.127",
          "2016-01-27 12:46:18.660","2016-01-25 16:01:34.613","2016-01-27 05:38:06.597","2016-01-20 16:15:59.350"))



df <-df%>%
  mutate(StartTime=ymd_hms(StartTime))%>%
  mutate(StartTime=floor_date(StartTime,unit="day"))%>%
  group_by(StartTime)%>%dplyr::summarize(Volume=sum(Volume))

            > df
            # A tibble: 4 x 2
              StartTime           Volume
              <dttm>               <dbl>
            1 2016-01-20 00:00:00   15.9
            2 2016-01-22 00:00:00   38.5
            3 2016-01-25 00:00:00   46.1
            4 2016-01-27 00:00:00  110  
Antonios
  • 1,919
  • 1
  • 11
  • 18
  • Thanks Antonis, I received an error message: { Error in... could not find function mutate}, which I tried to remove by adding {library(dplyr)} to the solution -> but the error message Error in summarise_impl(.data, dots) appears. Do you know how to solve it? – ima Mar 06 '18 at 16:23
  • Hi @ima, I am not sure what the problem is as it runs ok in my PC. In any case I updated the answer be using `dplyr::summarize`. Hope it helps – Antonios Mar 06 '18 at 19:02