So I have some data which looks like this:

DATE        GROUP      Value    Visitors
2021-01-01  Treatment  12       40
2021-01-01  Control    4        43
2021-01-02  Treatment  7        34
2021-01-02  Control    2        39
2021-01-03  Treatment  10       23
2021-01-03  Control    10       29
2021-01-04  Treatment  19       30
2021-01-04  Control    7        23

Summing all of this data gives the final results at the end of the experiment:

Group       Value  Visitors  Conversion (Value/Visitors)
Control     23     134       .172
Treatment   48     127       .378
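
For reference, the totals above can be reproduced with a few lines of base R. This is only a sketch: the data frame name `df` and the `totals` object are illustrative names, not something from the original post.

```r
# Same data as in the question
df <- read.table(text = "
DATE        GROUP      Value    Visitors
2021-01-01  Treatment  12       40
2021-01-01  Control    4        43
2021-01-02  Treatment  7        34
2021-01-02  Control    2        39
2021-01-03  Treatment  10       23
2021-01-03  Control    10       29
2021-01-04  Treatment  19       30
2021-01-04  Control    7        23
", header = TRUE)

# Sum Value and Visitors within each group over all dates
totals <- aggregate(cbind(Value, Visitors) ~ GROUP, data = df, FUN = sum)

# Overall conversion = total Value / total Visitors, rounded as in the table
totals$Conversion <- round(totals$Value / totals$Visitors, 3)
totals
```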

So I need to calculate the p-value AND confidence interval of this data (namely conversion), not just at the end but over the course of the experiment, using a t-test.

What I am looking for is a line graph that plots how the p-value changes cumulatively over time. I can't really think of a way to plot the confidence interval over time, so a table of the daily confidence intervals would suffice.

John Thomas
    I am having trouble understanding what should be the “p-value” (or a confidence interval) at each time. Can you show some proposed calculations? – IRTFM Aug 27 '21 at 03:43

1 Answer

Is this what you are looking for?

df <- read.table(textConnection('DATE        GROUP      Value    Visitors
2021-01-01  Treatment  12       40
2021-01-01  Control    4        43
2021-01-02  Treatment  7        34
2021-01-02  Control    2        39
2021-01-03  Treatment  10       23
2021-01-03  Control    10       29
2021-01-04  Treatment  19       30
2021-01-04  Control    7        23'), header = TRUE)

library(tidyverse)
library(gridExtra)

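# Conversion per DATE and GROUP, then a one-sample t-test over the two group values of each date
new_df <- df %>%
  mutate(Conversion = Value / Visitors) %>%
  group_by(DATE, GROUP) %>%
  # each DATE/GROUP pair has a single row here, so cumsum() returns that day's conversion
  summarise(Cumulative_Conversion = cumsum(Conversion), .groups = 'drop') %>%
  group_by(DATE) %>%
  # conf.int has two elements (lower, upper), so each DATE yields two rows
  summarise(P.Value = t.test(Cumulative_Conversion)$p.value,
            Conf.Int = t.test(Cumulative_Conversion)$conf.int,
            Mean = mean(Cumulative_Conversion), .groups = 'drop')

# P.Value by date as columns (new_df holds two rows per DATE, which geom_col() stacks)
plot1 <- new_df %>%
  ggplot(aes(x = DATE, y = P.Value, fill = P.Value)) +
  geom_col()

# Mean by date as columns, with the confidence bounds as points joined by a line
plot2 <- new_df %>%
  ggplot(aes(x = DATE, y = Mean, fill = Mean)) +
  geom_col() +
  geom_point(aes(x = DATE, y = Conf.Int)) +
  geom_line(aes(x = DATE, y = Conf.Int))

# Draw both plots stacked vertically
final_plot <- grid.arrange(plot1, plot2)

final_plot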
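
If the goal is a p-value that updates as each day's data is added, one possible sketch is below. It is a different calculation from the code above, not a restatement of it, and it rests on an assumption that the question does not confirm: that `Value` is the number of visitors who converted, so the running totals can be expanded into 0/1 outcomes and passed to a two-sample `t.test()`. The helper `outcomes()` and the column names `CumValue`, `CumVisitors`, `CI.Lower` and `CI.Upper` are made up for illustration. Base graphics are used only to keep the example free of extra packages.

```r
# Same data as in the question
df <- read.table(text = "
DATE        GROUP      Value    Visitors
2021-01-01  Treatment  12       40
2021-01-01  Control    4        43
2021-01-02  Treatment  7        34
2021-01-02  Control    2        39
2021-01-03  Treatment  10       23
2021-01-03  Control    10       29
2021-01-04  Treatment  19       30
2021-01-04  Control    7        23
", header = TRUE)
df$DATE <- as.Date(df$DATE)
df <- df[order(df$GROUP, df$DATE), ]

# Running totals within each group, in date order
df$CumValue    <- ave(df$Value,    df$GROUP, FUN = cumsum)
df$CumVisitors <- ave(df$Visitors, df$GROUP, FUN = cumsum)

# ASSUMPTION: Value counts converting visitors, so counts expand to 0/1 outcomes
outcomes <- function(conversions, visitors) {
  rep(c(1, 0), times = c(conversions, visitors - conversions))
}

dates <- sort(unique(df$DATE))
results <- do.call(rbind, lapply(seq_along(dates), function(i) {
  tr <- df[df$DATE == dates[i] & df$GROUP == "Treatment", ]
  ct <- df[df$DATE == dates[i] & df$GROUP == "Control", ]
  # Welch two-sample t-test (the t.test() default) on all data collected so far
  tt <- t.test(outcomes(tr$CumValue, tr$CumVisitors),
               outcomes(ct$CumValue, ct$CumVisitors))
  data.frame(DATE = dates[i], P.Value = tt$p.value,
             CI.Lower = tt$conf.int[1], CI.Upper = tt$conf.int[2])
}))

# Table of the cumulative p-value and the confidence interval for
# (Treatment conversion - Control conversion), one row per date
results

# Line plot of the cumulative p-value over time
plot(results$DATE, results$P.Value, type = "b",
     xlab = "Date", ylab = "Cumulative p-value")
```

The `results` table doubles as the daily confidence-interval table asked for in the question. Note that testing repeatedly on accumulating data inflates the false-positive rate, so the intermediate p-values are best read as descriptive.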


Samet Sökel
    This looks on the money; however, can this be a line plot? I imagine date on the x-axis and p-value on the y-axis, with the p-value plotted over time as each new day's data is added to the cumulative count. – John Thomas Aug 27 '21 at 14:14