
I have a dataset with 26 variables and 4662 observations covering one year. I want to analyse differences that may occur after a specific date. The variable time is 0 if an observation is before the date and 1 if it is after. Another variable, category, categorises my different types of observation.

I would like to examine whether there are significant differences within each category before and after the specific date. The values I want to compare are stored in another variable, number_trackers. c4 is a placeholder for all the other, unimportant variables that I won't need for this t-test.

Reproducible data frame:

Dataset <- data.frame(category = c("tools", "finance", "business", "education", "tools", "education"),
                      number_trackers = c(10, 12, 1, 30, 7, 21),
                      c4 = c("url1.com", "ur2.com", "url3.com", "url4.com", "url5.com", "url6.com"),
                      time = c(1, 0, 0, 0, 1, 1))

Ideally, the output would be one t-test per category comparing the two time periods.

Paul
  • You want to test differences in mean values of `number_trackers` per `time`? I am failing to see the relevance of `category`. – Rui Barradas Jun 23 '21 at 09:54
  • I would like to see the difference in number_trackers per time for each category. For example, whether the number of trackers in the category tools differs between the two time periods. And then a t-test for each of the other categories. – Paul Jun 23 '21 at 10:01
  • Do you want to make a loop so that each time it takes a different category and performs a t-test on the number of trackers? – Mohanasundaram Jun 23 '21 at 10:06
  • Yes, this sounds like a good idea, but I only know the basics of R and not how to write a loop. – Paul Jun 23 '21 at 10:07
  • Check the edit in the answer – Mohanasundaram Jun 23 '21 at 10:38

1 Answer

A loop over the categories might help:

# Take the list of unique categories
categories <- unique(Dataset$category)

# Create an empty list to hold the results
output_list <- list()

# Loop over the categories, run a t-test for each, and store the output
# (t.test is unpaired by default, so `paired` does not need to be set)
for (i in categories) {
  output_list[[i]] <- t.test(number_trackers ~ time,
                             data = Dataset[Dataset$category == i, ])
}

If you want to see the result for the first category:

output_list[[categories[1]]]
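
The loop above stops with an error as soon as one category lacks enough observations in either period. A minimal sketch of a guarded version follows. It assumes hypothetical toy data: the data frame `toy`, the list `safe_tests`, and the "news" category are made up for illustration and are not part of the original dataset.

```r
# Hypothetical toy data: "tools" and "finance" have 5 observations per period,
# "news" only has observations before the date (time == 0).
toy <- data.frame(
  category = c(rep("tools", 10), rep("finance", 10), rep("news", 5)),
  time = c(rep(c(0, 1), each = 5), rep(c(0, 1), each = 5), rep(0, 5)),
  number_trackers = c(8, 11, 9, 12, 10,   6, 7, 9, 5, 8,
                      12, 15, 11, 14, 13, 12, 16, 10, 15, 13,
                      20, 22, 19, 21, 23)
)

safe_tests <- list()
for (cat_name in unique(toy$category)) {
  sub <- toy[toy$category == cat_name, ]
  # Count observations in each period, keeping empty periods as 0
  n_per_period <- table(factor(sub$time, levels = c(0, 1)))
  if (all(n_per_period >= 2)) {
    safe_tests[[cat_name]] <- t.test(number_trackers ~ time, data = sub)
  } else {
    message("Skipping '", cat_name, "': fewer than 2 observations in a period")
  }
}

names(safe_tests)
```

With these toy values only the categories that have data in both periods end up in `safe_tests`; the skipped one is reported in a message instead of aborting the loop.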

Edit:

To generate a summary table of the output:

sum_tab <- as.data.frame(matrix(nrow = length(categories), ncol = 7))
colnames(sum_tab) <- c("t", "df", "p.value", "ConfIntLower", 
                       "ConfIntUpper", "Mean in Gr 0", "Mean in Gr 1")
rownames(sum_tab) <- categories

for (i in categories) {
  sum_tab[i, ] <- with(output_list[[i]], 
                       c(statistic, parameter, p.value, conf.int, estimate))
}


write.csv(sum_tab, "Summary.csv", row.names = TRUE)
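
As an alternative to pre-allocating a matrix, the same kind of table can be assembled from one-row data frames. This is only a sketch on hypothetical toy data; the names `toy`, `tests`, `sum_tab2`, and the column names are assumptions chosen for illustration.

```r
# Hypothetical toy data with 5 observations per category and period
toy <- data.frame(
  category = c(rep("tools", 10), rep("finance", 10)),
  time = rep(rep(c(0, 1), each = 5), times = 2),
  number_trackers = c(8, 11, 9, 12, 10,   6, 7, 9, 5, 8,
                      12, 15, 11, 14, 13, 12, 16, 10, 15, 13)
)

# One t-test per category; split() returns a list named by category
tests <- lapply(split(toy, toy$category),
                function(d) t.test(number_trackers ~ time, data = d))

# Turn each test result into a one-row data frame and stack the rows
sum_tab2 <- do.call(rbind, lapply(tests, function(res) {
  data.frame(t = unname(res$statistic),
             df = unname(res$parameter),
             p.value = res$p.value,
             conf_low = res$conf.int[1],
             conf_high = res$conf.int[2],
             mean_time0 = unname(res$estimate[1]),
             mean_time1 = unname(res$estimate[2]))
}))

print(sum_tab2)
```

The row names of `sum_tab2` are the category names, so it can be passed to `write.csv(..., row.names = TRUE)` in the same way as `sum_tab`.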

P.S.: Since the reproducible example is not sufficient, I couldn't run this to show the output.
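
To see why the six-row example cannot be tested, it is enough to count the observations per category and period: Welch's test, the `t.test` default, needs at least two observations in each group. A small sketch, using the example data from the question with the `data.frame` call written in valid syntax (`counts` and `testable` are names introduced here for illustration):

```r
Dataset <- data.frame(
  category = c("tools", "finance", "business", "education", "tools", "education"),
  number_trackers = c(10, 12, 1, 30, 7, 21),
  c4 = c("url1.com", "ur2.com", "url3.com", "url4.com", "url5.com", "url6.com"),
  time = c(1, 0, 0, 0, 1, 1)
)

# Rows are categories, columns are the two periods (0 = before, 1 = after)
counts <- table(Dataset$category, Dataset$time)
print(counts)

# Categories with at least 2 observations in both periods
testable <- rownames(counts)[apply(counts >= 2, 1, all)]
print(testable)
```

No category in the example meets that requirement, so `testable` is empty; running the same check on the full dataset shows which categories the loop can handle.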

Mohanasundaram