I would like a table as output that contains the t-statistics for the difference in means of certain variables between two specific subsets of my data.
I have the following data:
structure(list(Name = c("A", "A", "A", "A", "B", "B", "B", "B",
"C", "C", "C", "C", "D", "D", "D", "D"), Date = c("20.10.2018",
"30.09.2018", "25.11.2019", "23.10.2020", "20.03.2018", "30.07.2018",
"25.08.2019", "23.10.2020", "20.12.2018", "30.01.2018", "25.02.2019",
"23.06.2020", "20.11.2018", "30.12.2018", "25.11.2019", "23.09.2020"
), Return = c(0.01, 0.05, 0.08, 0.07, 0.04, 0.03, 0.01, 0.03,
0.03, 0.05, 0.06, 0.07, 0.07, 0.04, 0.06, 0.08), Age = c(5L,
5L, 6L, 7L, 8L, 8L, 9L, 10L, 4L, 4L, 5L, 6L, 1L, 1L, 2L, 3L),
Size = c(53336L, 75768L, 86548L, 94567L, 40234L, 40240L,
50243L, 60352L, 5069L, 6069L, 7092L, 8024L, 2456L, 3046L,
4056L, 5600L), Rating = c(1L, 1L, 1L, 2L, 5L, 5L, 3L, NA,
4L, 5L, 4L, 5L, NA, 4L, 5L, 4L)), class = "data.frame", row.names = c(NA,
-16L))
More specifically, I would like a table with the t-statistic for each difference in means of the variables Return, Age and Size between the observations with a Rating of 1 and those with a Rating of 5. The t-statistics should be in the column between Rating 1 and Rating 5 and should include stars that indicate the p-value.
I tried using the t.test function, but I have difficulty applying it to the two subgroups only and creating the t-statistics column in the middle, between Rating 1 and Rating 5.
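A minimal sketch of restricting t.test to the two subgroups, for one variable only. The name `df` for the data frame is an assumption, and only the columns needed here are reproduced (Name, Date, Age and Size are left out). Note that t.test runs Welch's unequal-variance test by default.

```r
# Assumed name `df`; only the Return and Rating columns from the data above
df <- data.frame(
  Return = c(0.01, 0.05, 0.08, 0.07, 0.04, 0.03, 0.01, 0.03,
             0.03, 0.05, 0.06, 0.07, 0.07, 0.04, 0.06, 0.08),
  Rating = c(1L, 1L, 1L, 2L, 5L, 5L, 3L, NA, 4L, 5L, 4L, 5L, NA, 4L, 5L, 4L)
)

# Keep only Rating 1 and Rating 5; %in% also drops the NA ratings
sub <- df[df$Rating %in% c(1, 5), ]
sub$Rating <- factor(sub$Rating)

# Two-sample t-test of Return by Rating group (Welch by default)
res <- t.test(Return ~ Rating, data = sub)

res$estimate   # the two group means
res$statistic  # t-statistic
res$p.value    # p-value
```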
The output should have a layout like this:
structure(list(c("Return", "Age", "Size"), `Mean Rating 1` = c(NA,
NA, NA), `t-statistics including p-value (indicated as stars)` = c(NA,
NA, NA), `Mean Rating 5` = c(NA, NA, NA)), class = "data.frame", row.names = c(NA,
-3L))
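A rough sketch of how a table with this layout could be assembled in base R. The names `df`, `vars`, `g1`, `g5` and the helper `stars()` are assumptions made for illustration, and the star thresholds (0.01, 0.05, 0.1) are only one common convention. The Name and Date columns are left out because they are not needed.

```r
# Assumed name `df`; numeric columns and Rating from the data above
df <- data.frame(
  Return = c(0.01, 0.05, 0.08, 0.07, 0.04, 0.03, 0.01, 0.03,
             0.03, 0.05, 0.06, 0.07, 0.07, 0.04, 0.06, 0.08),
  Age = c(5L, 5L, 6L, 7L, 8L, 8L, 9L, 10L, 4L, 4L, 5L, 6L, 1L, 1L, 2L, 3L),
  Size = c(53336L, 75768L, 86548L, 94567L, 40234L, 40240L, 50243L, 60352L,
           5069L, 6069L, 7092L, 8024L, 2456L, 3046L, 4056L, 5600L),
  Rating = c(1L, 1L, 1L, 2L, 5L, 5L, 3L, NA, 4L, 5L, 4L, 5L, NA, 4L, 5L, 4L)
)

vars <- c("Return", "Age", "Size")
g1 <- df[which(df$Rating == 1), ]  # which() drops rows with NA Rating
g5 <- df[which(df$Rating == 5), ]

# Hypothetical helper: map p-values to stars (thresholds are an assumption)
stars <- function(p) {
  ifelse(p < 0.01, "***", ifelse(p < 0.05, "**", ifelse(p < 0.1, "*", "")))
}

# One Welch t-test per variable, Rating 1 versus Rating 5
tests <- lapply(vars, function(v) t.test(g1[[v]], g5[[v]]))
t_stat <- sapply(tests, function(x) unname(x$statistic))
p_val <- sapply(tests, function(x) x$p.value)

out <- data.frame(
  Variable = vars,
  `Mean Rating 1` = sapply(vars, function(v) mean(g1[[v]])),
  `t-statistic` = paste0(formatC(t_stat, format = "f", digits = 2), stars(p_val)),
  `Mean Rating 5` = sapply(vars, function(v) mean(g5[[v]])),
  check.names = FALSE, row.names = NULL
)
out
```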
Could someone help me here with the code?
Thank you very much in advance.
EDIT 22.04.2022:
Question 1: How would I need to adjust the code in the answer if I wanted the output to look like the following (there are no values yet; this is just to illustrate the layout I would like to have):
structure(list(c("Return", "Age", "Size"), `Mean Rating 1` = c(NA,
NA, NA), `Mean Rating 2` = c(NA, NA, NA), `Mean Rating 3` = c(NA,
NA, NA), `Mean Rating 4` = c(NA, NA, NA), `Mean Rating 5` = c(NA,
NA, NA), `Mean Rating NA` = c(NA, NA, NA), `Difference in means Rating 5 and Rating 1` = c(NA,
NA, NA), `p-value for differences in means Rating 5 and Rating 1` = c(NA,
NA, NA), `stars for p-value for differences in means Rating 5 and Rating 1` = c(NA,
NA, NA)), class = "data.frame", row.names = c(NA, -3L))
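A possible sketch of this extended layout in base R, with one mean column per Rating (including NA) followed by the difference, the p-value and the stars. The names `df`, `vars`, `group_mean()` and `stars()` are assumptions for illustration, the star thresholds are only one convention, and the Name and Date columns are left out.

```r
# Assumed name `df`; numeric columns and Rating from the data above
df <- data.frame(
  Return = c(0.01, 0.05, 0.08, 0.07, 0.04, 0.03, 0.01, 0.03,
             0.03, 0.05, 0.06, 0.07, 0.07, 0.04, 0.06, 0.08),
  Age = c(5L, 5L, 6L, 7L, 8L, 8L, 9L, 10L, 4L, 4L, 5L, 6L, 1L, 1L, 2L, 3L),
  Size = c(53336L, 75768L, 86548L, 94567L, 40234L, 40240L, 50243L, 60352L,
           5069L, 6069L, 7092L, 8024L, 2456L, 3046L, 4056L, 5600L),
  Rating = c(1L, 1L, 1L, 2L, 5L, 5L, 3L, NA, 4L, 5L, 4L, 5L, NA, 4L, 5L, 4L)
)
vars <- c("Return", "Age", "Size")
ratings <- list(1, 2, 3, 4, 5, NA)

# Hypothetical helper: mean of variable v within one Rating group (NA is its own group)
group_mean <- function(v, r) {
  idx <- if (is.na(r)) which(is.na(df$Rating)) else which(df$Rating == r)
  mean(df[[v]][idx])
}

# 3 x 6 matrix: one row per variable, one column per Rating group
means <- sapply(ratings, function(r) sapply(vars, group_mean, r = r))
colnames(means) <- paste("Mean Rating", c(1:5, "NA"))

# Welch t-test of Rating 5 versus Rating 1 for each variable
p_val <- sapply(vars, function(v) {
  t.test(df[[v]][which(df$Rating == 5)], df[[v]][which(df$Rating == 1)])$p.value
})

# Hypothetical helper: star thresholds are an assumption
stars <- function(p) {
  ifelse(p < 0.01, "***", ifelse(p < 0.05, "**", ifelse(p < 0.1, "*", "")))
}

out <- data.frame(Variable = vars, means, check.names = FALSE, row.names = NULL)
out[["Difference in means Rating 5 and Rating 1"]] <-
  out[["Mean Rating 5"]] - out[["Mean Rating 1"]]
out[["p-value for differences in means Rating 5 and Rating 1"]] <- unname(p_val)
out[["stars for p-value for differences in means Rating 5 and Rating 1"]] <-
  unname(stars(p_val))
out
```

Ratings 2 and 3 each have a single observation in this sample, so their "means" are just that one value.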
Question 2: When I want to compare the means of two groups, is it better to use the t-test or the F-test? I chose the t-test because, as far as I know, it is the right test for comparing the means of two groups, whereas the F-test is better suited to comparing the variances (and hence the standard deviations) of the two groups. Is my understanding right?
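To make the distinction concrete, here is a small sketch that runs both tests on Return for the two groups. The vector names `ret`, `rating`, `r1` and `r5` are assumptions. "F-test" is taken here to mean the variance-ratio test, which is what var.test in the stats package computes.

```r
# Return and Rating values from the data above (assumed vector names)
ret <- c(0.01, 0.05, 0.08, 0.07, 0.04, 0.03, 0.01, 0.03,
         0.03, 0.05, 0.06, 0.07, 0.07, 0.04, 0.06, 0.08)
rating <- c(1L, 1L, 1L, 2L, 5L, 5L, 3L, NA, 4L, 5L, 4L, 5L, NA, 4L, 5L, 4L)

r1 <- ret[which(rating == 1)]
r5 <- ret[which(rating == 5)]

# t-test: null hypothesis is that the two group means are equal
tt <- t.test(r5, r1)

# F-test: null hypothesis is that the ratio of the two group variances is 1
ft <- var.test(r5, r1)

tt$statistic  # t
ft$statistic  # F, equal to var(r5) / var(r1)
```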