How to plot Two Sample t.test() means, medians, and CI in R?

Question

With the data I have, this R code x <- t.test(Age ~ Completers, var.equal = TRUE, data = data) renders the following result:

    Two Sample t-test

data:  Age by Completers
t = 0.93312, df = 1060, p-value = 0.351
alternative hypothesis: true difference in means between group Completers and group Non Completers is not equal to 0
95 percent confidence interval:
 -0.5844018  1.6442118
sample estimates:
    mean in group Completers mean in group Non Completers
                    37.16052                     36.63062

What I would like is to plot each mean (found in x$estimate[1] and x$estimate[2]) with its own point on the x axis at its proper height on the y axis (on the same graph) and each point complemented with the same confidence interval (CI) (found in x$conf.int[1] and x$conf.int[2]). Like this[*]:

Unfortunately, if I'm not mistaken, plot() (from the Generic X-Y Plotting) does not seem to handle this. So I tried with plotCI (from gplots) as follows:

library(gplots)

plotCI(x = x$estimate[1], y = x$estimate[2], 
       li = x$conf.int[1], ui = x$conf.int[2])

But it renders as shown below:

My questions:

Is there a way to obtain a plot such as in the first graph with Base R code?
If not, what would be the solution (short of using the jmv:: code (see [*]))?

EDIT

As requested in the comments, please find hereunder some code that help reproduce the data (T-Test results are won't be exactly the same as above, but the idea is the same):

# Generate random numbers with specific mean and standard deviation
completers <- data.frame(Completers = 1, 
                         Age = rnorm(100, mean = 37.16052, sd = 8.34224))
nonCompleters <- data.frame(Completers = 0, 
                            Age = rnorm(100, mean = 36.63062, sd = 11.12173))

# Convert decimaled number to integers
completers[] <- lapply(completers, as.integer)
nonCompleters[] <- lapply(nonCompleters, as.integer)

# Stack data from 2 different data frames
df <- rbind(completers, nonCompleters)

# Remove useless data frames
rm(completers, nonCompleters)

# Age ~ Completers (T-Test)
(tTest <- t.test(df$Age ~ df$Completers, var.equal = TRUE))

Sources:

Generate random numbers with specific mean and standard deviation (Scroll down until "From Normal Distribution")
Convert decimaled number to integers
Stack data from 2 different data frames

[*] Graph obtained with Jamovi Version 2.3.15.0 which uses the following code (but I would like to avoid using jmv::):

jmv::ttestIS(
  formula = Age ~ Completers,
  data = data,
  plots = TRUE
)

System used:

R 4.2.1
RStudio 2022.07.1 Build 554
macOS Monterey Version 12.5.1 (Intel)

Could you please share some reproducible data using `dput` and your code? — Quinten, Sep 03 '22 at 12:01

jpsmith · Accepted Answer · 2022-09-06T12:42:57.950

There appears to be a misalignment of what you want and what t.test() is giving you. t.test() let you know if there is a difference in means, and report the CI of the difference in sample means (not the CIs of the individual means).

Since you stated you want the CIs of the individual means using base R, you can accomplish this by:

Sample data

nn <- 100
df <- data.frame(Completers = rep(c(1,0), each = nn),
                 Age = c(as.integer(rnorm(nn, mean = 37.16052, sd = 8.34224)),
                         as.integer(rnorm(nn, mean = 36.63062, sd = 11.12173))))

With the raw data, calculate the summary statistics and confidence interval:

# Base R - find summary statistics and restructure into data frame
df_summary <- aggregate(Age ~ Completers, df, function(x) c(mean = mean(x), 
                                                            sd = sd(x), 
                                                            median = median(x), 
                                                            n = length(x)))
df_summary <- data.frame(Completers = df_summary[, 1], df_summary$Age) #reformat nested matrix

# Calculate 95% CI
alpha <- 0.05/2

# Lower CI
df_summary$ci_low <-
  df_summary$mean - qt(1 - alpha, df = df_summary$n) * df_summary$sd /
  sqrt(df_summary$n)

# Upper CI
df_summary$ci_hi <-
  df_summary$mean + qt(1 - alpha, df = df_summary$n) * df_summary$sd /
  sqrt(df_summary$n)

# Output

#  Completers  mean        sd median   n   ci_low    ci_hi
#1          0 34.94 10.730698     34 100 32.81106 37.06894
#2          1 37.43  7.645234     37 100 35.91321 38.94679

Now you can plot the mean and CI for each group (your example also mentioned you wanted the median in there):

# Set Y limits (change to whatever)
ylimits <- c(min(df_summary$ci_low) - 1,
             max(df_summary$ci_hi) + 1)

# Plot
plot(NA, xlim = c(0,3), ylim = ylimits, # blank plot
     axes = FALSE, xlab = "", ylab = "")
segments(x0 = c(1,2), y0 = df_summary$ci_low, y1 = df_summary$ci_hi) # add segments
points(df_summary$mean, pch = 19) # add means
points(df_summary$median, pch = 0)
axis(1, at = 0:3, labels = c(NA, "Completers", "Noncompleters", NA)) # add x axis
axis(2) #add y axis
mtext(side = 1, "Completers", padj = 4) # add x label 
mtext(side = 2, "Age", padj = -4) # add y label
legend("topleft", c("Mean", "Median", "95% CI"),
       pch = c(19, 0, NA), lty = c(NA, NA, 1), bty = "n")

Output:

Thank you very much for your answer. Unfortunately, I fail to generalise your solution to the data I have to extract from the ```t.test()```. Now that I added some `R` code that allow reproduction of the results, perhaps you could adapt your answer? (I realise that I should have put reproducible data immediately. Sorry for that.) — pdeli, Sep 05 '22 at 16:55
I'm a bit confused why you want to use `tTest$conf.int` for the CI for both, this is the CI for the difference between mean_0 and mean_1 (sample difference in means, or ` tTest$estimate[1] - tTest$estimate[2]`), not the CI for the individual means. Are you wanting the CI for the individual means? — jpsmith, Sep 06 '22 at 01:42
Thank you for your comment. Yes, what I want are the CI for the individual means (if possible). In effect, what I would like is the equivalent of the plot provided by the ```jmv::ttestIS()``` code, but without using that specific code, rather, the base ```R``` code, if possible. If not I'll go with additional packages too. — pdeli, Sep 06 '22 at 09:18
Thank you very much. It works like a charm. I have to ask the question though: would there have been an easier/shorter way of having the same result with some additional packages? — pdeli, Sep 07 '22 at 15:41

How to plot Two Sample t.test() means, medians, and CI in R?

1 Answers1

Linked