4

I'm creating a cumulative frequency plot using ggplot and the stat_ecdf function. I would like to add the Y-value to the graph for specific X-values, but I can't figure out how. geom_point or geom_text seem like likely options, but since stat_ecdf calculates Y automatically, I don't know how to refer to that value in the geom_point/geom_text mappings.

Sample code for my initial plot is:

x = as.data.frame(rnorm(100))
ggplot(x, aes(x)) + 
stat_ecdf()

Now, how would I add specific (x, y) points here, e.g. the y-value at x = -1?

Gerard

2 Answers

5

The easiest way is to create the ECDF function beforehand using ecdf() from the stats package, evaluate it at the x-value you are interested in, and add the result to the plot using geom_label().

library(ggplot2)
# create a data.frame with column name
x = data.frame(col1 = rnorm(100))
# create ecdf function
e = ecdf(x$col1)

# plot the result
ggplot(x, aes(col1)) + 
  stat_ecdf() +
  geom_label(aes(x = -1, y = e(-1)), 
             label = e(-1))
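
If you want to mark several x-values, one option is to put them in their own small data frame and pass that to the layers. The following is only a sketch under assumptions: the seed, the vector xs, and the data frame marks are hypothetical names chosen for illustration. As far as I can tell, mapping constants inside aes() while the layer inherits the 100-row data draws one label per row, so a separate data frame also avoids overplotting.

```r
library(ggplot2)

set.seed(42)                          # assumption: fixed seed for reproducibility
x <- data.frame(col1 = rnorm(100))
e <- ecdf(x$col1)                     # step function returning the cumulative share

xs <- c(-1, 0, 1)                     # hypothetical x-values of interest
marks <- data.frame(col1 = xs, y = e(xs))

p <- ggplot(x, aes(col1)) +
  stat_ecdf() +
  # one point and one label per row of marks
  geom_point(data = marks, aes(col1, y)) +
  geom_label(data = marks, aes(col1, y, label = round(y, 2)),
             vjust = -0.5)
p
```

Because e() is an ordinary vectorised function, e(xs) returns all the y-values at once, and they match what stat_ecdf() draws for the same data.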


mtoto
Excellent, thanks! That makes sense and seems fairly straightforward. Will try to execute on my actual data now. – Gerard Feb 27 '18 at 06:41
2

You can try

library(tidyverse)
# data
set.seed(123)
df = data.frame(x=rnorm(100))
# Plot
Values <- c(-1,0.5,2) 
df %>% 
  mutate(gr=FALSE) %>% 
  bind_rows(data.frame(x=Values,gr=TRUE)) %>% 
  mutate(y=ecdf(x)(x)) %>%  
  mutate(xmin=min(x)) %>% 
  ggplot(aes(x, y)) +
   stat_ecdf() +
   geom_point(data=. %>% filter(gr), aes(x, y)) + 
   geom_segment(data=. %>% filter(gr),aes(y=y,x=xmin, xend=x,yend=y), color="red")+
   geom_segment(data=. %>% filter(gr),aes(y=0,x=x, xend=x,yend=y), color="red") +
   ggrepel::geom_label_repel(data=. %>% filter(gr), 
                             aes(x, y, label=paste("x=",round(x,2),"\ny=",round(y,2)))) 


The idea is to add the y-values at the beginning, together with the indicator column gr specifying which Values you want to show.

Edit:

Since this code adds the extra points to the actual data, which can distort the curve, consider excluding them at least from the ECDF layer: stat_ecdf(data=. %>% filter(!gr))
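
A variant that sidesteps the problem entirely is to compute the ECDF from the observed data only and keep the marked points in a separate data frame. This is a sketch, not the code above rewritten exactly: the names e and pts are hypothetical, and plain geom_label() is swapped in for ggrepel::geom_label_repel() to avoid the extra dependency. Note that filtering only inside stat_ecdf() would still leave the y column computed from the combined data, which this version avoids.

```r
library(ggplot2)

set.seed(123)
df <- data.frame(x = rnorm(100))
Values <- c(-1, 0.5, 2)

e <- ecdf(df$x)                       # ECDF built from the observed data only
pts <- data.frame(x = Values,
                  y = e(Values),      # y-values read off the ECDF
                  xmin = min(df$x))   # left end of the horizontal guide lines

p <- ggplot(df, aes(x)) +
  stat_ecdf() +
  # horizontal and vertical guide lines to each marked point
  geom_segment(data = pts, aes(x = xmin, xend = x, y = y, yend = y), color = "red") +
  geom_segment(data = pts, aes(x = x, xend = x, y = 0, yend = y), color = "red") +
  geom_point(data = pts, aes(x, y)) +
  geom_label(data = pts,
             aes(x, y, label = paste0("x=", x, "\ny=", round(y, 2))),
             vjust = -0.2)
p
```

Here df is never modified, so the curve and the labelled y-values are both based on the same 100 observations.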

Roman