This is the first time I have an R question that I couldn't find answered on Stack Overflow already. Forgive me if I found nothing only because there is a specific term for this kind of plot that I'm not aware of (is there one?).

I'd like to display data as a cumulative frequency. Since my focus is more on the edges of the distribution, it is helpful to scale the y-axis to a normal distribution. The result should look something like this:

[image: example of the desired plot]

I've read about quantile-quantile plots, but honestly I can't figure out how to apply them if I want to preserve the X-axis.

I tried both base graphics and ggplot2, but can't figure it out. For now I am stuck with a plain ECDF on a linear y-axis, for example

plot(ecdf(trees$Volume))

or

ggplot(data=trees, aes(Volume)) + stat_ecdf()
ursusminimus
  • Perhaps you are looking for something like this: `plot(trees$Volume, cumsum(dnorm(trees$Volume, mean = mean(trees$Volume), sd = sd(trees$Volume))))`? – Gopala Apr 12 '16 at 12:52
  • Hmm, no, the solution doesn't work - the shape of the curve is approximately ok, but at the same time the y axis is still linear and the order of the values seems incorrect. Strange. – ursusminimus Apr 12 '16 at 14:09

2 Answers

I think you are looking for the scales package and the probability_trans() function:

Without transforming the y scales:

library(ggplot2)

ggplot(data = trees,
       aes(Volume)) + 
    stat_ecdf()

[image: ECDF of Volume with a linear y-axis]

With transformation of y axis:

ggplot(data = trees,
       aes(Volume)) + 
    stat_ecdf() + 
    scale_y_continuous(trans = scales::probability_trans("norm"))

[image: ECDF of Volume with a normal-probability y-axis]
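
One practical wrinkle, offered as a hedged aside: the ECDF reaches exactly 1 at the largest observation, and qnorm(1) is infinite, so that point cannot be drawn on a normal-probability axis. Below is a minimal sketch of one workaround. It swaps stat_ecdf() for plotting positions from ppoints(), which stay strictly between 0 and 1, so it is a variation on the code above rather than the same plot. The names df, prob and prob_breaks, and the break values themselves, are my own choices for illustration. Recent ggplot2 releases prefer the argument name transform over trans; trans is kept here to match the answer.

```r
library(ggplot2)

# Sort the data and attach plotting positions; ppoints() returns
# probabilities strictly inside (0, 1), so qnorm() stays finite.
df <- data.frame(
  Volume = sort(trees$Volume),
  prob   = ppoints(nrow(trees))
)

# Probabilities to label on the y axis (illustrative choice).
prob_breaks <- c(0.01, 0.05, 0.25, 0.5, 0.75, 0.95, 0.99)

p <- ggplot(df, aes(Volume, prob)) +
  geom_point() +
  scale_y_continuous(
    trans  = scales::probability_trans("norm"),
    breaks = prob_breaks,
    limits = range(prob_breaks)
  )
p
```

Setting the breaks explicitly also gives labels in the tails, which is where a normal-probability axis is most useful.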

You can read more about these in the documentation with ?probability_trans. The probability_trans() function takes a standard R distribution name (such as "norm") and uses the corresponding quantile function to scale your axis. You can also create a new transformation with trans_new() if you need something completely custom.
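
To make the trans_new() remark concrete, here is a small sketch that is not part of the original answer. It rebuilds the normal probability transformation by hand, which should behave like probability_trans("norm"). The label "normal_prob" and the variable normal_prob_trans are made up for illustration; replacing qnorm and pnorm with another quantile/distribution function pair is where a genuinely custom scale would come in.

```r
library(ggplot2)

# trans_new() needs a forward function and its inverse:
# qnorm maps probabilities to z-scores, pnorm maps them back.
normal_prob_trans <- scales::trans_new(
  name      = "normal_prob",
  transform = qnorm,
  inverse   = pnorm
)

ggplot(trees, aes(Volume)) +
  stat_ecdf() +
  scale_y_continuous(trans = normal_prob_trans)
```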

niczky12

The qpplot.das function from the StatDA package by Peter Filzmoser might be a "base R" way for you.

library(StatDA) 
qpplot.das(trees$Volume, qdist = qnorm, xlab = "Volume", line = FALSE) 

[image: output of qpplot.das]

The StatDA package was used for all calculations and graphics in the book Statistical Data Analysis Explained by Reimann, Filzmoser, Garrett and Dutter. All R scripts are online, including examples for the QP plots.
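
If installing an extra package is not an option, roughly the same kind of plot can be sketched with base graphics alone. This is my own approximation under stated assumptions, not how qpplot.das is implemented: it plots the sorted data against qnorm() of plotting positions from ppoints(), then relabels the y-axis in probabilities. The variable names and the tick values in prob_ticks are arbitrary choices.

```r
# Sorted data on the x axis, so the original x scale is preserved.
x <- sort(trees$Volume)

# Plotting positions strictly inside (0, 1), so qnorm() is finite.
p <- ppoints(length(x))

# Probabilities to show as y-axis ticks (illustrative choice).
prob_ticks <- c(0.01, 0.05, 0.25, 0.5, 0.75, 0.95, 0.99)

# Plot on the z-score scale, but suppress the default y axis ...
plot(x, qnorm(p),
     yaxt = "n",
     ylim = qnorm(range(prob_ticks)),
     xlab = "Volume",
     ylab = "Cumulative probability")

# ... and draw one labelled in probabilities instead.
axis(2, at = qnorm(prob_ticks), labels = prob_ticks, las = 1)
```

Points that fall close to a straight line on this plot are consistent with a normal distribution, and the stretched tails make the edges of the distribution easier to read.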