Base R's hist() function uses the Sturges method by default to calculate the number of bins, unlike ggplot2::geom_histogram. There is a short tutorial showing how to replicate the Sturges method using ggplot2::geom_histogram:
https://r-charts.com/distribution/histogram-sturges-ggplot2/
The reprex is as follows, and works as expected:
# install.packages("ggplot2")
library(ggplot2)
# Data
set.seed(3)
x <- rnorm(450)
df <- data.frame(x)
# Calculating the Sturges bins
breaks <- pretty(range(x),
                 n = nclass.Sturges(x),
                 min.n = 1)
df$breaks <- breaks
# Histogram with Sturges method
ggplot(df, aes(x = x)) +
  geom_histogram(color = 1, fill = "white",
                 breaks = breaks) +
  ggtitle("Sturges method")
Created on 2022-09-01 by the reprex package (v2.0.1)
However, when I tried it on my own data, it didn't work. The reprex above seems to work only by chance: if the number of rows in the data frame is changed, the same error arises:
# install.packages("ggplot2")
library(ggplot2)
# Data
set.seed(3)
x <- rnorm(400)
df <- data.frame(x)
# Calculating the Sturges bins
breaks <- pretty(range(x),
                 n = nclass.Sturges(x),
                 min.n = 1)
df$breaks <- breaks
#> Error in `$<-.data.frame`(`*tmp*`, breaks, value = c(-2.5, -2, -1.5, -1, : replacement has 14 rows, data has 400
Created on 2022-09-01 by the reprex package (v2.0.1)
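A likely explanation for why one run succeeds and the other fails, sketched in base R: assigning a shorter vector to a data frame column only works when the number of rows is an exact multiple of the vector's length, in which case the vector is recycled. The helper name `sturges_breaks` is hypothetical and simply wraps the break calculation used above.

```r
# Hypothetical helper wrapping the break calculation from the reprex
sturges_breaks <- function(x) {
  pretty(range(x), n = nclass.Sturges(x), min.n = 1)
}

set.seed(3)
breaks_450 <- sturges_breaks(rnorm(450))
set.seed(3)
breaks_400 <- sturges_breaks(rnorm(400))

# `df$breaks <- breaks` recycles only if nrow(df) is a multiple of length(breaks)
450 %% length(breaks_450)  # zero remainder: the assignment recycles silently
400 %% length(breaks_400)  # non-zero remainder: the assignment errors
```

The breaks vector has one entry per bin edge, not one per observation, so it does not need to be stored in the data frame at all; it can be passed straight to the `breaks` argument.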
How can I make this solution generalizable to all datasets, just like the base R function?
Edit: I am looking for an automated solution for use in a function, so I can't set the breaks manually.
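One possible way to wrap this in a function is sketched below, assuming the data frame has a numeric column whose name is passed as a string. The function name `gg_hist_sturges` and its arguments are hypothetical. The breaks are kept in a local variable rather than added to the data frame, so the number of rows no longer matters.

```r
library(ggplot2)

# Hypothetical helper: histogram of one numeric column using Sturges breaks
gg_hist_sturges <- function(data, column) {
  x <- data[[column]]
  x <- x[is.finite(x)]  # drop NA/Inf before computing the range
  # Same break calculation as in the reprex, kept outside the data frame
  breaks <- pretty(range(x), n = nclass.Sturges(x), min.n = 1)
  ggplot(data, aes(x = .data[[column]])) +
    geom_histogram(breaks = breaks, color = 1, fill = "white") +
    ggtitle("Sturges method")
}

# Works for a row count that previously triggered the error
set.seed(3)
df <- data.frame(x = rnorm(400))
p <- gg_hist_sturges(df, "x")
print(p)
```

For comparison, `hist(df$x, plot = FALSE)$breaks` returns the breaks base R would use by default, which can be checked against the ones computed inside the function.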