Questions tagged [histogram]

In statistics, a histogram is a graphical representation, showing a visual impression of the distribution of data.

A histogram is an estimate of the probability distribution of a continuous variable and was first introduced by Karl Pearson. It consists of tabular frequencies, shown as adjacent rectangles erected over discrete intervals (bins), each with an area equal to the frequency of the observations in the interval. The height of a rectangle is therefore equal to the frequency density of the interval, i.e., the frequency divided by the width of the interval. The total area of the histogram is equal to the number of data points. A histogram may also be normalized to display relative frequencies. It then shows the proportion of cases that fall into each of several categories, with the total area equaling 1. The categories are usually specified as consecutive, non-overlapping intervals of a variable. The categories (intervals) must be adjacent and are often chosen to be of the same size.
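The relationship between frequency, bin width, and rectangle height can be illustrated with a minimal sketch in plain Python. The helper name `frequency_density`, the sample values, and the bin edges are made up for illustration, and the half-open binning rule is an assumption, not part of the definition above.

```python
from bisect import bisect_right


def frequency_density(data, edges):
    """Return (counts, heights) for bins defined by sorted edges.

    Assumption: bins are half-open [edges[i], edges[i+1]), except the
    last bin, which also includes its right edge.
    """
    counts = [0] * (len(edges) - 1)
    for x in data:
        if x < edges[0] or x > edges[-1]:
            continue  # ignore values outside the binned range
        # Locate the bin; clamp so the right-most edge falls in the last bin.
        i = min(bisect_right(edges, x) - 1, len(counts) - 1)
        counts[i] += 1
    widths = [b - a for a, b in zip(edges, edges[1:])]
    # Height of each rectangle = frequency / bin width (frequency density).
    heights = [c / w for c, w in zip(counts, widths)]
    return counts, heights


# Hypothetical data with deliberately unequal bin widths.
data = [0.5, 1.2, 1.7, 2.1, 2.4, 2.9, 3.5, 6.0, 8.0, 9.5]
edges = [0, 2, 4, 10]

counts, heights = frequency_density(data, edges)
widths = [b - a for a, b in zip(edges, edges[1:])]

# Summing height * width recovers the number of observations.
total_area = sum(h * w for h, w in zip(heights, widths))

# Dividing each height by the number of observations normalizes the area to 1.
normalized = [h / len(data) for h in heights]
normalized_area = sum(h * w for h, w in zip(normalized, widths))

print(counts, heights, total_area, normalized_area)
```

With unequal widths, the widest bin holds as many points as the first one but gets a much lower rectangle, which is why height alone should not be read as frequency.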

Histograms are used to plot the density of data, and often for density estimation: estimating the probability density function of the underlying variable. The total area of a histogram used for probability density is always normalized to 1. If the widths of the intervals on the x-axis are all 1, then the histogram is identical to a relative frequency plot.

In scientific software for statistical computing and graphics, the function hist generates a histogram. It can optionally scale the histogram so that its total area is 1, which puts it on the right scale if one wants to overlay a probability density curve.
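As a rough illustration of why the area-1 scaling matters for overlays, here is a sketch using NumPy's `np.histogram` with `density=True` rather than any particular `hist` plotting function. The standard normal sample, the seed, and the bin count are assumptions chosen only for the example.

```python
import numpy as np

# Assumed example data: 10,000 draws from a standard normal distribution.
rng = np.random.default_rng(0)
sample = rng.normal(loc=0.0, scale=1.0, size=10_000)

# density=True rescales the bar heights so the total area is 1.
density, edges = np.histogram(sample, bins=30, density=True)
widths = np.diff(edges)
area = float(np.sum(density * widths))

# Evaluate the standard normal density at the bin centers. Because the
# histogram is on the density scale, the two are directly comparable.
centers = 0.5 * (edges[:-1] + edges[1:])
pdf = np.exp(-0.5 * centers**2) / np.sqrt(2.0 * np.pi)
max_gap = float(np.max(np.abs(density - pdf)))

print(f"area = {area:.6f}, largest gap to the normal pdf = {max_gap:.4f}")
```

Without the scaling, the bar heights would be raw counts (hundreds per bin here) and a density curve drawn on the same axes would look flat. Plotting functions such as matplotlib's `hist` accept a similar `density=True` argument.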

More about it here: histogram wiki

6663 questions
90
votes
1 answer

Python histogram outline

I have plotted a histogram in Jupyter (Python 2) and was expecting to see the outlines of my bars but this is not the case. I'm using the following code: import matplotlib.pyplot as plt from numpy.random import normal gaussian_numbers =…
Brad Reed
  • 1,049
  • 1
  • 7
  • 6
89
votes
3 answers

Matplotlib - label each bin

I'm currently using Matplotlib to create a histogram: import matplotlib matplotlib.use('Agg') import matplotlib.pyplot as pyplot ... fig = pyplot.figure() ax = fig.add_subplot(1,1,1,) n, bins, patches = ax.hist(measurements, bins=50,…
victorhooi
  • 16,775
  • 22
  • 90
  • 113
85
votes
4 answers

Overlay normal curve to histogram in R

I have managed to find online how to overlay a normal curve to a histogram in R, but I would like to retain the normal "frequency" y-axis of a histogram. See two code segments below, and notice how in the second, the y-axis is replaced with…
StanLe
  • 5,037
  • 9
  • 38
  • 41
80
votes
7 answers

Histogram with Logarithmic Scale and custom breaks

I'm trying to generate a histogram in R with a logarithmic scale for y. Currently I do: hist(mydata$V3, breaks=c(0,1,2,3,4,5,25)) This gives me a histogram, but the density between 0 to 1 is so great (about a million values difference) that you can…
Weegee
  • 2,225
  • 1
  • 17
  • 16
77
votes
4 answers

How to use log scale with pandas plots

I'm making a fairly simple histogram with pandas using results.val1.hist(bins=120) which works fine, but I really want to have a log scale on the y axis, which I normally (probably incorrectly) do like this: fig = plt.figure(figsize=(12,8)) ax =…
TristanMatthews
  • 2,451
  • 4
  • 24
  • 34
77
votes
2 answers

Matplotlib/Pandas error using histogram

I have a problem making histograms from pandas series objects and I can't understand why it does not work. The code has worked fine before but now it does not. Here is a bit of my code (specifically, a pandas series object I'm trying to make a…
jonas
  • 13,559
  • 22
  • 57
  • 75
73
votes
5 answers

add title to collection of pandas hist plots

I'm looking for advice on how to show a title at the top of a collection of histogram plots that have been generated by a pandas df.hist() command. For instance, in the histogram figure block generated by the code below I'd like to place a general…
dreme
  • 4,761
  • 3
  • 18
  • 20
69
votes
4 answers

Understanding dates and plotting a histogram with ggplot2 in R

Main Question I'm having issues with understanding why the handling of dates, labels and breaks is not working as I would have expected in R when trying to make a histogram with ggplot2. I'm looking for: A histogram of the frequency of my…
Hendy
  • 10,182
  • 15
  • 65
  • 71
67
votes
5 answers

Using Counter() in Python to build histogram?

I saw on another question that I could use Counter() to count the number of occurrences in a set of strings. So if I have ['A','B','A','C','A','A'] I get Counter({'A':3,'B':1,'C':1}). But now, how can I use that information to build a histogram for…
marc
  • 2,037
  • 9
  • 24
  • 32
66
votes
5 answers

Plot a histogram from a Dictionary

I created a dictionary that counts the occurrences in a list of every key and I would now like to plot the histogram of its content. This is the content of the dictionary I want to plot: {1: 27, 34: 1, 3: 72, 4: 62, 5: 33, 6: 36, 7: 20, 8: 12, 9: 9,…
Matteo
  • 7,924
  • 24
  • 84
  • 129
65
votes
4 answers

Overlay histogram with density curve

I am trying to make a histogram of density values and overlay that with the curve of a density function (not the density estimate). Using a simple standard normal example, here is some data: x <- rnorm(1000) I can do: q <- qplot( x,…
Sacha Epskamp
  • 46,463
  • 20
  • 113
  • 131
64
votes
6 answers

seaborn distplot / displot with multiple distributions

I am using seaborn to plot a distribution plot. I would like to plot multiple distributions on the same plot in different colors: Here's how I start the distribution plot: import numpy as np import pandas as pd from sklearn.datasets import…
Trexion Kameha
  • 3,362
  • 10
  • 34
  • 60
64
votes
4 answers

Extract data from a ggplot

I have made a plot using ggplot2 geom_histogram from a data frame. See sample below and link to the ggplot histogram Need to label each geom_vline with the factors using a nested ddply function and facet wrap I now need to make a data frame that…
George
  • 1,343
  • 2
  • 12
  • 17
61
votes
5 answers

Fitting a histogram with python

I have a histogram H=hist(my_data,bins=my_bin,histtype='step',color='r') I can see that the shape is almost gaussian but I would like to fit this histogram with a gaussian function and print the value of the mean and sigma I get. Can you help me?
Brian
  • 13,996
  • 19
  • 70
  • 94
61
votes
1 answer

How to add vertical lines to a distribution plot

Using the examples from seaborn.pydata.org and the Python DataScience Handbook, I'm able to produce a combined distribution plot with the following snippet: Code: import pandas as pd import numpy as np import seaborn as sns import matplotlib.pyplot…
vestland
  • 55,229
  • 37
  • 187
  • 305