Questions tagged [histogram]

In statistics, a histogram is a graphical representation that gives a visual impression of the distribution of data. It is an estimate of the probability distribution of a continuous variable and was first introduced by Karl Pearson. A histogram consists of tabular frequencies, shown as adjacent rectangles erected over discrete intervals (bins), each with an area equal to the frequency of the observations in the interval. The height of a rectangle is therefore equal to the frequency density of the interval, i.e., the frequency divided by the width of the interval. The total area of the histogram is equal to the number of data points. A histogram may also be normalized to display relative frequencies. It then shows the proportion of cases that fall into each of several categories, with the total area equaling 1. The categories are usually specified as consecutive, non-overlapping intervals of a variable. The categories (intervals) must be adjacent and are often chosen to be of the same size.
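The relationship between frequency, bin width, and rectangle height can be checked with a small sketch. This is only an illustration using the Python standard library: the `histogram` helper, the bin edges, and the sample values are assumptions chosen for the example, not part of any particular plotting library.

```python
from bisect import bisect_right


def histogram(data, edges):
    """Count observations per bin.

    Bins are [edges[i], edges[i+1]); the last bin also includes its right edge.
    Values outside the overall range are ignored.
    """
    counts = [0] * (len(edges) - 1)
    for x in data:
        if x < edges[0] or x > edges[-1]:
            continue
        i = min(bisect_right(edges, x) - 1, len(counts) - 1)
        counts[i] += 1
    return counts


data = [1, 2, 2, 3, 4, 5, 5, 5, 5, 6, 10]
edges = [0, 2, 4, 6, 10]  # hypothetical, deliberately unequal bin widths

counts = histogram(data, edges)
widths = [hi - lo for lo, hi in zip(edges, edges[1:])]

# Height = frequency density = frequency / bin width.
heights = [c / w for c, w in zip(counts, widths)]

# Area of each rectangle recovers the frequency; the areas sum to len(data).
areas = [h * w for h, w in zip(heights, widths)]

# Normalized version: divide by the number of observations so total area is 1.
density = [c / (len(data) * w) for c, w in zip(counts, widths)]
total_density_area = sum(d * w for d, w in zip(density, widths))

print(counts, heights, sum(areas), total_density_area)
```

Note that the last bin is twice as wide as the others, so its height is halved relative to its count; this is why height and frequency only coincide when all bins have width 1.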

Histograms are used to plot the density of data, and often for density estimation: estimating the probability density function of the underlying variable. The total area of a histogram used for probability density is always normalized to 1. If the widths of the intervals on the x-axis are all 1, then such a normalized histogram is identical to a relative frequency plot.
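The unit-width case can be illustrated with a short sketch. The data values and the `density_heights` helper below are made up for the example; the point is only that dividing by a bin width of 1 leaves the relative frequencies unchanged, while any other width rescales the heights.

```python
from collections import Counter


def density_heights(counts, n, width):
    """Density height of each bin: count / (n * width)."""
    return {x: c / (n * width) for x, c in counts.items()}


data = [7, 8, 8, 9, 9, 9, 10, 11, 12, 12]  # hypothetical integer-valued data
n = len(data)

# With unit-width bins [x, x + 1), each integer value is its own bin.
counts = Counter(data)

relative_frequency = {x: c / n for x, c in counts.items()}
density_unit = density_heights(counts, n, width=1)

# If the same counts had come from bins of width 0.5, the heights would double
# so that the total area stays equal to 1.
density_half = density_heights(counts, n, width=0.5)

print(relative_frequency == density_unit)
print(sum(h * 0.5 for h in density_half.values()))
```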

In scientific software for statistical computing and graphics, the function hist generates a histogram. It can also optionally scale the histogram so that its total area is 1. This puts it on the right scale if one wants to overlay a probability density curve.
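Why area-1 scaling matters for an overlay can be seen without any plotting library. The following is a rough sketch using only the Python standard library; it does not call any `hist` function, and the sample size, seed, and bin layout are arbitrary assumptions. It compares normalized bar heights with the standard normal density at each bin centre. In plotting libraries the same scaling is usually an option of the histogram function (for example `density=True` in Matplotlib's `hist`).

```python
import random
from statistics import NormalDist

random.seed(0)  # fixed seed so the sketch is repeatable
dist = NormalDist(mu=0.0, sigma=1.0)
sample = [random.gauss(0.0, 1.0) for _ in range(10_000)]

width = 0.5
edges = [-4 + width * i for i in range(17)]  # bin edges from -4.0 to 4.0
counts = [0] * (len(edges) - 1)
for x in sample:
    i = int((x - edges[0]) // width)
    if 0 <= i < len(counts):  # the rare values outside [-4, 4) are dropped
        counts[i] += 1

n = len(sample)
centres = [lo + width / 2 for lo in edges[:-1]]
bars = [c / (n * width) for c in counts]   # heights after scaling to area 1
curve = [dist.pdf(c) for c in centres]     # density curve to overlay

for centre, c, bar, pdf in zip(centres, counts, bars, curve):
    print(f"{centre:+.2f}  raw count={c:5d}  scaled bar={bar:.3f}  pdf={pdf:.3f}")
```

The raw counts are in the hundreds or thousands while the density never exceeds about 0.4, so an unscaled histogram and the curve would not be comparable on the same axis; the scaled bars track the curve closely.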

More about it here: histogram wiki

6663 questions
61
votes
5 answers

Normalizing y-axis in histograms in R ggplot to proportion

I have a very simple question causing me to bang my head on the wall. I would like to scale the y-axis of my histogram to reflect the proportion (0 to 1) that each bin makes up, instead of having the area of the bars sum to 1, as using y=..density..…
First Last
  • 629
  • 1
  • 6
  • 7
60
votes
4 answers

Logarithmic y-axis bins in python

I'm trying to create a histogram of a data column and plot it logarithmically (y-axis) and I'm not sure why the following code does not work: import numpy as np import matplotlib.pyplot as plt data = np.loadtxt('foo.bar') fig = plt.figure() ax =…
mannaroth
  • 1,473
  • 3
  • 17
  • 38
59
votes
2 answers

Plot CDF + cumulative histogram using Seaborn

Is there a way to plot the CDF + cumulative histogram of a Pandas Series in Python using Seaborn only? I have the following: import numpy as np import pandas as pd import seaborn as sns s = pd.Series(np.random.normal(size=1000)) I know I can plot…
Michael
  • 1,834
  • 2
  • 20
  • 33
59
votes
6 answers

Make Frequency Histogram for Factor Variables

I am very new to R, so I apologize for such a basic question. I spent an hour googling this issue, but couldn't find a solution. Say I have some categorical data in my data set about common pet types. I input it as a character vector in R that…
OnlyDean
  • 1,025
  • 1
  • 13
  • 25
58
votes
3 answers

changing default x range in histogram matplotlib

I would like to change the default x range for the histogram plot. The range of the data is from 7 to 12. However, by default the histogram starts right at 7 and ends at 13. I want it to start at 6.5 and end at 12.5. However, the ticks should go…
Rohit
  • 5,840
  • 13
  • 42
  • 65
57
votes
8 answers

Comparing two histograms

For a small project, I need to compare one image with another - to determine if the images are approximately the same or not. The images are smallish, varying from 25 to 100px across. The images are meant to be of the same picture data but are…
Dai
  • 141,631
  • 28
  • 261
  • 374
50
votes
9 answers

python histogram one-liner

There are many ways to write a Python program that computes a histogram. By histogram, I mean a function that counts the occurrence of objects in an iterable and outputs the counts in a dictionary. For example: >>> L = 'abracadabra' >>>…
mykhal
  • 19,175
  • 11
  • 72
  • 80
49
votes
3 answers

Getting frequency values from histogram in R

I know how to draw histograms or other frequency/percentage related tables. But now I want to know, how can I get those frequency values in a table to use after the fact. I have a massive dataset, now I draw a histogram with a set binwidth. I want…
MiMi
  • 548
  • 1
  • 5
  • 8
48
votes
6 answers

Plotting a histogram from pre-counted data in Matplotlib

I'd like to use Matplotlib to plot a histogram over data that's been pre-counted. For example, say I have the raw data data = [1, 2, 2, 3, 4, 5, 5, 5, 5, 6, 10] Given this data, I can use pylab.hist(data, bins=[...]) to plot a histogram. In my…
Josh Rosen
  • 13,511
  • 6
  • 58
  • 70
47
votes
7 answers

How to normalize a histogram in MATLAB?

How to normalize a histogram such that the area under the probability density function is equal to 1?
edgarmtze
  • 24,683
  • 80
  • 235
  • 386
47
votes
8 answers

Multiple histograms in Pandas

I would like to create the following histogram (see image below) taken from the book "Think Stats". However, I cannot get them on the same plot. Each DataFrame takes its own subplot. I have the following code: import nsfg import matplotlib.pyplot…
Rohit
  • 5,840
  • 13
  • 42
  • 65
45
votes
4 answers

Plotting transparent histogram with non transparent edge

I am plotting a histogram, and I have three datasets which I want to plot together, each one with different colours and linetype (dashed, dotted, etc). I am also giving some transparency, in order to see the overlapping bars. The point is that I…
Argentina
  • 1,071
  • 5
  • 16
  • 30
45
votes
4 answers

Plot histogram with colors taken from colormap

I want to plot a simple 1D histogram where the bars should follow the color-coding of a given colormap. Here's an MWE: import numpy as n import matplotlib.pyplot as plt # Random gaussian data. Ntotal = 1000 data = 0.05 * n.random.randn(Ntotal) +…
Gabriel
  • 40,504
  • 73
  • 230
  • 404
44
votes
9 answers

How to make a histogram from a list of strings

I have a list of strings: a = ['a', 'a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'd', 'e', 'e', 'e', 'e', 'e'] I want to make a histogram for displaying the frequency distribution of the letters. I can make a list that contains the count of each letter…
Gray
  • 481
  • 1
  • 4
  • 9
44
votes
2 answers

Circular / polar histogram in python

I have periodic data and the distribution for it is best visualised around a circle. Now the question is how can I do this visualisation using matplotlib? If not, can it be done easily in Python? Here I generate some sample data which I would like…
Cupitor
  • 11,007
  • 19
  • 65
  • 91