Questions tagged [histogram]

In statistics, a histogram is a graphical representation, showing a visual impression of the distribution of data.

A histogram is an estimate of the probability distribution of a continuous variable and was first introduced by Karl Pearson. It consists of tabular frequencies, shown as adjacent rectangles erected over discrete intervals (bins), with the area of each rectangle equal to the frequency of the observations in the interval. The height of a rectangle is therefore equal to the frequency density of the interval, i.e., the frequency divided by the width of the interval. The total area of the histogram is equal to the number of observations. A histogram may also be normalized to display relative frequencies. It then shows the proportion of cases that fall into each of several categories, with the total area equaling 1. The categories are usually specified as consecutive, non-overlapping intervals of a variable. The categories (intervals) must be adjacent and are often chosen to be of the same size.
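The relationship between frequency, frequency density, and area can be checked with a small sketch. This is plain Python written for illustration only; the helper name `histogram`, the sample data, and the bin edges are assumptions, not part of any particular library.

```python
from bisect import bisect_right

def histogram(data, edges):
    """Count observations per bin [edges[i], edges[i+1]); the last bin also includes its right edge."""
    counts = [0] * (len(edges) - 1)
    for x in data:
        if x < edges[0] or x > edges[-1]:
            continue  # ignore values outside the binned range
        i = min(bisect_right(edges, x) - 1, len(counts) - 1)
        counts[i] += 1
    return counts

# Hypothetical data and unequal-width bins
data = [0.2, 0.7, 1.1, 1.4, 1.9, 2.5, 3.0, 3.5]
edges = [0.0, 1.0, 2.0, 4.0]

counts = histogram(data, edges)
widths = [b - a for a, b in zip(edges, edges[1:])]

# Rectangle height = frequency density = frequency / bin width
freq_density = [c / w for c, w in zip(counts, widths)]
# Area of each rectangle is the frequency, so the total area is the number of observations
total_area = sum(h * w for h, w in zip(freq_density, widths))

# Normalized version: heights are relative frequency / bin width, total area is 1
rel_density = [c / (len(data) * w) for c, w in zip(counts, widths)]
normalized_area = sum(h * w for h, w in zip(rel_density, widths))

print(counts, freq_density, total_area, normalized_area)
```

Note that the last bin is twice as wide as the others, so its height is half its count; plotting raw counts as heights would overstate that bin.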

Histograms are used to plot the density of data, and often for density estimation: estimating the probability density function of the underlying variable. The total area of a histogram used for probability density is always normalized to 1. If the intervals on the x-axis all have length 1, then the histogram is identical to a relative frequency plot.
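The unit-width case can be illustrated with a short sketch in plain Python. The function name `density_and_relative_frequency` and the sample values are made up for this example, and bins are assumed to be of the form [k*width, (k+1)*width).

```python
from collections import Counter
from math import floor

def density_and_relative_frequency(data, width):
    """Return (density, relative frequency) per bin, keyed by the bin's left edge."""
    n = len(data)
    counts = Counter(floor(x / width) for x in data)
    rel_freq = {k * width: c / n for k, c in sorted(counts.items())}
    density = {k * width: c / (n * width) for k, c in sorted(counts.items())}
    return density, rel_freq

# Hypothetical sample
data = [0.3, 0.9, 1.2, 1.5, 1.8, 2.1, 3.4, 3.6]

# Width 1: density heights and relative frequencies coincide
dens1, rel1 = density_and_relative_frequency(data, 1)

# Width 2: they differ by a factor of the bin width, but the density area is still 1
dens2, rel2 = density_and_relative_frequency(data, 2)
area2 = sum(h * 2 for h in dens2.values())

print(dens1 == rel1, dens2 == rel2, area2)
```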

In scientific software for statistical computing and graphics, the function hist generates a histogram. It can optionally scale the histogram so that its total area is 1. This puts it on the right scale if one wants to overlay a probability density curve.
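Why the area-1 scaling matters for an overlay can be sketched without any plotting package. The following is plain Python, not the hist function of any specific software; the seed, sample size, bin range, and the helper `normal_pdf` are assumptions chosen for illustration.

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is reproducible
sample = [random.gauss(0.0, 1.0) for _ in range(1000)]

# Equal-width bins over an assumed range of [-4, 4)
lo, hi, nbins = -4.0, 4.0, 16
width = (hi - lo) / nbins
counts = [0] * nbins
for x in sample:
    if lo <= x < hi:
        counts[min(int((x - lo) / width), nbins - 1)] += 1

# Scale bar heights so that the total area of the histogram is 1
n_in_range = sum(counts)
density = [c / (n_in_range * width) for c in counts]
area = sum(d * width for d in density)

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Probability density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# The density curve evaluated at bin centers is on the same scale as `density`,
# whereas the raw `counts` are larger by a factor of n_in_range * width.
centers = [lo + (i + 0.5) * width for i in range(nbins)]
curve = [normal_pdf(c) for c in centers]

for c, d, p in zip(centers, density, curve):
    print(f"{c:6.2f}  histogram={d:.3f}  pdf={p:.3f}")
```

The raw counts peak in the hundreds here while the density curve never exceeds about 0.4, so a curve drawn over an unscaled histogram would appear flat along the axis.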

More about it here: histogram wiki

6663 questions
19
votes
6 answers

R generate 2D histogram from raw data

I have some raw data in 2D, x, y as given below. I want to generate a 2D histogram from the data. Typically, this means dividing the x, y values into bins of size 0.5 and counting the number of occurrences in each bin (for both x and y at the same time). Is there…
Vahid Mirjalili
  • 6,211
  • 15
  • 57
  • 80
19
votes
6 answers

Force R to plot histogram as probability (relative frequency)

I am having trouble plotting a histogram as a pdf (probability). I want the sum of all the pieces to equal an area of one so it's easier to compare across datasets. For some reason, whenever I specify the breaks (the default of 4 or whatever is…
SwimBikeRun
  • 4,192
  • 11
  • 49
  • 85
19
votes
5 answers

Exact number of bins in Histogram in R

I'm having trouble making a histogram in R. The problem is that I tell it to make 5 bins but it makes 4 and I tell to make 5 and it makes 8 of them. data <- c(5.28, 14.64, 37.25, 78.9, 44.92, 8.96, 19.22, 34.81, 33.89, 24.28, 6.5, 4.32, 2.77, 17.6,…
Eduardo
  • 433
  • 1
  • 4
  • 10
18
votes
3 answers

How to keep a dynamical histogram?

Is there a known algorithm + data-structure to maintain a dynamical histogram? Imagine I have a stream of data (x_1, w_1), (x_2, w_2), ... where the x_t are doubles that represent some measured variable and w_t is the associated weight. I could…
Rafael S. Calsaverini
  • 13,582
  • 19
  • 75
  • 132
18
votes
4 answers

Stacked histogram of grouped values in Pandas

I am trying to create a stacked histogram of grouped values using this code: titanic.groupby('Survived').Age.hist(stacked=True) But I am getting this histogram without stacked bars. How can I get the histogram's bars stacked without having to use…
leokury
  • 429
  • 1
  • 4
  • 15
18
votes
3 answers

How can I plot a histogram of a long-tailed data using R?

I have data that is mostly centered in a small range (1-10) but there is a significant number of points (say, 10%) which are in (10-1000). I would like to plot a histogram for this data that will focus on (1-10) but will also show the (10-1000)…
David B
  • 29,258
  • 50
  • 133
  • 186
18
votes
5 answers

How to show histogram of RGB image in Matlab?

I read an image in matlab using input = imread ('sample.jpeg'); Then I do imhist(input); It gives this error: ??? Error using ==> iptcheckinput Function IMHIST expected its first input, I or X, to be two-dimensional. Error in ==>…
E_learner
  • 3,512
  • 14
  • 57
  • 88
17
votes
5 answers

Binning of data along one axis in numpy

I have a large two dimensional array arr which I would like to bin over the second axis using numpy. Because np.histogram flattens the array I'm currently using a for loop: import numpy as np arr = np.random.randn(100, 100) nbins = 10 binned =…
obachtos
  • 977
  • 1
  • 12
  • 30
17
votes
4 answers

How to create the histogram of an array with masked values, in Numpy?

In Numpy 1.4.1, what is the simplest or most efficient way of calculating the histogram of a masked array? numpy.histogram and pyplot.hist do count the masked elements, by default! The only simple solution I can think of right now involves creating…
Eric O. Lebigot
  • 91,433
  • 48
  • 218
  • 260
17
votes
3 answers

Add KDE on to a histogram

I would like to add a density plot to my histogram diagram. I know something about pdf function but I've got confused and other similar questions were not helpful. from scipy.stats import * from numpy import* from matplotlib.pyplot import* from…
aaa
  • 161
  • 1
  • 1
  • 8
17
votes
3 answers

Vertical Histogram in Python and Matplotlib

How can I make a vertical histogram? Is there any option for that or should it be built from scratch? What I want is the upper graph to look like the one below but on the vertical axis! from matplotlib import pyplot as plt import numpy as…
Cupitor
  • 11,007
  • 19
  • 65
  • 91
17
votes
4 answers

Partially color histogram in R

I have a histogram as shown in the picture. I want the bars in the two regions to be coloured red and green respectively, i.e., the bars from 0 to the first black border on the left should be coloured red and the bars in the second region should be…
darkage
  • 857
  • 3
  • 12
  • 22
17
votes
1 answer

gnuplot, break y-axis in two parts

I have a histogram with some small values and some very big values. How can I break the y-axis in two parts? EDIT: gnuplot sample: set style histogram columnstacked set style data histograms set key autotitle columnheader plot for [i=2:6]…
Jack Miller
  • 6,843
  • 3
  • 48
  • 66
17
votes
1 answer

How to adjust `binwidth` in ggplot2?

This may sound like a repeat question, but hopefully it is not. In the basic R graphics histogram function, we have an option breaks="FD", which gives a reasonably sized bin width for the histogram. Do we have any similar simple option for ggplot2? …
Sam
  • 7,922
  • 16
  • 47
  • 62
17
votes
2 answers

Matplotlib histogram with errorbars

I have created a histogram with matplotlib using the pyplot.hist() function. I would like to add a Poisson error, the square root of the bin height (sqrt(binheight)), to the bars. How can I do this? The return tuple of .hist() includes return[2] -> a list of 1…
bioslime
  • 1,821
  • 6
  • 21
  • 27