Questions tagged [histogram]

In statistics, a histogram is a graphical representation that gives a visual impression of the distribution of data. It is an estimate of the probability distribution of a continuous variable and was first introduced by Karl Pearson. A histogram consists of tabular frequencies, shown as adjacent rectangles erected over discrete intervals (bins), each with an area equal to the frequency of the observations in the interval. The height of a rectangle is therefore equal to the frequency density of the interval, i.e., the frequency divided by the width of the interval. The total area of the histogram is equal to the number of data points. A histogram may also be normalized to display relative frequencies. It then shows the proportion of cases that fall into each of several categories, with the total area equaling 1. The categories are usually specified as consecutive, non-overlapping intervals of a variable. The categories (intervals) must be adjacent and are often chosen to be of the same size.
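
The relationship between frequency, bin width, and rectangle height can be checked with a small sketch. The sample values, the bin edges, and the helper name bin_counts below are hypothetical, chosen only for illustration; the code uses plain Python rather than any particular plotting library.

```python
from bisect import bisect_right

# Hypothetical sample and unequal-width bin edges, chosen only for illustration.
data = [1.0, 1.5, 2.0, 2.5, 3.0, 4.5, 5.0, 7.0, 8.5, 9.0]
edges = [0.0, 2.0, 4.0, 10.0]


def bin_counts(values, edges):
    """Count values per bin; bins are [left, right), the last bin also includes its right edge."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        if edges[0] <= v <= edges[-1]:
            i = min(bisect_right(edges, v) - 1, len(counts) - 1)
            counts[i] += 1
    return counts


counts = bin_counts(data, edges)                      # frequency per bin
widths = [b - a for a, b in zip(edges, edges[1:])]    # width of each bin

# Rectangle height = frequency density = frequency / bin width.
freq_density = [c / w for c, w in zip(counts, widths)]

# Summing height * width recovers the number of data points.
total_area = sum(h * w for h, w in zip(freq_density, widths))

# Dividing the heights by the number of points (all of which fall inside
# the bins here) gives the normalized histogram, whose total area is 1.
rel_density = [h / len(data) for h in freq_density]
normalized_area = sum(h * w for h, w in zip(rel_density, widths))

print(counts, total_area, normalized_area)
```

Because the last bin is three times as wide as the others, its rectangle is lower than its raw count would suggest; that is the effect of plotting frequency density instead of frequency.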

Histograms are used to plot the density of data, and often for density estimation: estimating the probability density function of the underlying variable. The total area of a histogram used for probability density is always normalized to 1. If the intervals on the x-axis all have width 1, then the histogram is identical to a relative frequency plot.
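
The unit-width case can be illustrated with a minimal sketch, assuming a hypothetical sample and bins of the form [k, k+1). With width 1, dividing each count by n * width gives the same numbers as dividing by n alone.

```python
from collections import Counter
from math import floor

# Hypothetical sample; flooring each value assigns it to the bin [k, k+1),
# so every bin has width 1.
data = [0.2, 0.7, 1.1, 1.4, 1.9, 2.5, 3.3, 3.8]
width = 1.0

counts = Counter(floor(x) for x in data)
n = len(data)

# Relative frequency: share of observations in each bin.
relative_freq = {k: c / n for k, c in sorted(counts.items())}

# Density height: count divided by (n * bin width).
density = {k: c / (n * width) for k, c in sorted(counts.items())}

# Total area under the density histogram.
area = sum(h * width for h in density.values())

print(relative_freq)
print(density)
```

For any other bin width the two dictionaries would differ by the factor 1 / width, while the area would still be 1.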

In scientific software for statistical computing and graphics such as R, the function hist generates a histogram. It can also optionally scale the histogram so that its total area is 1. This puts it on the right scale if one wants to overlay a probability density curve.
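
Why the area-1 scaling matters for an overlay can be seen in the following sketch. It is written in plain Python so that it does not depend on any plotting package; the sample size, the seed, and the number of bins are arbitrary assumptions. In practice the same scaling is requested with an option such as freq = FALSE in R's hist or density=True in numpy.histogram and matplotlib's hist.

```python
import random
from statistics import NormalDist

random.seed(0)  # fixed seed so the sketch is reproducible
data = [random.gauss(0.0, 1.0) for _ in range(10_000)]

k = 20  # number of equal-width bins (an arbitrary choice)
lo, hi = min(data), max(data)
width = (hi - lo) / k

counts = [0] * k
for x in data:
    i = min(int((x - lo) / width), k - 1)  # clamp the maximum into the last bin
    counts[i] += 1

# Density scaling: divide each count by n * width so that the total area is 1.
n = len(data)
heights = [c / (n * width) for c in counts]
area = sum(h * width for h in heights)

# The curve one would overlay, evaluated at the bin centres.
pdf = NormalDist(0.0, 1.0).pdf
centres = [lo + (i + 0.5) * width for i in range(k)]
max_gap = max(abs(h - pdf(c)) for h, c in zip(heights, centres))

print(f"area = {area:.6f}, largest height/pdf gap = {max_gap:.4f}")
```

With raw counts on the y-axis the bars would be thousands of units tall and the density curve, which never exceeds about 0.4 for a standard normal, would look like a flat line at zero.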

More about it here: histogram wiki

6663 questions
13 votes, 2 answers

Create a bin for anything above X value in GGPlot2 Histogram

Using ggplot2, I want to create a histogram where anything above X is grouped into the final bin. For example, if most of my distribution was between 100 and 200, and I wanted to bin by 10, I would want anything above 200 to be binned in "200+". …
mikebmassey
  • 8,354
  • 26
  • 70
  • 95
12 votes, 3 answers

histogram without vertical lines in Mathematica

I am trying to make a histogram without vertical lines. I'd like to have a plot which looks like a function. Like this: The same question has been asked for R before (histogram without vertical lines), but I'm on Mathematica. I have been looking…
tos
  • 996
  • 1
  • 10
  • 20
12 votes, 2 answers

Faster way to extract histogram from an image

I'm looking for a faster way to extract histogram data from an image. I'm currently using this piece of code that needs about 1200ms for a 6mpx JPEG image: ImageReader imageReader = (ImageReader) iter.next(); …
myro
  • 1,158
  • 2
  • 25
  • 44
12 votes, 1 answer

Controlling Bin Widths in Altair

I have a set of numbers that I'd like to plot on a histogram. Say: import numpy as np import matplotlib.pyplot as plt my_numbers = np.random.normal(size = 1000) plt.hist(my_numbers) If I want to control the size and range of the bins I could do…
stephan
  • 356
  • 3
  • 9
12 votes, 3 answers

R: plot histogram of all columns in a data.frame

I'm a new R user and I've just started working with it to see the distribution of my data, but I got stuck on this error. I have a data frame and I would like to plot histograms of its numeric columns. So what I did is as below: num_data…
Alex
  • 1,914
  • 6
  • 26
  • 47
12 votes, 1 answer

Plotting normal curve over histogram using ggplot2: Code produces straight line at 0

This forum has already helped me a lot in producing the code, which I expected to return a histogram of a specific variable overlaid with its empirical normal curve. I used ggplot2 and stat_function to write the code. Unfortunately, the code produced…
Jannik
  • 123
  • 1
  • 1
  • 6
12 votes, 1 answer

Tricks to get reverse-order cumulative histogram in matplotlib

I am wondering if there is a (better) trick to reverse a cumulative histogram in matplotlib. Let's say I have some scores in the range of 0.0 to 1.0, where 1.0 is the best score. Now, I am interested in plotting how many samples are above a certain score…
user2489252
12 votes, 2 answers

R - emulate the default behavior of hist() with ggplot2 for bin width

I'm trying to plot a histogram for one variable with ggplot2. Unfortunately, the default binwidth of ggplot2 leaves something to be desired: I've tried to play with binwidth, but I am unable to get rid of that ugly "empty" bin: Amusingly (to me),…
greymatter0
  • 223
  • 2
  • 4
12 votes, 5 answers

Scala simple histogram

For a given Array[Double], for instance val a = Array.tabulate(100){ _ => Random.nextDouble * 10 }, what is a simple approach to calculate a histogram with n bins?
elm
  • 20,117
  • 14
  • 67
  • 113
12 votes, 1 answer

Normalizing faceted histograms separately in ggplot2

My question is similar to Normalizing y-axis in histograms in R ggplot to proportion, but I'd like to add to it a bit. In general, I have 6 histograms in a 2x3 facet design, and I'd like to normalize each of them separately. I'll try to make a…
user1195564
  • 309
  • 1
  • 4
  • 11
12 votes, 4 answers

Python: how to make an histogram with equally *sized* bins

I have a set of data and want to make a histogram of it. I need the bins to have the same size, by which I mean that they must contain the same number of objects, rather than the more common (numpy.histogram) problem of having equally spaced…
astabada
  • 1,029
  • 4
  • 13
  • 26
12 votes, 5 answers

how to generate bins for histogram using apache math 3.0 in java?

I have been looking for a way to generate bins for a specific dataset (by specifying lower band, upper band and number of bins required) using Apache Commons Math 3.0. I have looked at Frequency…
Sami
  • 7,797
  • 18
  • 45
  • 69
11 votes, 3 answers

Creating a 3D histogram with R

How can I create a 3D histogram with R? For example, I have two variables to be counted for the number of times they fall in a defined two dimensional bin. So I have two variables in the X and Y axis, while the Z axis is the count of the two…
emanuele
  • 2,519
  • 8
  • 38
  • 56
11 votes, 6 answers

How do I generate points that match a histogram?

I am working on a simulation system. I will soon have experimental data (histograms) for the real-world distribution of values for several simulation inputs. When the simulation runs, I would like to be able to produce random values that match…
AShelly
  • 34,686
  • 15
  • 91
  • 152
11 votes, 3 answers

Creating histograms in bash

EDIT I read the question that this is supposed to be a duplicate of (this one). I don't agree. In that question the aim is to get the frequencies of individual numbers in the column. However if I apply that solution to my problem, I'm still left…
Chem-man17
  • 1,700
  • 1
  • 12
  • 27