Questions tagged [histogram]

In statistics, a histogram is a graphical representation that gives a visual impression of the distribution of data.

A histogram is an estimate of the probability distribution of a continuous variable and was first introduced by Karl Pearson. It consists of tabular frequencies, shown as adjacent rectangles erected over discrete intervals (bins), each with an area equal to the frequency of the observations in the interval. The height of a rectangle is therefore equal to the frequency density of the interval, i.e., the frequency divided by the width of the interval. The total area of the histogram is equal to the number of data points. A histogram may also be normalized to display relative frequencies. It then shows the proportion of cases that fall into each of several categories, with the total area equaling 1. The categories are usually specified as consecutive, non-overlapping intervals of a variable. The categories (intervals) must be adjacent and are often chosen to be of the same size.
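As a rough illustration of the area and frequency-density relationship described above, here is a minimal sketch in plain Python. The function names `histogram` and `frequency_density`, the sample data, and the bin edges are all made up for this example; the convention that the last bin includes its right edge is also an assumption, not something the description specifies.

```python
from bisect import bisect_right


def histogram(data, edges):
    """Count observations in consecutive bins [edges[i], edges[i+1]).

    Assumption for this sketch: the last bin also includes its right edge,
    and values outside [edges[0], edges[-1]] are ignored.
    """
    counts = [0] * (len(edges) - 1)
    for x in data:
        if x < edges[0] or x > edges[-1]:
            continue  # value falls outside every bin
        # bisect_right finds the first edge strictly greater than x
        i = min(bisect_right(edges, x) - 1, len(counts) - 1)
        counts[i] += 1
    return counts


def frequency_density(counts, edges):
    """Height of each rectangle: frequency divided by bin width."""
    return [c / (hi - lo) for c, lo, hi in zip(counts, edges, edges[1:])]


# Hypothetical data and deliberately unequal bin widths
data = [0.5, 1.2, 1.7, 2.1, 2.4, 2.9, 3.3, 4.8, 6.0, 7.5]
edges = [0, 2, 3, 4, 8]

counts = histogram(data, edges)
heights = frequency_density(counts, edges)

# Area of each rectangle is height * width, which recovers the frequency
areas = [h * (hi - lo) for h, lo, hi in zip(heights, edges, edges[1:])]
total_area = sum(areas)

print(counts)      # frequencies per bin
print(heights)     # frequency densities per bin
print(total_area)  # equals the number of data points
```

With unequal bins, the tallest rectangle is not necessarily the bin with the most observations, which is why the height is defined as a density rather than a raw count.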

Histograms are used to plot the density of data, and often for density estimation: estimating the probability density function of the underlying variable. The total area of a histogram used for probability density is always normalized to 1. If the intervals on the x-axis all have width 1, then the histogram is identical to a relative frequency plot.
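The unit-width remark can be checked with a small sketch. The helper `density_histogram`, the data, and the two sets of bin edges below are hypothetical and chosen only for illustration; the sketch assumes every value lies inside the bin range.

```python
def density_histogram(data, edges):
    """Return (relative_frequencies, densities) for bins [edges[i], edges[i+1]).

    Assumption for this sketch: the last bin also includes its right edge.
    """
    n_bins = len(edges) - 1
    counts = [0] * n_bins
    for x in data:
        for i in range(n_bins):
            is_last = i == n_bins - 1
            if edges[i] <= x < edges[i + 1] or (is_last and x == edges[-1]):
                counts[i] += 1
                break
    total = sum(counts)
    rel = [c / total for c in counts]
    # Density = relative frequency divided by bin width, so the areas sum to 1
    dens = [r / (edges[i + 1] - edges[i]) for i, r in enumerate(rel)]
    return rel, dens


data = [0.2, 0.7, 1.1, 1.5, 1.9, 2.3, 2.8, 3.5]

# Bins of width 1: density and relative frequency coincide
unit_edges = [0, 1, 2, 3, 4]
unit_rel, unit_dens = density_histogram(data, unit_edges)

# Bins of width 2: the two differ, but the total area is still 1
wide_edges = [0, 2, 4]
wide_rel, wide_dens = density_histogram(data, wide_edges)
wide_area = sum(d * (hi - lo)
                for d, lo, hi in zip(wide_dens, wide_edges, wide_edges[1:]))

print(unit_rel, unit_dens)
print(wide_rel, wide_dens, wide_area)
```

With the wider bins the heights shrink by the bin width while the relative frequencies do not, which is the difference between a density histogram and a relative frequency plot.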

In scientific software for statistical computing and graphics, the function hist generates a histogram. It can optionally scale the histogram so that its total area is 1. This puts it on the right scale if one wants to overlay a probability density curve.
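To see why the area-1 scaling matters for an overlay, the following sketch compares bar heights with a density curve numerically instead of plotting. It does not use the hist function mentioned above; as an assumption it relies only on Python's standard library (`random` and `statistics.NormalDist`), and the sample size, seed, and bin width are arbitrary choices.

```python
import random
from statistics import NormalDist

random.seed(0)  # fixed seed so the run is repeatable
data = [random.gauss(0.0, 1.0) for _ in range(10_000)]

# 16 equal-width bins covering [-4, 4)
width = 0.5
edges = [-4.0 + width * i for i in range(17)]

counts = [0] * (len(edges) - 1)
for x in data:
    i = int((x - edges[0]) // width)
    if 0 <= i < len(counts):  # ignore the rare values outside the bin range
        counts[i] += 1

# Scale so that the total area of the bars is 1
n_inside = sum(counts)
density = [c / (n_inside * width) for c in counts]

# Evaluate the standard normal density at each bin center
centers = [lo + width / 2 for lo in edges[:-1]]
pdf = [NormalDist(0.0, 1.0).pdf(c) for c in centers]

# Largest vertical gap between the scaled bars and the curve
max_gap = max(abs(d - p) for d, p in zip(density, pdf))

print(max(counts), max(density), max(pdf), max_gap)
```

The raw counts are on a scale of hundreds or thousands, while the density curve never exceeds about 0.4, so only the scaled heights can share an axis with the curve.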

More about it here: histogram wiki

6663 questions
26
votes
3 answers

Pandas bar plot with binned range

Is there a way to create a bar plot from continuous data binned into predefined intervals? For example, In[1]: df Out[1]: 0 0.729630 1 0.699620 2 0.710526 3 0.000000 4 0.831325 5 0.945312 6 0.665428 7 …
Arnold Klein
  • 2,956
  • 10
  • 31
  • 60
26
votes
1 answer

Set number of bins for histogram directly in ggplot

I'd like to feed geom_histogram the number of bins for my histogram instead of controlling bins through binwidth. The documentation says I can do this by setting the bins argument. But when I run ggplot(data = iris, aes(x = Sepal.Length)) +…
Empiromancer
  • 3,778
  • 1
  • 22
  • 53
26
votes
2 answers

How do I draw an arrow on a histogram drawn using ggplot2?

Here is dataset: set.seed(123) myd <- data.frame (class = rep(1:4, each = 100), yvar = rnorm(400, 50,30)) require(ggplot2) m <- ggplot(myd, aes(x = yvar)) p <- m + geom_histogram(colour = "grey40", fill = "grey40", binwidth =…
jon
  • 11,186
  • 19
  • 80
  • 132
25
votes
4 answers

Combination Boxplot and Histogram using ggplot2

I am trying to combine a histogram and boxplot for visualizing a continuous variable. Here is the code I have so far require(ggplot2) require(gridExtra) p1 = qplot(x = 1, y = mpg, data = mtcars, xlab = "", geom = 'boxplot') + coord_flip() p2 =…
Ramnath
  • 54,439
  • 16
  • 125
  • 152
25
votes
2 answers

Matplotlib histogram from numpy histogram output

I have run numpy.histogram() on a bunch of subsets of a larger datasets. I want to separate the calculations from the graphical output, so I would prefer not to call matplotlib.pyplot.hist() on the data itself. In principle, both of these functions…
Andrew Jaffe
  • 26,554
  • 4
  • 50
  • 59
25
votes
6 answers

Difference between contrast stretching and histogram equalization

I would like to know the difference between contrast stretching and histogram equalization. I have tried both using OpenCV and observed the results, but I still have not understood the main differences between the two techniques. Insights would be…
Jeru Luke
  • 20,118
  • 13
  • 80
  • 87
25
votes
2 answers

How to create a histogram from a flat Array in Ruby

How do I create a histogram of an array of integers? For example: data = [0,1,2,2,2,2,2,3,3,3,3,3,3,4,4,4,4,5,5,6,6,6,7,7,7,7,7,8,9,9,10] I want to create a histogram based on how many entries there are for 0, 1, 2, and so on. Is there an easy way…
Whitecat
  • 3,882
  • 7
  • 48
  • 78
24
votes
3 answers

Searching for a fast/efficient histogram algorithm (with pre-specified bins)

I don't do much coding outside of Matlab, but I have a need to export my Matlab code to another language, most likely C. My Matlab code includes a histogram function, histc(), that places my input data (which is double-precision, not integer) into a…
ggkmath
  • 4,188
  • 23
  • 72
  • 129
24
votes
2 answers

Matplotlib histogram with multiple legend entries

I have this code that produces a histogram, identifying three types of fields; "Low", "medium" , and "high": import pylab as plt import pandas as pd df = pd.read_csv('April2017NEW.csv', index_col =1) df1 = df.loc['Output Energy, (Wh/h)'] # choose…
warrenfitzhenry
  • 2,209
  • 8
  • 34
  • 56
24
votes
4 answers

Are there functions to retrieve the histogram counts of a Series in pandas?

There is a method to plot Series histograms, but is there a function to retrieve the histogram counts to do further calculations on top of it? I keep using numpy's functions to do this and converting the result to a DataFrame or Series when I need…
Rafael S. Calsaverini
  • 13,582
  • 19
  • 75
  • 132
23
votes
5 answers

How to align the bars of a histogram with the x axis?

Consider this simple example library(ggplot2) dat <- data.frame(number = c(5, 10, 11 ,12,12,12,13,15,15)) ggplot(dat, aes(x = number)) + geom_histogram() See how the bars are weirdly aligned with the x axis? Why is the first bar on the left of 5.0…
ℕʘʘḆḽḘ
  • 18,566
  • 34
  • 128
  • 235
23
votes
2 answers

R- split histogram according to factor level

This is my data: type<-rep(c(0,1),100) diff<-rnorm(100) data<-data.frame(type,diff) If I want to plot a histogram of diff, I do this: hist(data$diff) But what I want to do is split my histogram according to type. I could do…
89_Simple
  • 3,393
  • 3
  • 39
  • 94
22
votes
2 answers

How to make a log log histogram in python

Given an array of values, I want to plot a log-log histogram of these values by their counts. I only know how to log the x values, but not the y values because they are not explicitly created in my program.
user984923
  • 221
  • 1
  • 2
  • 4
22
votes
6 answers

Create unique colors using javascript

What is the best way to pick random colors for a bar chart / histogram such that each color is different from the others, and possibly in contrast? The most talked about way is '#'+(Math.random()*0xFFFFFF<<0).toString(16); but this can generate…
Kartik Dinesh
  • 253
  • 1
  • 2
  • 6
22
votes
2 answers

How to set color in matplotlib histograms

I am plotting a histogram using Matplotlib. I would like the color of the histogram to be "sky blue". But the data overlaps, and produces a histogram which is nearly black in color. plt.hist(data, color = "skyblue") Below is how the histogram…
user58925
  • 1,537
  • 5
  • 19
  • 28