Questions tagged [histogram]

In statistics, a histogram is a graphical representation, showing a visual impression of the distribution of data.

A histogram is an estimate of the probability distribution of a continuous variable and was first introduced by Karl Pearson. It consists of tabular frequencies, shown as adjacent rectangles erected over discrete intervals (bins), with each rectangle's area equal to the frequency of the observations in that interval. The height of a rectangle is therefore the frequency density of the interval, i.e., the frequency divided by the width of the interval, and the total area of the histogram equals the number of data points. A histogram may also be normalized to display relative frequencies. It then shows the proportion of cases that fall into each of several categories, with the total area equaling 1. The categories are usually specified as consecutive, non-overlapping intervals of a variable. They must be adjacent, and are often chosen to be of the same size.
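As a minimal sketch of the frequency-density definition above, the following pure-Python snippet bins a small made-up data set into intervals of unequal width. The function name `frequency_density`, the sample data, and the edge-handling convention (half-open bins, last bin closed on the right) are assumptions chosen for illustration, not part of any particular library.

```python
from bisect import bisect_right

def frequency_density(data, edges):
    """Return (counts, heights) for bins defined by sorted `edges`.

    Bins are half-open [edges[i], edges[i+1]), except the last bin,
    which also includes its right edge. Out-of-range values are skipped.
    """
    counts = [0] * (len(edges) - 1)
    for x in data:
        if x < edges[0] or x > edges[-1]:
            continue
        # Index of the bin containing x; clamp so x == edges[-1] lands in the last bin.
        i = min(bisect_right(edges, x) - 1, len(counts) - 1)
        counts[i] += 1
    widths = [b - a for a, b in zip(edges, edges[1:])]
    # Height = frequency density = frequency / bin width.
    heights = [c / w for c, w in zip(counts, widths)]
    return counts, heights

data = [1, 2, 2, 3, 5, 6, 8, 9, 9, 10]   # illustrative values
edges = [0, 2, 4, 10]                    # unequal widths: 2, 2, 6
counts, heights = frequency_density(data, edges)
widths = [b - a for a, b in zip(edges, edges[1:])]

# Total area equals the number of data points.
total_area = sum(h * w for h, w in zip(heights, widths))

# Dividing every height by N normalizes the total area to 1.
normalized = [h / len(data) for h in heights]
normalized_area = sum(h * w for h, w in zip(normalized, widths))
```

Here the widest bin holds the most observations (6 of 10) but is not the tallest rectangle, which is the reason for plotting frequency density rather than raw counts when bin widths differ.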

Histograms are used to plot the density of data, and often for density estimation: estimating the probability density function of the underlying variable. The total area of a histogram used for probability density is always normalized to 1. If the widths of the intervals on the x-axis are all 1, then the histogram is identical to a relative frequency plot.
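The unit-width remark can be checked numerically. The sketch below is illustrative only: the helper `density_and_relative` and the data are hypothetical, and relative frequencies are taken over the points that fall inside the binned range. It computes both quantities for equal-width bins and compares them for widths 1 and 2.

```python
def density_and_relative(data, start, width, n_bins):
    """Return (relative frequencies, density heights) for equal-width bins."""
    counts = [0] * n_bins
    for x in data:
        i = int((x - start) // width)
        if 0 <= i < n_bins:          # ignore values outside the binned range
            counts[i] += 1
    n = sum(counts)
    relative = [c / n for c in counts]
    density = [r / width for r in relative]   # height = relative frequency / width
    return relative, density

data = [0.2, 0.7, 1.1, 1.5, 1.9, 2.4, 3.3, 3.8]   # illustrative values

# Width 1: density heights coincide with relative frequencies.
rel1, den1 = density_and_relative(data, start=0, width=1, n_bins=4)

# Width 2: heights are half the relative frequencies, so the area is still 1.
rel2, den2 = density_and_relative(data, start=0, width=2, n_bins=2)
```

With width 1 the two lists are identical. With width 2 the relative frequencies are 0.625 and 0.375, while the density heights are 0.3125 and 0.1875.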

In scientific software for statistical computing and graphics, the function hist generates a histogram. It can optionally scale the histogram so that its total area is 1, which puts it on the right scale if one wants to overlay a probability density curve.
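To illustrate why the area-1 scaling matters for an overlay, here is a rough sketch in plain Python (it does not use the hist function itself). It assumes a standard normal sample, a fixed seed, and 16 bins on [-4, 4], all chosen arbitrarily for the example, and compares the density-scaled bar heights with the normal density evaluated at the bin centers.

```python
import random
from statistics import NormalDist

random.seed(0)                                   # fixed seed so the run is repeatable
dist = NormalDist(mu=0.0, sigma=1.0)
sample = [random.gauss(0.0, 1.0) for _ in range(20000)]

lo, hi, n_bins = -4.0, 4.0, 16                   # assumed binning for this sketch
width = (hi - lo) / n_bins
counts = [0] * n_bins
for x in sample:
    i = int((x - lo) // width)
    if 0 <= i < n_bins:                          # drop the rare points outside [lo, hi)
        counts[i] += 1

in_range = sum(counts)
# Density scaling: divide by (number of points * bin width) so the bars have total area 1.
density = [c / (in_range * width) for c in counts]
area = sum(d * width for d in density)

# Evaluate the theoretical density at the bin centers, as an overlaid curve would.
centers = [lo + (i + 0.5) * width for i in range(n_bins)]
curve = [dist.pdf(c) for c in centers]
max_gap = max(abs(d - p) for d, p in zip(density, curve))
```

On the density scale the bar heights and the curve are directly comparable. With raw counts the bars would be thousands of units tall and the curve would be invisible next to them.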

More about it here: histogram wiki

6663 questions
44
votes
1 answer

Normalizing y-axis in histograms in R ggplot to proportion by group

My question is very similar to Normalizing y-axis in histograms in R ggplot to proportion, except that I have two groups of data of different size, and I would like that each proportion is relative to its group size instead of the total size. To…
Erwan
  • 1,385
  • 1
  • 12
  • 22
42
votes
6 answers

How to get the cumulative distribution function with NumPy?

I want to create a CDF with NumPy, my code is the next: histo = np.zeros(4096, dtype = np.int32) for x in range(0, width): for y in range(0, height): histo[data[x][y]] += 1 q = 0 cdf = list() for i in histo: q = q + i …
omar
  • 1,541
  • 3
  • 21
  • 36
41
votes
2 answers

Use hist() function in R to get percentages as opposed to raw frequencies

How can one plot the percentages as opposed to raw frequencies using the hist() function in R?
newdev14
  • 1,091
  • 4
  • 15
  • 25
40
votes
2 answers

How to add edge color to a histogram

While doing some practice problems using seaborn and a Jupyter notebook, I realized that the distplot() graphs did not have the darker outlines on the individual bins that all of the sample graphs in the documentation have. I tried creating the…
Colin Lindley
  • 403
  • 1
  • 4
  • 4
39
votes
3 answers

Histogram in matplotlib, time on x-Axis

I am new to matplotlib (1.3.1-2) and I cannot find a decent place to start. I want to plot the distribution of points over time in a histogram with matplotlib. Basically I want to plot the cumulative sum of the occurrence of a…
four-eyes
  • 10,740
  • 29
  • 111
  • 220
36
votes
2 answers

Matplotlib histogram with collection bin for high values

I have an array with values, and I want to create a histogram of it. I am mainly interested in the low end numbers, and want to collect every number above 300 in one bin. This bin should have the same width as all other (equally wide) bins. How can…
physicalattraction
  • 6,485
  • 10
  • 63
  • 122
36
votes
3 answers

How to make a histogram from a list of data and plot it with matplotlib

I've got matplotlib installed and try to create a histogram plot from some data: #!/usr/bin/python l = [] with open("testdata") as f: line = f.next() f.next() # skip headers nat = int(line.split()[0]) print nat for line in f: …
Wana_B3_Nerd
  • 613
  • 3
  • 7
  • 21
36
votes
4 answers

Creating a density histogram in ggplot2?

I want to create the next histogram density plot with ggplot2. In the "normal" way (base packages) is really easy: set.seed(46) vector <- rnorm(500) breaks <- quantile(vector,seq(0,1,by=0.1)) labels = 1:(length(breaks)-1) den =…
Usobi
  • 1,816
  • 4
  • 18
  • 25
35
votes
4 answers

Setting a relative frequency in a matplotlib histogram

I have data as a list of floats and I want to plot it as a histogram. Hist() function does the job perfectly for plotting the absolute histogram. However, I cannot figure out how to represent it in a relative frequency format - I would like to have…
user1278140
  • 353
  • 1
  • 3
  • 5
35
votes
7 answers

How to center labels in histogram plot

I have a numpy array results that looks like [ 0. 2. 0. 0. 0. 0. 3. 0. 0. 0. 0. 0. 0. 0. 0. 2. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 1. 1. 0. 0. 0. 0. 2. 0. 3. 1. 0. 0. 2. …
Simd
  • 19,447
  • 42
  • 136
  • 271
34
votes
2 answers

Why isn't this code to plot a histogram on a continuous value Pandas column working?

I am trying to create a histogram on a continuous value column Trip_distance in a large 1.4M row pandas dataframe. Wrote the following code: fig =…
Baktaawar
  • 7,086
  • 24
  • 81
  • 149
34
votes
4 answers

How can I visualize a histogram with Promdash or Grafana?

I'm attracted to prometheus by the histogram (and summaries) time-series, but I've been unsuccessful to display a histogram in either promdash or grafana. What I expect is to be able to show: a histogram at a point in time, e.g. the buckets on the…
TvE
  • 1,016
  • 1
  • 11
  • 19
34
votes
1 answer

ggplot() lines transparency

How to change the transparency level of lines in ggplot() diagram (i.e. histogram, line plot, etc.)? For instance consider the code below: data <- data.frame(a=rnorm(100), b = rnorm(100,.5,1.2)) data <- melt(data) colnames(data) <- c("Category",…
Ali
  • 9,440
  • 12
  • 62
  • 92
34
votes
5 answers

How to generate a frequency table in R with cumulative frequency and relative frequency

I'm new with R. I need to generate a simple Frequency Table (as in books) with cumulative frequency and relative frequency. So I want to generate from some simple data like > x [1] 17 17 17 17 17 17 17 17 16 16 16 16 16 18 18 18 10 12 17 17 17 17…
eloyesp
  • 3,135
  • 1
  • 32
  • 47
33
votes
2 answers

How to plot a density map in python?

I have a .txt file containing the x,y values of regularly spaced points in a 2D map, the 3rd coordinate being the density at that point. 4.882812500000000E-004 4.882812500000000E-004 0.9072267 1.464843750000000E-003 4.882812500000000E-004 …
user3722235
  • 557
  • 2
  • 6
  • 12