Histogram

A histogram is a visual representation of the distribution of numeric data. The term was first introduced by Karl Pearson. To construct a histogram, the first step is to "bin" (or "bucket") the range of values, that is, to divide the entire range into a series of intervals, and then count how many values fall into each interval. The bins are usually specified as consecutive, non-overlapping intervals of a variable. They must be adjacent and are typically of equal width, although unequal bin widths are sometimes used.
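
The binning step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name histogram_counts, the sample data, and the convention that each bin is half-open with the maximum value placed in the last bin are choices made for this example, not part of any standard definition.

```python
def histogram_counts(values, num_bins):
    """Count how many values fall into each of num_bins equal-width bins."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption of this sketch: the data must span a non-zero range.
        raise ValueError("all values are identical; bin width would be zero")
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for v in values:
        # Bins are half-open [left, right); the maximum goes in the last bin.
        index = min(int((v - lo) / width), num_bins - 1)
        counts[index] += 1
    # Bin edges: num_bins adjacent intervals need num_bins + 1 edges.
    edges = [lo + i * width for i in range(num_bins + 1)]
    return edges, counts


data = [1, 2, 2, 3, 5, 6, 7, 7, 8, 9]  # made-up sample data
edges, counts = histogram_counts(data, 4)
print(edges)   # bin boundaries
print(counts)  # number of values per bin
```

Every value lands in exactly one bin, so the counts always sum to the number of observations.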

Histograms give a rough sense of the density of the underlying distribution of the data, and are often used for density estimation: estimating the probability density function of the underlying variable. The total area of a histogram used for probability density is always normalized to 1. If the lengths of the intervals on the x-axis are all 1, then a histogram is identical to a relative frequency plot.
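
As a rough sketch of the normalization, one common convention is to set each bar's height to its count divided by the product of the total count and the bin width, so that the bar areas sum to 1 even when the bins are unequal. The function name density_histogram, the bin edges, and the sample data below are assumptions made for illustration.

```python
from bisect import bisect_right


def density_histogram(values, edges):
    """Return bar heights chosen so that the total area of the bars is 1."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        if edges[0] <= v <= edges[-1]:
            # bisect_right locates the half-open bin [left, right); clamp so
            # a value equal to the right-most edge falls in the last bin.
            index = min(bisect_right(edges, v) - 1, len(counts) - 1)
            counts[index] += 1
    total = sum(counts)  # assumed non-zero: at least one value is in range
    widths = [right - left for left, right in zip(edges, edges[1:])]
    # height = count / (total * width), so height * width is a relative frequency
    return [c / (total * w) for c, w in zip(counts, widths)]


data = [0.5, 1.5, 1.7, 2.2, 3.0, 4.5, 6.0, 9.0]  # made-up sample data
edges = [0, 2, 4, 10]  # deliberately unequal bin widths
heights = density_histogram(data, edges)
area = sum(h * (r - l) for h, l, r in zip(heights, edges, edges[1:]))
print(heights, area)
```

When every bin has width 1, the division by the width has no effect and each height is simply the relative frequency of that bin.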

Histograms are sometimes confused with bar charts. A histogram is used for quantitative data, where the bins represent ranges of values, while a bar chart is a plot of categorical variables. Some authors recommend that bar charts have gaps between the rectangles to clarify the distinction.

A bar graph and a histogram are two common types of graphical representations of data. While they may look similar, there are some key differences between the two that are important to understand.

A bar graph is a chart that uses bars to represent the frequency or quantity of different categories of data. The bars can be drawn either vertically or horizontally, and are placed side by side to make it easy to compare the different categories. Bar graphs are useful for displaying data that can be divided into discrete categories, such as the number of students in different grade levels at a school.

A histogram, on the other hand, is a graph that shows the distribution of numerical data. It resembles a bar chart, but each bar shows the frequency or number of observations within a numerical range, called a bin, and the bins are consecutive, non-overlapping intervals of the variable. This can be useful for identifying patterns and trends in the data, and for making comparisons between different datasets.
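
The distinction can also be seen in how the underlying counts are produced. In the hedged sketch below, the categorical data is tallied per distinct label, while the numeric data is first grouped into adjacent ranges. The grade labels, scores, and bin width of 10 are invented for illustration.

```python
from collections import Counter

# Categorical data: one bar per distinct category; bars have no numeric width.
grades = ["9th", "10th", "9th", "11th", "10th", "9th"]
bar_chart_counts = Counter(grades)

# Numeric data: values are grouped into adjacent ranges (here, width-10 bins),
# and each bin is labelled by its left edge.
scores = [52, 67, 71, 74, 78, 83, 85, 91]
bin_width = 10
histogram_counts = Counter((s // bin_width) * bin_width for s in scores)

for category, count in bar_chart_counts.items():
    print(f"{category}: {count}")

for left in sorted(histogram_counts):
    print(f"[{left}, {left + bin_width}): {histogram_counts[left]}")
```

The categories could be listed in any order without changing the meaning of the bar chart, whereas the histogram bins have a natural order along the numeric axis.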

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.