7

I am trying to make a contour plot of my 2D data, and I would like to set the contour levels manually. I found the `levels` option in the `seaborn.kdeplot` documentation, which lets me define the contour levels myself. However, I have no idea what these levels mean. The documentation gives this definition:

Levels correspond to iso-proportions of the density.

What does iso-proportions of density mean? Are there any references that I could read up on this?

Machavity
MSB
  • 1
    Note that when `levels` is set to a single number, it is supposed to be the number of contour lines (or areas in case `fill=True`). When levels is an array, each of the entries defines a contour line; these numbers should be between 0 and 1 (close to 0 meaning almost all samples will fit into the contour; close to 1 means only the most central samples will fit into the contour). An array with one element will output exactly one contour line. – JohanC Oct 15 '20 at 09:10

2 Answers

4

The level describes the cumulative probability mass that lies below a given density threshold. The documentation explains this with an example:

Number of contour levels or values to draw contours at. A vector argument must have increasing values in [0, 1]. Levels correspond to iso-proportions of the density: e.g., 20% of the probability mass will lie below the contour drawn for 0.2. Only relevant with bivariate data.

You can specify levels in two ways:

  1. Pass an integer to set the number of partitions of the probability mass (`levels=5` makes 4 contour lines that partition the probability mass into 5 parts).
  2. Pass a vector that explicitly lists the threshold for each contour.
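To make the two forms concrete, here is a minimal sketch of how an integer `levels` might be turned into explicit proportions. The helper `expand_levels` is hypothetical, and the even spacing from `thresh` (0.05 by default) up to 1 is my reading of seaborn's behaviour, so treat it as an assumption and not as the library's documented contract.

```python
import numpy as np

def expand_levels(levels, thresh=0.05):
    """Hypothetical helper: turn a `levels` argument into explicit proportions."""
    if isinstance(levels, int):
        # Assumption: an integer is expanded into evenly spaced proportions
        # running from `thresh` up to 1.
        return np.linspace(thresh, 1, levels)
    levels = np.asarray(levels, dtype=float)
    if levels.min() < 0 or levels.max() > 1:
        raise ValueError("levels must be in [0, 1]")
    return levels

print(expand_levels(5))                # five proportions between 0.05 and 1
print(expand_levels([0.3, 0.4, 0.8]))  # an explicit vector is used as given
```

Under this assumption the top proportion is 1, which corresponds to the peak of the density and collapses to a point. That would be consistent with `levels=5` showing only 4 visible contour lines.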

The proportions refer to the probability mass that lies outside the contour line, not inside it. So 0.2 means that 20% of the probability mass lies outside the contour drawn for 0.2, and the remaining 80% lies inside it. Playing around with the following code makes this clearer.

Both approaches are shown below for reference.

import seaborn as sns

# Load the example dataset that ships with seaborn
geyser = sns.load_dataset("geyser")

# Levels as evenly spaced cuts of the probability mass
sns.kdeplot(
    data=geyser, x="waiting", y="duration", hue="kind",
    levels=5
)


# Levels as explicitly chosen cuts of the probability mass
sns.kdeplot(
    data=geyser, x="waiting", y="duration", hue="kind",
    levels=[0.3, 0.4, 0.8]
)

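To see what a level means numerically, here is a rough sketch that converts an iso-proportion into the density value at which the contour would be drawn. It uses a standard bivariate normal evaluated on a grid instead of a seaborn KDE, and `level_to_density` is a hypothetical helper written for illustration, not a seaborn function.

```python
import numpy as np

def level_to_density(density, proportion):
    """Return the density value below which `proportion` of the mass lies.

    `density` holds density values on a regular grid, so every cell has the
    same area and cell values are proportional to cell masses.
    """
    values = np.sort(density.ravel())             # lowest densities first
    cum_mass = np.cumsum(values) / values.sum()   # mass fraction accumulated so far
    idx = np.searchsorted(cum_mass, proportion)
    return values[min(idx, values.size - 1)]

# Standard bivariate normal density on a grid (assumed example data)
xs = np.linspace(-6, 6, 601)
xx, yy = np.meshgrid(xs, xs)
density = np.exp(-(xx**2 + yy**2) / 2) / (2 * np.pi)

threshold = level_to_density(density, 0.2)

# Fraction of the total mass sitting in cells whose density is below the threshold
mass_below = density[density < threshold].sum() / density.sum()
print(threshold, mass_below)
```

The cells with density below `threshold` are the ones outside the contour, and they carry roughly 20% of the total mass, which is what the level 0.2 is meant to express.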

Akshay Sehgal
  • 1
    I guess what confused me was the statement "probability mass will lie below the contour drawn". For 2D contours, it would have been clearer if it said "area outside of the contour" (or, as @mwaskom said, the integral over the area outside of the contour line). – MSB Oct 18 '20 at 15:55
  • Yes, the examples I show clarify that as well. I edited the answer to make it clearer. The other part of your question, about setting custom thresholds, is also covered in my answer. – Akshay Sehgal Oct 18 '20 at 17:50
2

Basically, the contour line for the level corresponding to 0.05 is drawn such that 5% of the distribution lies "below" it. Equivalently, because the integral over the full density equals 1 (that's what makes it a PDF), the integral of the density over the region outside of the contour line will be 0.05.
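Written out, the statement above might look as follows. The symbols $f$ (the estimated density), $p$ (the level), $t_p$ (the density value at which the contour is drawn) and $R_p$ are my own notation, and the worked case assumes a standard bivariate normal, not a KDE.

```latex
% Definition of the contour for level p: the density threshold t_p is chosen
% so that the region of lower density carries probability mass p.
\[
  \iint_{\{f(x,y) < t_p\}} f(x,y)\,dx\,dy = p,
  \qquad
  \iint_{\{f(x,y) \ge t_p\}} f(x,y)\,dx\,dy = 1 - p,
\]
% and the contour line is the set \{(x,y) : f(x,y) = t_p\}.

% Worked case: standard bivariate normal, f(x,y) = \frac{1}{2\pi} e^{-r^2/2}
% with r^2 = x^2 + y^2. The mass outside a circle of radius R is
\[
  \int_0^{2\pi}\!\!\int_R^{\infty} \frac{1}{2\pi}\, e^{-r^2/2}\, r\,dr\,d\theta
  = e^{-R^2/2},
\]
% so the contour for level p is the circle of radius
\[
  R_p = \sqrt{-2 \ln p}, \qquad R_{0.05} \approx 2.45 .
\]
```

For a single-peaked density like this one, the low-density region is exactly the area outside the contour, which is why "below the contour" and "outside the contour" describe the same 5%.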

mwaskom