PRML: how to plot the minimum misclassification-rate decision boundary?

Question

Two classes dataset

This is the synthetic classification data set with data from the two classes shown in red and blue. The blue class is generated from a single Gaussian while the red class comes from a mixture of two Gaussians.

Since we know the prior probabilities (p(C0)=0.5 and p(C1)=0.5) and the class-conditional densities (a single Gaussian p(x|C0) and a mixture of two Gaussians p(x|C1)), we can calculate the true posterior probabilities and plot the contour lines and filled contours as shown on the right. But how can I plot the minimum misclassification-rate decision boundary (the green line)?

The data is generated as follows:

import numpy as np
import matplotlib.pyplot as plt

def create_toy_data(mu1, mu2, mu3, sigma1, sigma2, sigma3):
    # Class 0: 100 points drawn from a single Gaussian
    x0 = np.random.multivariate_normal(mu1, sigma1, 100)
    # Class 1: 50 points from each of two Gaussians (a two-component mixture)
    x1 = np.random.multivariate_normal(mu2, sigma2, 50)
    x2 = np.random.multivariate_normal(mu3, sigma3, 50)
    x = np.concatenate([x0, x1, x2])
    t = np.concatenate([np.zeros(100, dtype=int), np.ones(100, dtype=int)])
    return x, t

I know the minimum misclassification-rate decision boundary is the set of points where p(C0|x)=p(C1|x)=0.5, but how can I represent that curve explicitly?
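
For reference, the boundary condition can be written out as an implicit equation in x. This is only a sketch: it assumes the red class mixes its two components with equal weights (which the 50/50 sample counts in create_toy_data suggest but do not strictly guarantee), and it uses the symbols mu and Sigma for the arguments mu1..mu3 and sigma1..sigma3 of that function.

```latex
p(C_1 \mid x)
  = \frac{p(x \mid C_1)\,p(C_1)}{p(x \mid C_0)\,p(C_0) + p(x \mid C_1)\,p(C_1)}
  = \frac{1}{2}
\quad\Longleftrightarrow\quad
\mathcal{N}(x \mid \mu_1, \Sigma_1)
  = \tfrac{1}{2}\,\mathcal{N}(x \mid \mu_2, \Sigma_2)
  + \tfrac{1}{2}\,\mathcal{N}(x \mid \mu_3, \Sigma_3)
```

The equal priors cancel, so the boundary is simply the set of points where the two class-conditional densities are equal.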

Are you looking for the functional form of that specific decision boundary, or how to get an approximation by building a machine learning model? — Dimosthenis, Oct 31 '18 at 19:08
In a general sense, it appears what you want is to plot the implicit function p(C1|x) = 0.5 (or equivalently p(C0|x) = 0.5). Given the location and shape parameters for the Gaussian blobs, you can construct a function which returns p(C1|x) for any x = (x1, x2) where x1, x2 are the two dimensions of the input space. You would want to plot the implicit function p(C1|(x1, x2)) = 0.5 over the input space. A brief web search suggests Matplotlib isn't best for that; someone suggested Sympy (http://sympy.org). Good luck and have fun. — Robert Dodier, Oct 31 '18 at 22:43
@Dimosthenis The former. Is it possible to plot such a decision boundary through an explicit function when all the related probabilities are known? — Charles, Oct 31 '18 at 23:54
@RobertDodier You got it. I'd also like to know whether the implicit function can be represented explicitly or not. — Charles, Nov 01 '18 at 00:00
When there's just one Gaussian bump for each class, the decision boundary is a conic section. With more than one bump per class, I don't think there is any simple characterization. — Robert Dodier, Nov 01 '18 at 00:22
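
The approach described in the comments above (evaluate p(C1|x) over the input space and draw the curve where it equals 0.5) can be sketched numerically by tracing a single contour level on a grid. This is a minimal sketch rather than a definitive answer: the helper names gaussian_pdf, posterior_c1 and plot_boundary are made up for illustration, the 0.5/0.5 mixing weights are an assumption based on the sample counts in create_toy_data, and the plotting limits are arbitrary defaults.

```python
import numpy as np


def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma) at each row of x, where x has shape (n, d)."""
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    d = mu.shape[0]
    diff = np.asarray(x, dtype=float) - mu
    # Squared Mahalanobis distance for every row: diff^T sigma^{-1} diff
    maha = np.einsum("ni,ij,nj->n", diff, np.linalg.inv(sigma), diff)
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(sigma))
    return np.exp(-0.5 * maha) / norm


def posterior_c1(x, mu1, mu2, mu3, sigma1, sigma2, sigma3):
    """p(C1|x), assuming p(C0) = p(C1) = 0.5 and mixing weights 0.5/0.5."""
    p_x_c0 = gaussian_pdf(x, mu1, sigma1)
    p_x_c1 = 0.5 * gaussian_pdf(x, mu2, sigma2) + 0.5 * gaussian_pdf(x, mu3, sigma3)
    # Equal priors cancel in Bayes' theorem. Far from all the means both
    # densities can underflow to 0, so keep the grid near the data.
    return p_x_c1 / (p_x_c0 + p_x_c1)


def plot_boundary(params, xlim=(-3, 3), ylim=(-3, 3), n=300):
    """Filled posterior contours plus the p(C1|x) = 0.5 curve in green."""
    # Imported here so the density helpers work without a plotting backend.
    import matplotlib.pyplot as plt

    xs = np.linspace(xlim[0], xlim[1], n)
    ys = np.linspace(ylim[0], ylim[1], n)
    X, Y = np.meshgrid(xs, ys)
    grid = np.column_stack([X.ravel(), Y.ravel()])
    P = posterior_c1(grid, *params).reshape(X.shape)

    fig, ax = plt.subplots()
    ax.contourf(X, Y, P, levels=20, cmap="RdBu_r", alpha=0.5)
    # A single contour level at 0.5 traces the implicit decision boundary.
    ax.contour(X, Y, P, levels=[0.5], colors="green")
    return fig, ax
```

Passing the same (mu1, mu2, mu3, sigma1, sigma2, sigma3) tuple that was given to create_toy_data as params, then calling plt.show() or fig.savefig(...), should draw the curve. The result is a piecewise-linear approximation whose accuracy depends on the grid resolution n; it is not an explicit formula for the boundary.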


0 Answers