This is the synthetic classification data set with data from the two classes shown in red and blue. The blue class is generated from a single Gaussian while the red class comes from a mixture of two Gaussians.
Since we have the prior probabilities (p(C0) = 0.5 and p(C1) = 0.5) and the class-conditional densities (a single Gaussian p(x|C0) and a mixture of two Gaussians p(x|C1)), we can calculate the true posterior probabilities with Bayes' theorem and plot the contour lines and filled contours, as shown on the right. But how can I plot the minimum misclassification-rate decision boundary (the green line)?
The data is generated as follows:
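import numpy as np
import matplotlib.pyplot as plt

def create_toy_data(mu1, mu2, mu3, sigma1, sigma2, sigma3):
    # Class 0 (blue): 100 samples from a single Gaussian.
    x0 = np.random.multivariate_normal(mu1, sigma1, 100)
    # Class 1 (red): 50 samples from each of the two mixture components.
    x1 = np.random.multivariate_normal(mu2, sigma2, 50)
    x2 = np.random.multivariate_normal(mu3, sigma3, 50)
    # Labels: the first 100 rows are class 0, the remaining 100 are class 1.
    labels = np.concatenate([np.zeros(100, dtype='int'), np.ones(100, dtype='int')])
    return np.concatenate([x0, x1, x2]), labels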
I know that the minimum misclassification-rate decision boundary is the set of points where p(C0|x) = p(C1|x) = 0.5, but how can I represent that curve explicitly?
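For reference, here is a minimal sketch of one way the curve could be obtained numerically: evaluate p(C1|x) on a grid and ask matplotlib for the 0.5 level set. With equal priors the boundary is where p(x|C0) = p(x|C1), and because that equation involves a sum of exponentials, a closed-form curve is generally not available. The means, covariances, equal mixture weights, grid limits, and the helper name `posterior_c1` below are all placeholder assumptions, not the values used for the figure.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import multivariate_normal

# Placeholder parameters (assumptions, not the values behind the original figure).
mu1, sigma1 = np.array([0.0, 0.0]), np.eye(2)          # blue class C0: single Gaussian
mu2, sigma2 = np.array([-2.5, 2.0]), 0.5 * np.eye(2)   # red class C1: first component
mu3, sigma3 = np.array([2.5, 2.0]), 0.5 * np.eye(2)    # red class C1: second component
prior0, prior1 = 0.5, 0.5


def posterior_c1(points):
    """Return p(C1|x) for an array of points with shape (..., 2)."""
    p_x_c0 = multivariate_normal(mu1, sigma1).pdf(points)
    # Equal mixture weights are assumed, matching the 50/50 split of red samples.
    p_x_c1 = (0.5 * multivariate_normal(mu2, sigma2).pdf(points)
              + 0.5 * multivariate_normal(mu3, sigma3).pdf(points))
    joint0, joint1 = prior0 * p_x_c0, prior1 * p_x_c1
    return joint1 / (joint0 + joint1)   # Bayes' theorem


# Evaluate the posterior on a regular grid covering the region of interest.
xs = np.linspace(-6.0, 6.0, 300)
ys = np.linspace(-4.0, 6.0, 300)
xx, yy = np.meshgrid(xs, ys)
post = posterior_c1(np.dstack([xx, yy]))

fig, ax = plt.subplots()
filled = ax.contourf(xx, yy, post, levels=20, cmap="RdBu_r", alpha=0.6)
# The decision boundary is the single contour level p(C1|x) = 0.5.
boundary = ax.contour(xx, yy, post, levels=[0.5], colors="green")
fig.colorbar(filled, ax=ax, label="p(C1|x)")

# Explicit (piecewise-linear) representation: a list of (n, 2) vertex arrays,
# one per connected piece of the boundary inside the grid.
boundary_curves = boundary.allsegs[0]
```

Calling `plt.show()` or `fig.savefig(...)` afterwards displays or saves the plot. The arrays in `boundary_curves` are only a grid-resolution approximation of the boundary; a finer grid gives a smoother curve.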