The sklearn make_blobs() function generates isotropic Gaussian blobs for clustering.

I am trying to plot the data generated by the make_blobs() function.

import numpy as np
from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt

arr, blob_labels = make_blobs(n_samples=1000, n_features=1, 
                                centers=1, random_state=1)
a = plt.hist(arr, bins=np.arange(int(np.min(arr))-1,int(np.max(arr))+1,0.5), width = 0.3)

This piece of code gives a normal distribution plot, which makes sense.

blobs, blob_labels = make_blobs(n_samples=1000, n_features=2, 
                                centers=2, random_state=1)

a = plt.scatter(blobs[:, 0], blobs[:, 1], c=blob_labels)

This piece of code gives a plot with two clusters, which also makes sense.

I am wondering whether there is a way to plot the data generated by make_blobs() with the parameters centers=2 and n_features=1.

arr, blob_labels = make_blobs(n_samples=1000, n_features=1, 
                                centers=2, random_state=1)

I've tried plt.hist(), which gives another normal distribution plot.

I have no idea how to use plt.scatter() with the data.

I cannot imagine what the plot should look like.

desertnaut

1 Answer

Your issue is somewhat unclear.

I've tried plt.hist(), which gives another normal distribution plot.

Well, not exactly; it gives a bimodal Gaussian mixture plot:

arr, blob_labels = make_blobs(n_samples=1000, n_features=1, 
                                centers=2, random_state=1)

a = plt.hist(arr, bins=np.arange(int(np.min(arr))-1,int(np.max(arr))+1,0.5), width = 0.3)

as expected, since now we have centers=2.
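If the two modes are hard to tell apart in a single histogram, one option is to draw one histogram per label on shared bin edges. This is only a sketch: the choice of 40 bins, the alpha value, and the names bins, values and counts_per_label are my own assumptions, not something taken from the question.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs

arr, blob_labels = make_blobs(n_samples=1000, n_features=1,
                              centers=2, random_state=1)

# Shared bin edges spanning the full data range, so both groups are binned identically.
bins = np.linspace(arr.min(), arr.max(), 41)  # 40 bins; the count is an arbitrary choice

fig, ax = plt.subplots()
counts_per_label = {}
for label in np.unique(blob_labels):
    values = arr[blob_labels == label, 0]  # 1D array of the samples assigned to this center
    counts, _, _ = ax.hist(values, bins=bins, alpha=0.6, label=f"center {label}")
    counts_per_label[int(label)] = counts
ax.set_xlabel("feature value")
ax.set_ylabel("count")
ax.legend()
# plt.show()  # uncomment when running interactively
```

Because the bin edges run from the smallest to the largest sample, every point lands in some bin, and the semi-transparent bars make any region where the two groups share bins visible.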

I have no idea how to use plt.scatter() with the data.

By definition, a scatter plot needs 2D data; from the docs:

A scatter plot of y vs x with varying marker size and/or color.

while here, due to n_features=1, we actually have only x and no y.
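A quick way to see this for yourself is to check the shapes of the returned arrays; the snippet below just reuses the call from the question.

```python
from sklearn.datasets import make_blobs

arr, blob_labels = make_blobs(n_samples=1000, n_features=1,
                              centers=2, random_state=1)

print(arr.shape)          # (1000, 1): a single column, i.e. only an x value per sample
print(blob_labels.shape)  # (1000,): one cluster label per sample
```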

A 1D "scatter plot" is really just a set of points along a line, and we can use plt.plot to visualize it, as nicely explained in "How to plot 1-d data at given y-value with pylab"; in your case:

val = 0. # this is the value where you want the data to appear on the y-axis.
a = plt.plot(arr, np.zeros_like(arr) + val, 'x')

Of course, we should keep in mind that the vertical axis is just a convenience for the visualization and says nothing about our data, which has no y value whatsoever.

Want to use different colors and/or markers for each center?

val = 0. # this is the value where you want the data to appear on the y-axis.
plt.plot(arr[blob_labels==0], np.zeros_like(arr[blob_labels==0]) + val, 'x', color='y')
plt.plot(arr[blob_labels==1], np.zeros_like(arr[blob_labels==1]) + val, '+', color='b')
plt.show()
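If you specifically want plt.scatter(), the same idea should work by passing a constant array as y and the labels as c. A minimal sketch, assuming you are fine with a dummy y of zeros; the names x, y and points are mine, and hiding the y ticks is just a cosmetic choice.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs

arr, blob_labels = make_blobs(n_samples=1000, n_features=1,
                              centers=2, random_state=1)

x = arr[:, 0]           # flatten the (1000, 1) array to 1D
y = np.zeros_like(x)    # dummy y values; they carry no information about the data

fig, ax = plt.subplots()
points = ax.scatter(x, y, c=blob_labels, marker="x")  # color each point by its label
ax.set_yticks([])       # hide the meaningless y ticks
ax.set_xlabel("feature value")
# plt.show()  # uncomment when running interactively
```

Compared with two separate plt.plot() calls, this colors all points in one call, at the cost of using the same marker for both centers.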

For larger samples the situation gets somewhat more interesting: with n_samples=10000, the points from the two centers visibly overlap.
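With this many points on a single line, markers pile on top of each other. One common workaround (not part of the original approach, just a suggestion) is to add a small random vertical jitter so that dense regions and the overlap between labels become easier to see. The jitter range, seed, marker size and alpha below are arbitrary assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs

arr, blob_labels = make_blobs(n_samples=10000, n_features=1,
                              centers=2, random_state=1)
x = arr[:, 0]

# Random vertical offsets; purely cosmetic, they say nothing about the data.
rng = np.random.default_rng(0)
jitter = rng.uniform(-0.5, 0.5, size=x.shape)

fig, ax = plt.subplots()
ax.scatter(x, jitter, c=blob_labels, s=4, alpha=0.3)  # small, semi-transparent markers
ax.set_yticks([])  # the y axis is meaningless here, so hide its ticks
ax.set_xlabel("feature value")
# plt.show()  # uncomment when running interactively
```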

desertnaut