Need help fixing my K-means clustering on MRI-data Python script

Question

I'm using KMeans clustering from the scikit-learn module, and nibabel to load and save NIfTI files.

I want to:

1. Load a NIfTI file.
2. Perform KMeans clustering on the data of this NIfTI file (acquired by using the .get_fdata() function).
3. Take the labels acquired from clustering and overwrite the data's original intensity values with the label values.
4. Save the new NIfTI file and have it be the exact same shape as the original, so that the two can be overlaid.
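For reference, the clustering and reshaping steps can be sketched as below. This is only a minimal sketch under stated assumptions, not a confirmed fix: it assumes each voxel is one sample with its intensity as the only feature, it uses a small synthetic array in place of the array returned by .get_fdata(), and the helper name cluster_volume is hypothetical. Loading and saving would still go through nibabel as in the script further down.

```python
import numpy as np
from sklearn.cluster import KMeans


def cluster_volume(volume, n_clusters, random_state=0):
    """Cluster voxel intensities and return a label volume of the same shape.

    Hypothetical helper: assumes one sample per voxel, one feature (intensity).
    """
    # One row per voxel, one column per feature.
    samples = volume.reshape(-1, 1)
    model = KMeans(n_clusters=n_clusters, n_init=1, random_state=random_state)
    labels = model.fit_predict(samples)
    # One label per voxel, so the labels fit back into the original shape.
    return labels.reshape(volume.shape).astype(np.int16)


# Synthetic stand-in for the array returned by get_fdata().
rng = np.random.default_rng(0)
volume = rng.random((4, 6, 5))
label_volume = cluster_volume(volume, n_clusters=10)
print(volume.shape, label_volume.shape)
```

Because the label array has exactly one entry per voxel, reshaping to volume.shape avoids hard-coding dimensions. When the original header is reused for saving, it may also be worth checking the stored data type with header.get_data_dtype(), on the assumption that it might not suit integer labels.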

I've been struggling with this problem for months now. The following code produces far too few clusters (8 instead of the 100 I'd like) and, more importantly, there's something wrong with the resulting NIfTI file: I get a 3x3 grid of images instead of a single image.

Here's what that looks like: [image: 3x3 image grid]

And this is the code I'm using:

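# Imports assumed from the rest of the script (not shown in the original post):
# import nibabel
# from sklearn.cluster import KMeans as km
ztkmeans = kmeansnifti.get_fdata()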
ztk2d = ztkmeans.reshape(-1, 3)
n_clusters = 100
to_kmeans = km(
    # Method for initialization, default is k-means++, other option is 'random', learn more at scikit-learn.org
    init='k-means++',
    # Number of clusters to be generated, int, default=8
    n_clusters=n_clusters,
    # n_init is the number of times the k-means algorithm will be run with different centroid seeds, int, default=10
    n_init=1,
    # maximum iterations, int, default=300
    max_iter=1,
    # relative tolerance with regards to Frobenius norm of the difference in the cluster centers of two consecutive
    # iterations to declare convergence, float, default=0.0001
    tol=0.0001,
    # verbosity, int, default=0
    verbose=1,
    # random state instance or None, default=None
    random_state=None,
    # copy_x, bool, default=True, makes sure original data is not overwritten
    copy_x=True,
    # algorithm, str, {"lloyd", "elkan"}, default="lloyd"; lloyd is the classic EM-style algorithm, elkan can be more efficient on datasets with well-defined clusters
    algorithm='lloyd')
kmeans = to_kmeans.fit(ztk2d)
labels = to_kmeans.labels_
labels3d = labels.reshape(-1, 290, 292)
kmeans_img = nibabel.Nifti1Image(labels3d, kmeansnifti.affine, kmeansnifti.header)
print('Saving kmeans_img as ' + cruise+'/' + Patient[i]+'/' + ereg[b]+'/' + 'kmeans.nii.gz')
nibabel.save(kmeans_img, cruise+'/' + Patient[i]+'/' + ereg[b]+'/' + 'kmeans.nii.gz')
print('Done')

I'll happily try anything you suggest. I've been running into this wall for a while now, so I hope that I'm either missing something completely obvious or that a set of fresh eyes will help me finally figure this out.
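One observation about the script above, offered as a possible lead rather than a confirmed diagnosis: reshape(-1, 3) treats every three consecutive values as a single sample, and KMeans returns one label per row, so the label array has only a third as many entries as the volume has voxels. The sketch below illustrates the difference on a small hypothetical array, since the real image shape is not stated in the post.

```python
import numpy as np

# Hypothetical small volume; the real shape in the question is not stated.
volume = np.arange(4 * 6 * 6, dtype=float).reshape(4, 6, 6)

rows_of_three = volume.reshape(-1, 3)  # 3 consecutive voxels per sample
one_per_voxel = volume.reshape(-1, 1)  # 1 voxel per sample

# KMeans yields one label per row, so the row count is the label count.
print(rows_of_three.shape)  # (48, 3)
print(one_per_voxel.shape)  # (144, 1)
print(volume.size)          # 144
```

With 48 labels for 144 voxels, the labels cannot be reshaped to the original shape, which may be why a hard-coded reshape(-1, 290, 292) produces a volume that no longer lines up with the input.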

max_iter=1: how can k-means converge in a single iteration? — montardon, Apr 14 '23 at 09:44
Putting the max iterations to 1 is just for the sake of speed while I'm troubleshooting my code, I'm not expecting the clustering to be accurate right now. — Björn, Apr 14 '23 at 09:48
@Björn Please reduce your question to a [MWE](https://stackoverflow.com/help/minimal-reproducible-example), "*I want to: 1, 2, 3*" you can [edit](https://stackoverflow.com/posts/76013369/edit) that to describe only the problem. — Bilal, Apr 26 '23 at 09:19
