I am working on a project that involves clustering data with periodic boundary conditions, so I am looking for clustering algorithms that can handle datasets where periodicity plays a significant role.

My data is 3D, and I would like to know whether DBSCAN or HDBSCAN can implement periodic boundary conditions for this case. I found in the literature that K-means has a way of doing it: https://doi.org/10.3390/sym14061237

Thank you in advance for your valuable input!

With HDBSCAN, I do it this way:

import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import hdbscan
import seaborn as sns
from ase.io import read  # assumption: `read` and `atoms.positions` come from ASE

def fit_and_visualize_clusters(filename, atom_indices):

    atoms = read(filename)

    # Get the positions of the specified atoms as an (n, 3) array
    data = np.array([atoms.positions[i] for i in atom_indices])

    # Split into per-axis coordinates (used for plotting)
    X, Y, Z = data.T

    # Create an instance of HDBSCAN (default Euclidean metric, no periodicity)
    clusterer = hdbscan.HDBSCAN(min_cluster_size=4, gen_min_span_tree=True)

    # Perform clustering; points labelled -1 are noise
    cluster_labels = clusterer.fit_predict(data)
    print(cluster_labels)

For DBSCAN:

from sklearn.cluster import DBSCAN
import pandas as pd

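# Convert the dataset to a DataFrame
# (`data` is the (n, 3) array of atom positions built above, not the list of x-coordinates `X`)
DBSCAN_clustered = pd.DataFrame(data, columns=['X', 'Y', 'Z'])

# Perform DBSCAN clustering (default Euclidean metric, no periodicity)
DBS_clustering = DBSCAN(eps=6.25, min_samples=6).fit(DBSCAN_clustered)

# Assign cluster labels to the DataFrame; points labelled -1 are noise
DBSCAN_clustered['Cluster'] = DBS_clustering.labels_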
Saha_1994
    You can do it by defining your own distance function (accounting for the periodicity of your dimensions); see https://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html – Learning is a mess Jul 17 '23 at 21:48
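
The suggestion in the comment above could be sketched roughly as follows. This is only a minimal illustration, assuming an orthorhombic box whose edge lengths are known: it builds the pairwise distance matrix with the minimum-image convention and passes it to DBSCAN through `metric="precomputed"`. The names `periodic_distance_matrix`, `cluster_periodic`, and `box` are hypothetical and do not come from the question.

```python
import numpy as np


def periodic_distance_matrix(points, box):
    """Pairwise distances under the minimum-image convention.

    points: (n, 3) array of Cartesian coordinates.
    box:    three box edge lengths (assumption: orthorhombic cell).
    """
    points = np.asarray(points, dtype=float)
    box = np.asarray(box, dtype=float)
    # Absolute per-axis separation for every pair, shape (n, n, 3).
    delta = np.abs(points[:, None, :] - points[None, :, :])
    # Fold separations into [0, box) in case some points lie outside the box,
    # then take the shorter way around along each periodic axis.
    delta = delta % box
    delta = np.minimum(delta, box - delta)
    return np.sqrt((delta ** 2).sum(axis=-1))


def cluster_periodic(points, box, eps, min_samples):
    """Run DBSCAN on the periodic distance matrix; returns one label per point."""
    # Imported here so the distance helper stays usable without scikit-learn.
    from sklearn.cluster import DBSCAN

    dist = periodic_distance_matrix(points, box)
    model = DBSCAN(eps=eps, min_samples=min_samples, metric="precomputed")
    return model.fit_predict(dist)
```

With the parameters from the question this would be called as `cluster_periodic(data, box, eps=6.25, min_samples=6)`, where `box` has to be supplied from the simulation cell. The full matrix needs memory proportional to n², so this only suits moderately sized point sets. `hdbscan.HDBSCAN` also documents a `metric='precomputed'` option, so the same matrix may be usable there, though that is not tested here.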