
Sorry if the title is ambiguous; let me explain the problem. By the way, I'm very new to data science, so apologies if I say something that doesn't make sense.

I recently came across a clustering problem. I was given the coordinates of many points, and the task was to cluster them. However, this is not distance-based clustering: the points lie on several functions, and they need to be clustered according to the function they belong to.

This is not what my data looks like, but the problem is the same:

Please take a look here. The problem described at that link is exactly what I am looking for, but the solution there is in R, not Python. When I searched for "functional clustering" in Python, I couldn't find anything. If you know how to do this, please point me in the right direction.

Mansur
  • If I understand correctly, you're trying to automatically work out which points belong together and fit a linear regression to each group. I suspect a workable approach would be to sample from the dataset so that you're looking at an average of 5 points per line, then iterate over candidate solutions to minimise the sum of squared errors. You'd have to be pretty sure, though, that your data are made up of a discrete number of linear relationships rather than anything more complex. – Andrew Dec 05 '18 at 12:59
  • Yes and no. You understood the question correctly, but the image I provided probably made you think the relationship is linear. In fact, it is not: the relationship in my dataset is non-linear :/ – Mansur Dec 05 '18 at 13:06
  • The further the relationship deviates from linearity, the more likely you are to find spurious clusters. I would proceed with caution with this approach. – Andrew Dec 05 '18 at 13:11
  • Do you know what the relationship within each cluster is? Is it always logarithmic? Linear? Is the spread within each group constant along the function (this is probably always false for non-linear relationships)? If so, I'd try DBSCAN, as it takes spread into account and not just distance. Otherwise, if you know the function, I'd apply the inverse transformation to the data before clustering. DBSCAN: https://en.wikipedia.org/wiki/DBSCAN. DBSCAN in Python: https://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html – AChervony Jan 29 '19 at 20:18
  • There are several clustering techniques that capture the "local structure" of your dataset. As pointed out above, DBSCAN is one popular choice. A couple of others you can check out are hierarchical agglomerative clustering with single linkage, and spectral clustering. Both are available in sklearn.cluster. This page https://scikit-learn.org/stable/modules/clustering.html compares how various clustering algorithms would cluster your data. – Dileep Kumar Patchigolla Dec 10 '19 at 11:51
  • You could just do it in R, or translate func_cluster to Python. – Rafael Valero Feb 14 '21 at 14:06
  • @AChervony Can you elaborate on what you mean by "do reverse transformation on the data before clustering via DBSCAN"? Very informative answer! Let's say the function is known beforehand. What is the next step? – TunaFishLies Nov 20 '22 at 16:00
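The approach in Andrew's first comment, generalised to the non-linear case Mansur describes, amounts to clusterwise (or "k-curves") regression: start each curve from a small random sample of points, then alternate between fitting one curve per cluster and reassigning every point to the curve with the smallest squared residual, keeping the restart with the lowest sum of squared errors. Below is a minimal NumPy sketch, assuming each group follows a polynomial of a known degree and that the number of groups `k` is known. `k_curves` and `_fit_once` are hypothetical helpers, not library functions, and like k-means the procedure can get stuck in local optima, which is why it restarts several times.

```python
import numpy as np


def _fit_once(x, y, k, degree, n_iter, rng):
    """One run of alternating fit/assign, started from random point samples."""
    n = len(x)
    # Start each curve from a random sample of degree + 1 points
    coefs = []
    for _ in range(k):
        idx = rng.choice(n, degree + 1, replace=False)
        coefs.append(np.polyfit(x[idx], y[idx], degree))

    labels = None
    for _ in range(n_iter):
        # Squared residual of every point against every curve, shape (n, k)
        resid = np.stack([(y - np.polyval(c, x)) ** 2 for c in coefs], axis=1)
        new_labels = resid.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        # Refit each curve on its points; reseed a curve that has too few points
        for j in range(k):
            idx = np.flatnonzero(labels == j)
            if idx.size <= degree:
                idx = rng.choice(n, degree + 1, replace=False)
            coefs[j] = np.polyfit(x[idx], y[idx], degree)

    # Sum of squared errors of the final assignment against the returned curves
    sse = sum(
        float(np.sum((y[labels == j] - np.polyval(coefs[j], x[labels == j])) ** 2))
        for j in range(k)
    )
    return labels, coefs, sse


def k_curves(x, y, k, degree=2, n_restarts=10, n_iter=100, seed=0):
    """Cluster (x, y) points by which of k polynomial curves they lie on.

    Returns (labels, coefficients per curve, SSE) of the best restart.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_restarts):
        result = _fit_once(x, y, k, degree, n_iter, rng)
        if best is None or result[2] < best[2]:
            best = result
    return best
```

To choose `k`, one option is to run it for several values and watch how the SSE drops. The polynomial is only a stand-in for whatever function family the data actually follow; if the true curves are not well approximated by a low-degree polynomial, the clusters it finds may be spurious, which is the caution raised in the thread.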
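One reading of AChervony's "reverse transformation" suggestion, which also speaks to the follow-up about what to do once the function is known: if every group shares the same shape and differs only by a parameter, apply the inverse of that shape so that each group collapses to a compact blob, then cluster the transformed values with DBSCAN (or with single-linkage agglomerative clustering, as another comment suggests). The sketch below assumes, purely for illustration, groups of the form y = log(x) + b with an unknown offset b per group. The synthetic data, the helper `cluster_after_inverse`, and the `eps`/`min_samples` values are all hypothetical and would need tuning on real data.

```python
import numpy as np
from sklearn.cluster import DBSCAN, AgglomerativeClustering


def cluster_after_inverse(x, y, inverse, eps=0.2, min_samples=5):
    """Undo a known shared shape, then cluster the resulting 1-D values.

    `inverse(x, y)` should map every point on the same curve to (roughly)
    the same number, e.g. y - log(x) when each group is y = log(x) + b.
    """
    r = np.asarray(inverse(x, y), dtype=float).reshape(-1, 1)
    # DBSCAN labels dense runs of values as clusters and isolated values as -1
    return DBSCAN(eps=eps, min_samples=min_samples).fit_predict(r)


# Synthetic example (assumed data): three groups y = log(x) + b, b in {0, 2, 4}
rng = np.random.default_rng(0)
x = rng.uniform(1, 10, 300)
b = rng.choice([0.0, 2.0, 4.0], size=300)
y = np.log(x) + b + rng.normal(0, 0.05, 300)

labels = cluster_after_inverse(x, y, inverse=lambda x, y: y - np.log(x))

# Alternative: single-linkage agglomerative clustering on the same values
r = (y - np.log(x)).reshape(-1, 1)
single = AgglomerativeClustering(n_clusters=3, linkage="single").fit_predict(r)
```

If the groups differ by a scale factor instead (y = a·f(x)), the analogous inverse would be y / f(x), valid where f(x) is non-zero. If the function family is not known, this trick does not apply, and a fit-based approach such as the regression idea in the first comment is needed.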

0 Answers