I am working on the following project and am exploring the curveRep() clustering approach provided by Hmisc. (curveRep() clusters individual subjects' longitudinal growth curves into groups with similar patterns, using the CLARA clustering algorithm.) I haven't found any publication that uses curveRep(), and there is very little discussion of it online, so I would be grateful to hear about your experience with it or what you think of it.

My project: I have about 200 metabolites measured in n=500 subjects at three time points (0, 30, 120 min). Individual time courses vary quite a bit, but in spaghetti plots there appear to be groups (e.g. straight, flat curves; peak-shaped curves; valley-shaped curves). I would like to cluster these curves into two or three representative time courses and then fit a curve-specific regression model for each cluster. curveRep() seems to be exactly what I am looking for, and it produces acceptable cluster solutions (although the solutions reflect different y-axis intercepts more than different growth patterns).

Is it any good? Are there alternative clustering algorithms that group according to similar longitudinal change (e.g., cluster 1 = "linear rising", cluster 2 = "valley-shaped")? Thanks a lot! Chris

1 Answer

Three time points are too few for most time-series methods to work for you. Look at DTW, for example: it is designed for much higher resolution.

Clustering algorithms such as k-means, PAM and CLARA could work for you. Look at the cluster centers.

It may be necessary to preprocess your data more carefully.

If you are interested in change instead of absolute values, encode your data accordingly. For example,

x1, x2, x3 -> x2-x1, x3-x2

or

x1, x2, x3 -> x1-mu, x2-mu, x3-mu with mu = (x1+x2+x3)/3

This will make the clustering results more likely to reflect what you are actually interested in.
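As a rough illustration of the two encodings above followed by k-means, here is a minimal sketch in plain Python. Everything in it is an assumption made for the example: the six curves are made up, the helper names (`to_differences`, `center`, `farthest_first`, `kmeans`) are hypothetical, and the farthest-point initialization is only there to keep the toy run reproducible. For real data you would use an existing implementation such as R's `kmeans`, or `pam` and `clara` from the `cluster` package.

```python
def to_differences(curve):
    """x1, x2, x3 -> x2-x1, x3-x2 (successive changes)."""
    return [b - a for a, b in zip(curve, curve[1:])]

def center(curve):
    """x1, x2, x3 -> x1-mu, x2-mu, x3-mu, with mu the mean of the curve."""
    mu = sum(curve) / len(curve)
    return [x - mu for x in curve]

def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def farthest_first(points, k):
    """Deterministic start: first point, then repeatedly the point
    farthest from all centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        )
    return centers

def kmeans(points, k, n_iter=50):
    """Plain Lloyd's algorithm; returns (labels, centers)."""
    centers = farthest_first(points, k)
    labels = []
    for _ in range(n_iter):
        # Assign every point to its nearest center.
        labels = [
            min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points
        ]
        # Recompute each center as the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    [sum(col) / len(members) for col in zip(*members)]
                )
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers

# Made-up intensities at 0, 30 and 120 min (illustrative only).
curves = [
    [1.0, 1.1, 1.0],  # flat, low level
    [5.0, 5.1, 4.9],  # flat, high level
    [1.0, 3.0, 1.2],  # peak, low level
    [5.0, 7.1, 5.1],  # peak, high level
    [3.0, 1.0, 2.9],  # valley, low level
    [6.0, 4.1, 6.2],  # valley, high level
]

encoded = [to_differences(c) for c in curves]  # or: [center(c) for c in curves]
labels, centers = kmeans(encoded, k=3)

for curve, label in zip(curves, labels):
    print(label, curve)
print(centers)  # the cluster centers describe the typical change per cluster
```

On this toy data the flat, peak and valley curves end up in separate clusters regardless of their starting level, because the level has been removed before clustering. The printed centers are in the encoded space, so they read as typical changes rather than typical intensities.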

Has QUIT--Anony-Mousse
Thank you, @Anony-Mousse. Do I understand correctly: I should look at DTW (i.e. understand it), but it is not appropriate because my three time points do not provide enough information about the time curves? For each of the 500 subjects, I have three measurements of metabolite intensity at 0/30/120 min. I am not at all experienced in cluster analysis. To group people into meaningful clusters based on metabolite time course, how would I do that? I.e., what would I base my clusters on? Thanks for helping! – Chris - Uppsala Apr 14 '15 at 06:20