Recommended Algorithm for time based clustering

Question

I am not very knowledgeable about time-based clustering and am wondering whether any algorithms are well suited to my use case.

I have a series of exertion data (values range from 0 to 500) and I want to cluster it into time intervals.

My problem is that I want to find the points in time where there are major differences in exertion. I will know exactly how many groupings there should be (e.g. 5 separate clusters), but I won't know where one ends and the next one starts.

Is there a good algorithm to apply in this case? I was looking at K-Means, but it appears to cluster the values while disregarding time, whereas I am looking for boundaries in time based on the exertion data.

score 1 · Answer 1 · answered Nov 12 '18 at 13:01

I think you could get good results from a dynamic program. For each interval [i, j), let C(i, j) be a loss function that is lower when the values in the interval are more likely to form a single cluster. Then, letting L(k, r) be the minimum loss for up to k clusters covering elements [0, r), we have the equations

L(1, r) = C(0, r)
L(k, r) = min over s in [0, r) of [L(k-1, s) + C(s, r)],  for k > 1.

If k is a constant, evaluating these equations with memoization takes O(n^2) time and O(n) space, where n is the number of samples, provided each C(s, r) is available in O(1) time.
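As a rough illustration of the recurrence, here is a minimal Python sketch. The names `segment` and `variance_cost` are made up for this example, and it makes two assumptions that are not part of the equations above: it asks for exactly k non-empty intervals rather than "up to k", and it stores back-pointers so the boundaries can be read off at the end. The cost used here is a naive population variance that takes O(n) per call, so this version is slower than O(n^2); it is only meant to show the shape of the dynamic program.

```python
from typing import Callable, List, Sequence, Tuple


def segment(n: int, k: int, cost: Callable[[int, int], float]) -> Tuple[float, List[int]]:
    """Split indices [0, n) into exactly k non-empty contiguous intervals.

    cost(i, j) is the loss C(i, j) of the half-open interval [i, j).
    Returns (total loss, start index of each interval). Assumes 1 <= k <= n.
    """
    inf = float("inf")
    # best[m][r] = minimum loss for splitting [0, r) into m intervals.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    # back[m][r] = start s of the last interval in that optimal split.
    back = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for m in range(1, k + 1):
        for r in range(m, n + 1):
            for s in range(m - 1, r):
                cand = best[m - 1][s] + cost(s, r)
                if cand < best[m][r]:
                    best[m][r] = cand
                    back[m][r] = s
    # Follow the back-pointers from the full range down to the first interval.
    starts: List[int] = []
    r = n
    for m in range(k, 0, -1):
        r = back[m][r]
        starts.append(r)
    starts.reverse()
    return best[k][n], starts


def variance_cost(data: Sequence[float]) -> Callable[[int, int], float]:
    """Naive C(i, j): population variance of data[i:j], O(j - i) per call."""
    def cost(i: int, j: int) -> float:
        seg = data[i:j]
        mean = sum(seg) / len(seg)
        return sum((x - mean) ** 2 for x in seg) / len(seg)
    return cost


data = [1, 1, 1, 10, 10, 10, 50, 50]
loss, starts = segment(len(data), 3, variance_cost(data))
print(loss, starts)
```

On this toy series the three flat stretches are recovered, i.e. the intervals start at indices 0, 3 and 6 with zero loss.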

A plausible first choice for C(i, j) would be the statistical variance of the samples in that interval. Computed naively for all O(n^2) intervals, this takes Theta(n^3) time in total, but Welford's algorithm can be used to compute the variance online if, for each r, you iterate s from its greatest value to its least, so the overall algorithm would still be O(n^2).
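To make the Welford step concrete, here is a small sketch, assuming population variance is the cost. The function name `suffix_variances` is hypothetical. For a fixed right end r it walks s from r - 1 down to 0, adding one sample at a time, so all r values C(s, r) come out in O(r) total time.

```python
from typing import List, Sequence


def suffix_variances(data: Sequence[float], r: int) -> List[float]:
    """Return v where v[s] is the population variance of data[s:r], 0 <= s < r."""
    v = [0.0] * r
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for s in range(r - 1, -1, -1):  # grow the interval leftwards, one sample per step
        x = data[s]
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)  # Welford's update
        v[s] = m2 / count
    return v


sample = [3.0, 7.0, 7.0, 19.0]
v = suffix_variances(sample, 4)
print(v)
```

Inside the dynamic program you would call this once per r and look up v[s] in the inner loop instead of recomputing the variance. One thing to be aware of: summing plain variances weights a 2-sample interval the same as a 200-sample one. Storing m2 instead of m2 / count gives the sum of squared deviations, which is a length-weighted alternative.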
