Find an unknown number of dense clusters (groups) in a set of values (timestamps)

Question

I currently have this:

Data = [2003, 8, 4, 12, 30, 45, 2003, 8, 4, 12, 32, 55, ... 2003, 12, 9, 8, 30, 45]

(The number of datetime items is about 50,000 up to a million, sometimes more.)

I would like my machine to extract the datetimes that group into clusters (dense regions) within the total date range. The datetimes come from trading activity, so almost all of them fall during the daytime, between 09:00 in the morning and 22:00 at night.

If there is no good way to let the machine decide, I could:

Give two parameters, set by the user, e.g.:

Minimum_cluster_size = 5  # minimum number of datetimes inside a cluster
Maximum_cluster_datetime_range = 6000  # seconds between the first and last datetime in the cluster

A nice output would be something like:

Clusters_found = {0: [2003,8,4,12,30,45, 
                  2003,8,4,12,31,20,
                  2003,8,4,12,33,22],
                  ...
             321:[2003,8,4,14,00,45, 
                  2003,8,4,14,01,20,
                  2003,8,4,14,03,22]} # a dict with 321 clusters.

I would appreciate any suggestions. I am a fairly novice programmer and mostly use code for normalizing table data or for cartography.

what if there are only 4 timestamps within the 6000 maximum range? — Aprillion, May 13 '15 at 08:25
Why do your further analyses abstract away from the time zone context? Naive DateTime(s) are very dangerous. Globally ... — user3666197, May 13 '15 at 09:02

score 0 · Answer 1 · answered May 13 '15 at 08:30

I would convert the flat list into a list of tuples (or datetime objects), e.g.:

data = [tuple(data[i:i+6]) for i in range(0, len(data), 6)]

and look for existing packages such as cluster or sklearn.cluster to do the clustering...
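
As a rough illustration of the conversion, and of the two-parameter fallback described in the question, here is a minimal sketch using only the standard library. It is not what the cluster or sklearn.cluster packages do internally; it is a simple greedy pass over the sorted timestamps. The function names to_datetimes and cluster_by_span are made up for this example, the sample values are illustrative, and the two parameters mirror the question's Minimum_cluster_size and Maximum_cluster_datetime_range.

```python
from datetime import datetime


def to_datetimes(flat):
    """Turn a flat [Y, M, D, h, m, s, Y, M, D, ...] list into datetime objects."""
    return [datetime(*flat[i:i + 6]) for i in range(0, len(flat), 6)]


def cluster_by_span(stamps, max_range_seconds, min_cluster_size):
    """Greedy pass (hypothetical helper): open a cluster at the first timestamp
    and keep adding timestamps while they lie within max_range_seconds of that
    first one. Clusters with fewer than min_cluster_size members are dropped."""
    clusters = []
    current = []
    for ts in sorted(stamps):
        # Close the current cluster once its first-to-last span would be exceeded.
        if current and (ts - current[0]).total_seconds() > max_range_seconds:
            if len(current) >= min_cluster_size:
                clusters.append(current)
            current = []
        current.append(ts)
    # Do not forget the cluster that is still open at the end.
    if len(current) >= min_cluster_size:
        clusters.append(current)
    # Number the clusters 0, 1, 2, ... like the dict sketched in the question.
    return dict(enumerate(clusters))


# Illustrative data only: two bursts on one day and a lone timestamp the next day.
flat = [2003, 8, 4, 12, 30, 45,
        2003, 8, 4, 12, 31, 20,
        2003, 8, 4, 12, 33, 22,
        2003, 8, 4, 14, 0, 45,
        2003, 8, 4, 14, 1, 20,
        2003, 8, 5, 9, 0, 0]

stamps = to_datetimes(flat)
clusters_found = cluster_by_span(stamps, max_range_seconds=600, min_cluster_size=2)
```

Because the pass is greedy, a window that turns out too small is dropped as a whole, so a cluster that starts partway through it can be missed. If that matters for your data, a density-based method such as DBSCAN from sklearn.cluster, run on the timestamps converted to seconds, is probably a better fit.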

Aprillion