I currently have this:
Data = [2003, 8, 4, 12, 30, 45, 2003, 8, 4, 12, 32, 55, ... 2003, 12, 9, 08, 30, 45]
(The number of datetime items ranges from about 50,000 to a million, sometimes more.)
I would like my machine to find the datetimes that group into clusters, i.e. dense regions within the total date range. The datetimes come from trading activity, so almost all of them fall during the day, between 09:00 and 22:00.
If there is no good way to let the machine decide, I could supply two parameters, set by the user, e.g.:
Minimum_cluster_size = 5  # minimum number of datetimes inside a cluster
Maximum_cluster_datetime_range = 6000  # maximum seconds between the first and last datetime in a cluster
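To make the two-parameter fallback concrete, here is a minimal sketch of what I have in mind. It assumes a simple greedy pass over the sorted datetimes, which is not a true density-based clustering, and the helper names to_datetimes and find_clusters are made up for illustration:

```python
from datetime import datetime

def to_datetimes(flat):
    """Turn [y, m, d, H, M, S, y, m, d, ...] into a sorted list of datetime objects."""
    return sorted(datetime(*flat[i:i + 6]) for i in range(0, len(flat), 6))

def find_clusters(flat, min_size=5, max_range=6000):
    """Greedy grouping (an assumption, not a standard algorithm name):
    a cluster is a run of consecutive datetimes whose first and last items
    are at most max_range seconds apart and that has at least min_size items."""
    stamps = to_datetimes(flat)
    clusters = {}
    start = 0
    while start < len(stamps):
        end = start
        # Extend the window while the next item is within max_range seconds of the first.
        while (end + 1 < len(stamps)
               and (stamps[end + 1] - stamps[start]).total_seconds() <= max_range):
            end += 1
        if end - start + 1 >= min_size:
            # Large enough: store it under the next cluster index and jump past it.
            clusters[len(clusters)] = stamps[start:end + 1]
            start = end + 1
        else:
            # Too small: drop only the first item and try again from the next one.
            start += 1
    return clusters
```

Calling find_clusters(Data, Minimum_cluster_size, Maximum_cluster_datetime_range) would return a dict keyed by cluster index. Note that in this sketch the values are lists of datetime objects rather than flat lists of integers.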
A nice output would be something like:
Clusters_found = {0: [2003,8,4,12,30,45,
2003,8,4,12,31,20,
2003,8,4,12,33,22],
...
321:[2003,8,4,14,00,45,
2003,8,4,14,01,20,
2003,8,4,14,03,22]} # a dict of clusters, keyed by cluster index (here 0 to 321).
I would appreciate any suggestions. I am a fairly novice programmer and mostly use coding for normalizing table data or cartography.