Algorithm for clustering pictures based on date taken

Question

Anyone know of an algorithm that will group pictures into events based on the date the picture was taken. Obviously I can group by the date, but I'd like something a little more sophisticated that would(might) be able to group pictures spanning multiple days based on the frequency over a certain timespan. Consider the following groupings:

1/2/2009 15 photos
1/3/2009 20 photos
1/4/2009 13 photos
1/5/2009 19 photos
1/15/2009 5 photos

Potentially these would be grouped into two groups:

1/2/2009 -> 1/5/2009
1/15/2009

Obviously there will be some tolerance(s) that need to be established.

Is there any well established way of doing this, other then inventing my own top/down approach?

Did you end up with a solution that worked well? If so, would you be able to share your approach? I'm about to work on a similar problem. — MahlerFive, Aug 23 '13 at 21:04

score 7 · Accepted Answer · answered Mar 06 '09 at 08:31

You can apply pretty much any standard clustering technique to this, it's just a matter of defining your distance function correctly. When you are making your matrix of distances between your photos you should consider a combination of physical distance between locations - if you have it - and temporal distance between their creation timestamps. Normalise them and put them on separate dimensions and you may even just be able to take a regular euclidean distance.

Best of luck.

score 0 · Answer 2 · answered Mar 06 '09 at 08:30

0

Just group the pictures that were taken on successive days (no days on which no pictures were taken) together.

answered Mar 06 '09 at 08:30

tehvan

10,189
5
27
31

score 0 · Answer 3 · answered Mar 06 '09 at 08:40

0

You might try to dynamically calculate tolerance based on how many or how big (absolute or %) clusters you want to create.

answered Mar 06 '09 at 08:40

vartec

131,205
36
218
244

score 0 · Answer 4 · answered Mar 06 '09 at 08:45

0

To get a useful clustering of pictures according to date you require the following:

1) The number of clusters should be variable and not fixed a priori to the clustering

2) The diameter of each cluster should not exceed a specific amount.

The clustering algorithm that best satisfies both requirements is the QT (quality threshold) clustering algorithm. From Wikipedia:

QT (quality threshold) clustering (Heyer, Kruglyak, Yooseph, 1999) is an alternative method of partitioning data, invented for gene clustering. It requires more computing power than k-means, but does not require specifying the number of clusters a priori, and always returns the same result when run several times.

Although it is mainly used for gene clustering I think it would fit in very well for what you need.

answered Mar 06 '09 at 08:45

Il-Bhima

10,744
1
47
51

Any hierarchical agglomeration technique shares that property. – Simon Mar 06 '09 at 08:48
Why do you think QT clustering is better? – Greg Dean Mar 06 '09 at 08:50
hierarchical agglomeration technique will naively merge always the closest two point/cluster pairs at each iteration. Since you are not considering all clusters for each point you could end up with skewed clusters – Il-Bhima Mar 06 '09 at 09:22
w/ QT wont the first cluster always be the size of the predefined max diameter? – Greg Dean Mar 06 '09 at 09:31
The first cluster is by definition the cluster having the most points within the given diameter. Every cluster will have the predefined max diameter if there are enough points. – Il-Bhima Mar 06 '09 at 09:43
Right, which makes it ill suited for this application, unless of course you assume each event is the same duration of time (same diameter). – Greg Dean Mar 07 '09 at 15:46
@Il-Bhima. Naive implementations will indeed do that, but it is an easy problem to avoid. – Simon Mar 09 '09 at 20:26

score 0 · Answer 5 · answered Jun 05 '09 at 18:15

0

Try to detect the Gaps instead of the Clusters.

answered Jun 05 '09 at 18:15

The Unknown

19,224
29
77
93

Algorithm for clustering pictures based on date taken

5 Answers5