8

Does anyone know of an algorithm that will group pictures into events based on the date each picture was taken? Obviously I can group by date, but I'd like something a little more sophisticated that would (or at least might) be able to group pictures spanning multiple days, based on how frequently they were taken over a certain timespan. Consider the following groupings:

  • 1/2/2009 15 photos
  • 1/3/2009 20 photos
  • 1/4/2009 13 photos
  • 1/5/2009 19 photos
  • 1/15/2009 5 photos

Potentially these would be grouped into two groups:

  1. 1/2/2009 -> 1/5/2009
  2. 1/15/2009

Obviously, one or more tolerances will need to be established.

Is there any well-established way of doing this, other than inventing my own top-down approach?

Greg Dean
  • Did you end up with a solution that worked well? If so, would you be able to share your approach? I'm about to work on a similar problem. – MahlerFive Aug 23 '13 at 21:04

5 Answers

7

You can apply pretty much any standard clustering technique to this; it's just a matter of defining your distance function correctly. When you build the matrix of distances between your photos, consider combining the physical distance between their locations (if you have it) with the temporal distance between their creation timestamps. Normalise the two, put them on separate dimensions, and you may even be able to just take a regular Euclidean distance.
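As a rough sketch of that idea, the snippet below assumes each photo is a hypothetical `(timestamp, lat, lon)` tuple, normalises time and location with arbitrary scale constants (`time_scale_h`, `dist_scale_km`), and uses single-linkage merging with a `cutoff`. All of those names and values are illustrative assumptions, not something the approach prescribes.

```python
import math
from datetime import datetime

def photo_distance(a, b, time_scale_h=24.0, dist_scale_km=50.0):
    """Euclidean distance over normalised time and location differences."""
    (ta, lat_a, lon_a), (tb, lat_b, lon_b) = a, b
    dt = abs((ta - tb).total_seconds()) / 3600.0 / time_scale_h
    # Rough equirectangular approximation, adequate for short distances
    mean_lat = math.radians((lat_a + lat_b) / 2)
    dx = math.radians(lon_a - lon_b) * math.cos(mean_lat) * 6371.0
    dy = math.radians(lat_a - lat_b) * 6371.0
    dd = math.hypot(dx, dy) / dist_scale_km
    return math.hypot(dt, dd)

def cluster(photos, cutoff=1.5):
    """Single-linkage clustering: merge clusters while any pair of
    photos across two clusters is closer than `cutoff`."""
    clusters = [[p] for p in photos]
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if any(photo_distance(a, b) < cutoff
                       for a in clusters[i] for b in clusters[j]):
                    clusters[i].extend(clusters.pop(j))
                    merged = True
                    break
            if merged:
                break
    return clusters

# Photos from the question's dates, all at one (made-up) location
photos = [(datetime(2009, 1, d, 12), 48.85, 2.35) for d in (2, 3, 4, 5, 15)]
groups = cluster(photos)
```

With these settings the four consecutive days end up in one group and 1/15 in another; changing the scale constants or the cutoff changes how much weight time and location each get.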

Best of luck.

Simon
0

Just group together the pictures that were taken on consecutive days, i.e. any run of days with no day in between on which no pictures were taken.
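A minimal sketch of that rule, assuming the photos have already been reduced to a list of `datetime.date` values (the function name is made up for illustration):

```python
from datetime import date, timedelta

def group_consecutive_days(days):
    """Group dates into runs in which each date is at most
    one day after the previous one."""
    groups = []
    for d in sorted(set(days)):
        if groups and d - groups[-1][-1] <= timedelta(days=1):
            groups[-1].append(d)
        else:
            groups.append([d])
    return groups

days = [date(2009, 1, 2), date(2009, 1, 3), date(2009, 1, 4),
        date(2009, 1, 5), date(2009, 1, 15)]
for g in group_consecutive_days(days):
    print(g[0], "->", g[-1], f"({len(g)} days)")
```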

tehvan
0

You might try to dynamically calculate tolerance based on how many or how big (absolute or %) clusters you want to create.
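One way this could look, assuming you fix the desired number of clusters `k` and let the tolerance fall out of the data: cut the sorted timestamps at the `k - 1` largest gaps. The function name and the choice of "number of clusters" as the target are assumptions made for the sake of the example.

```python
from datetime import datetime

def split_into_k(timestamps, k):
    """Split timestamps into k groups by cutting at the k-1 largest gaps."""
    ts = sorted(timestamps)
    if k < 1 or k > len(ts):
        raise ValueError("k must be between 1 and the number of photos")
    # Index i refers to the gap between ts[i] and ts[i + 1]
    gaps = sorted(range(len(ts) - 1),
                  key=lambda i: ts[i + 1] - ts[i], reverse=True)
    cuts = sorted(gaps[:k - 1])
    groups, start = [], 0
    for i in cuts:
        groups.append(ts[start:i + 1])
        start = i + 1
    groups.append(ts[start:])
    return groups

stamps = [datetime(2009, 1, d) for d in (2, 3, 4, 5, 15)]
events = split_into_k(stamps, 2)
```

The effective tolerance is then the smallest gap that was cut, so it adapts to each photo collection instead of being fixed up front.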

vartec
0

To get a useful clustering of pictures according to date you require the following:

1) The number of clusters should be variable and not fixed a priori to the clustering

2) The diameter of each cluster should not exceed a specific amount.

The clustering algorithm that best satisfies both requirements is the QT (quality threshold) clustering algorithm. From Wikipedia:

QT (quality threshold) clustering (Heyer, Kruglyak, Yooseph, 1999) is an alternative method of partitioning data, invented for gene clustering. It requires more computing power than k-means, but does not require specifying the number of clusters a priori, and always returns the same result when run several times.

Although it is mainly used for gene clustering, I think it would fit very well with what you need.
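Here is a small sketch of the QT procedure as it is usually described, restricted to one dimension. It assumes the timestamps have been converted to plain numbers (for example days or hours since some epoch) and that the cluster diameter is simply `max - min`; the function name is made up.

```python
def qt_cluster(values, max_diameter):
    """Quality-threshold clustering of 1-D numeric values."""
    remaining = sorted(values)
    clusters = []
    while remaining:
        best = []
        # Grow a candidate cluster from every remaining point
        for s in range(len(remaining)):
            lo = hi = remaining[s]
            members = [s]
            pool = [i for i in range(len(remaining)) if i != s]
            while pool:
                # Add the point that keeps the diameter smallest
                j = min(pool, key=lambda i: max(hi, remaining[i])
                                            - min(lo, remaining[i]))
                new_lo = min(lo, remaining[j])
                new_hi = max(hi, remaining[j])
                if new_hi - new_lo > max_diameter:
                    break
                lo, hi = new_lo, new_hi
                members.append(j)
                pool.remove(j)
            if len(members) > len(best):
                best = members
        # Keep the largest candidate, remove its points, and repeat
        chosen = set(best)
        clusters.append([remaining[i] for i in sorted(chosen)])
        remaining = [v for i, v in enumerate(remaining) if i not in chosen]
    return sorted(clusters)

day_numbers = [2, 3, 4, 5, 15]  # days of January 2009 from the question
wide = qt_cluster(day_numbers, max_diameter=3)
narrow = qt_cluster(day_numbers, max_diameter=2)
```

With a diameter of 3 days the four consecutive days stay together, while a diameter of 2 splits 1/5 off into its own cluster, so the result depends heavily on the chosen diameter.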

Il-Bhima
  • Any hierarchical agglomeration technique shares that property. – Simon Mar 06 '09 at 08:48
  • Why do you think QT clustering is better? – Greg Dean Mar 06 '09 at 08:50
  • A hierarchical agglomeration technique will naively always merge the two closest points/clusters at each iteration. Since you are not considering all clusters for each point, you could end up with skewed clusters. – Il-Bhima Mar 06 '09 at 09:22
  • With QT, won't the first cluster always be the size of the predefined max diameter? – Greg Dean Mar 06 '09 at 09:31
  • The first cluster is by definition the cluster having the most points within the given diameter. Every cluster will have the predefined max diameter if there are enough points. – Il-Bhima Mar 06 '09 at 09:43
  • Right, which makes it ill suited for this application, unless of course you assume each event is the same duration of time (same diameter). – Greg Dean Mar 07 '09 at 15:46
  • @Il-Bhima. Naive implementations will indeed do that, but it is an easy problem to avoid. – Simon Mar 09 '09 at 20:26
0

Try to detect the Gaps instead of the Clusters.
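For instance, you could flag a gap as an event boundary when it is much larger than the typical gap. The sketch below assumes a threshold of `factor` times the median gap, with a floor of `min_gap_s` seconds; both the rule and the parameter names are illustrative guesses rather than an established method.

```python
from datetime import datetime
from statistics import median

def split_at_large_gaps(timestamps, factor=3.0, min_gap_s=6 * 3600):
    """Start a new event wherever the gap to the previous photo
    exceeds max(min_gap_s, factor * median gap)."""
    ts = sorted(timestamps)
    if len(ts) < 2:
        return [ts] if ts else []
    gaps = [(b - a).total_seconds() for a, b in zip(ts, ts[1:])]
    threshold = max(min_gap_s, factor * median(gaps))
    events = [[ts[0]]]
    for t, gap in zip(ts[1:], gaps):
        if gap > threshold:
            events.append([t])
        else:
            events[-1].append(t)
    return events

shots = [datetime(2009, 1, d) for d in (2, 3, 4, 5, 15)]
found = split_at_large_gaps(shots)
```

On real per-photo timestamps the median gap may be only minutes, which is why the floor is there; it would need tuning for your own collection.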

The Unknown