
In a nutshell, Shazam records a fingerprint of the song you're listening to and sends it to its backend servers, where it is matched against a fingerprint database. The lookup process produces a histogram of time offsets for each candidate song in the index and declares the song with the most matches at a single offset the winner. Details of the algorithm can be found in the original paper here.
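For concreteness, here is a minimal sketch of that offset-histogram step as I understand it from the paper, assuming fingerprints are (hash, time) pairs and the index maps each hash to (song_id, time) postings (the names and data layout are mine, not Shazam's):

```python
from collections import Counter, defaultdict

def best_match(query_fp, index):
    """Score each candidate song by its best single time-offset alignment.

    query_fp: list of (hash, time) pairs extracted from the recording.
    index:    dict mapping hash -> list of (song_id, song_time) postings.
    Returns (song_id, score) for the song with the most hash hits
    concentrated at one offset.
    """
    offset_hist = defaultdict(Counter)  # song_id -> histogram of offsets
    for h, t_query in query_fp:
        for song_id, t_song in index.get(h, ()):
            # Hits from the true song share a consistent time offset,
            # so they all land in the same histogram bin and pile up.
            offset_hist[song_id][t_song - t_query] += 1

    best_song, best_score = None, 0
    for song_id, hist in offset_hist.items():
        _, peak = hist.most_common(1)[0]
        if peak > best_score:
            best_song, best_score = song_id, peak
    return best_song, best_score
```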

According to this blog post, Shazam splits its index into tiers in order to speed up the lookup process. The fingerprints of the most popular songs are stored in the first tier, which gets queried first. If no matching song is found in the first tier, the search proceeds to the second tier, and so on.

What I don't get is how Shazam avoids false positives with such an architecture. E.g., how does it avoid matching a popular track with a high matching score when a less popular track in a lower tier would score even higher? Does it use a scoring function and a threshold? If so, what would the scoring function look like?
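For illustration, a naive early-stopping scheme might look like the sketch below, reusing best_match from the sketch above. The threshold value is purely hypothetical; what the real acceptance criterion looks like is exactly what I'm asking about:

```python
def tiered_lookup(query_fp, tiers, threshold=20):
    """Query index tiers in popularity order, stopping early on a
    strong enough match. tiers[0] holds the most popular songs;
    `threshold` is a made-up minimum peak-offset count."""
    best = (None, 0)
    for index in tiers:
        song_id, score = best_match(query_fp, index)
        if score >= threshold:
            return song_id, score  # "good enough": skip lower tiers
        best = max(best, (song_id, score), key=lambda m: m[1])
    return best  # nothing cleared the bar; report the best seen
```

With a scheme like this, a popular track that clears the threshold would shadow a better-matching track in a lower tier, which is the false-positive scenario I'm worried about.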

Jean Gauthier
  • If they do it to speed up lookup, then they do indeed stop searching once the result is good enough. How they decide whether a match is good enough, only Shazam would know. – juvian May 15 '18 at 19:01
  • As @juvian said, the search stops once a high enough match is found. Finding a match requires a high-dimensional comparison, so to reduce the number of such comparisons Shazam looks through the popular tier first, continuing until a sufficient match is encountered. It would evidently be too RAM-intensive to keep all of their precomputed song sample windows addressable at once: for each song they precompute windows (many audio samples) of varying widths, from which they synthesize a set of searchable patterns. It's the classic trade-off of time for space; a quick response demands more memory. – Scott Stensland May 16 '18 at 10:29

0 Answers