
I'm building a website for a choir that automatically evaluates candidates before they are invited to an audition. I maintain a database of audio snippets that each candidate should sing.

I'm trying to use the Aurio library from here: https://github.com/protyposis/Aurio. However, I cannot get it to match audio tracks that are sung slower or in a different scale (pitch).

private int findMatchesWang(String file1, String file2)
{
    // Setup the sources
    var audioTrack1 = new AudioTrack(new FileInfo(file1));
    var audioTrack2 = new AudioTrack(new FileInfo(file2));

    var profile = Aurio.Matching.Wang2003.FingerprintGenerator.GetProfiles()[0];
    var store = new Aurio.Matching.Wang2003.FingerprintStore(profile);
    var gen = new Aurio.Matching.Wang2003.FingerprintGenerator(profile);

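    int hashCount = 0;

    // Collect the sub-fingerprints of both tracks in the same store
    gen.SubFingerprintsGenerated += delegate (object sender, SubFingerprintsGeneratedEventArgs e)
    {
        store.Add(e);
        hashCount += e.SubFingerprints.Count;
    };
    gen.Generate(audioTrack1);
    gen.Generate(audioTrack2);

    // Look up matching fingerprints across everything that was added to the store
    var matches = store.FindAllMatches();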

    return matches.Count;
}

When file1 is identical to file2, the function returns the expected value (4686 matches). I then tried calling the function with two variations:

  • file2 is the same song as file1, but played slower (see file "slow.wav").

  • file2 is the same song as file1, but played in a different scale (see file "different-scale.wav").

In both cases, the function returns 0. Please find the files here: https://drive.google.com/drive/folders/10vKdc6C3InWpVs0g877Yub3ddv267GSZ?usp=sharing

Can anybody explain what's wrong?

Rina Sade

1 Answer


The algorithms implemented in Aurio are not designed to match content that differs in speed or pitch/scale. They can only handle very small changes in those dimensions, such as those caused by sampling clock drift between playback and recording devices (i.e. measurable, but not noticeable to human listeners).

(I am the author of the Aurio library)
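
To illustrate why even a moderate tempo or pitch change yields zero matches, here is a minimal, simplified sketch of the landmark-hash idea behind Wang-style (Shazam-like) fingerprinting. It is not Aurio's actual code: the class `LandmarkHashDemo`, its methods, and the `fanOut` parameter are hypothetical names for illustration, and real implementations quantize and pack these values differently. The sketch assumes each hash combines the frequency bins of two spectral peaks with the time difference between them.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified illustration only; names and structure are assumptions, not Aurio's API.
public static class LandmarkHashDemo
{
    // Pair each peak with the next few peaks and hash (bin1, bin2, frame delta).
    public static HashSet<(int, int, int)> Hashes(
        IReadOnlyList<(int Frame, int Bin)> peaks, int fanOut = 3)
    {
        var hashes = new HashSet<(int, int, int)>();
        for (int i = 0; i < peaks.Count; i++)
            for (int j = i + 1; j < peaks.Count && j <= i + fanOut; j++)
                hashes.Add((peaks[i].Bin, peaks[j].Bin, peaks[j].Frame - peaks[i].Frame));
        return hashes;
    }

    // Number of hashes that two peak lists have in common.
    public static int CommonHashes(
        IReadOnlyList<(int Frame, int Bin)> a, IReadOnlyList<(int Frame, int Bin)> b)
    {
        var common = Hashes(a);
        common.IntersectWith(Hashes(b));
        return common.Count;
    }

    // Same peaks, played slower: every time position is scaled by 3/2.
    public static List<(int Frame, int Bin)> Slower(IEnumerable<(int Frame, int Bin)> peaks) =>
        peaks.Select(p => (p.Frame * 3 / 2, p.Bin)).ToList();

    // Same peaks, transposed up by roughly one semitone (frequency ratio ~1.06).
    public static List<(int Frame, int Bin)> Transposed(IEnumerable<(int Frame, int Bin)> peaks) =>
        peaks.Select(p => (p.Frame, (int)Math.Round(p.Bin * 1.06))).ToList();

    // A small made-up peak list for experimenting.
    public static List<(int Frame, int Bin)> SamplePeaks() =>
        new List<(int Frame, int Bin)> { (0, 40), (10, 53), (20, 47), (30, 62), (40, 58) };
}
```

Calling `LandmarkHashDemo.CommonHashes` on `SamplePeaks()` against itself finds all hashes again, while comparing against `Slower(...)` or `Transposed(...)` finds none: the time deltas or the frequency bins no longer agree, so the exact-match lookup fails even though the melody is the same.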

Mario Gu
  • Thank you, @Mario. Do you have any suggestion for a library that can match content with different speed or pitch? Any idea how Shazam does it? – Rina Sade Oct 31 '19 at 06:24
  • I'm not aware of any such libraries, I'm afraid. I don't know how much pitch shift Shazam can handle, but a few percent (e.g. ~2%) of speed difference should be covered by their basic algorithm. Larger changes could be covered with a simple "hack": create a few time-stretched variations of each reference and index them too (the same goes for pitch changes). Be aware that these algorithms are designed to detect "copies" of a recording, not different performances of one composition. – Mario Gu Nov 12 '19 at 18:48
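
The "index a few variations" hack from the last comment could be sketched roughly as follows. This is only an illustration under stated assumptions: `SpeedVariations`, `ChangeSpeed`, and `Build` are hypothetical helpers, not part of Aurio, and the audio is assumed to be available as mono PCM samples in a `float[]`. Note that plain resampling changes speed and pitch together (like changing tape speed); a stretch that keeps the pitch, or a pitch shift that keeps the tempo, would need a dedicated algorithm such as a phase vocoder.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of Aurio.
public static class SpeedVariations
{
    // Resample by linear interpolation. speed > 1 plays faster (and higher),
    // speed < 1 plays slower (and lower).
    public static float[] ChangeSpeed(float[] samples, double speed)
    {
        if (speed <= 0) throw new ArgumentOutOfRangeException(nameof(speed));

        int outLength = (int)(samples.Length / speed);
        var result = new float[outLength];
        for (int i = 0; i < outLength; i++)
        {
            double pos = i * speed;                          // position in the input
            int i0 = Math.Min((int)pos, samples.Length - 1); // sample to the left
            int i1 = Math.Min(i0 + 1, samples.Length - 1);   // sample to the right
            double frac = pos - i0;
            result[i] = (float)(samples[i0] * (1 - frac) + samples[i1] * frac);
        }
        return result;
    }

    // Build one variation per speed factor, e.g. 0.90, 0.95, 1.00, 1.05, 1.10.
    public static Dictionary<double, float[]> Build(float[] samples, IEnumerable<double> speeds)
    {
        var variations = new Dictionary<double, float[]>();
        foreach (double speed in speeds)
            variations[speed] = ChangeSpeed(samples, speed);
        return variations;
    }
}
```

Each variation would then be fingerprinted and added to the store alongside the original reference, for example by writing it to a WAV file and passing it through `gen.Generate(new AudioTrack(new FileInfo(path)))` as in the question. The number and spacing of speed factors is a trade-off between index size and the range of tempo differences covered, and, as noted above, this still targets copies of a recording rather than a new performance by a different singer.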