I have a collection of media files, mostly music, most of which were imported from CD many years ago. The collection has been moved between different media players, filesystems, and computers many times, and in the process some tracks were accidentally duplicated. I'm also constantly curating the metadata to get everything properly tagged: when much of it was originally imported, I did not have fancy media playback software and did not even realize that the ID3 tags said everything was just "Track %d" on the classic album "Album".
As a result, I have some files with up-to-date metadata alongside "duplicates" of the same audio whose metadata was never updated, and I'd like to delete those duplicates. Because the metadata is stored inside the file, the two copies are no longer byte-for-byte identical, so tools like liten2 don't detect them as duplicates.
My question is: is there a library that can conveniently extract a uniquely identifying fingerprint of only the media content of a file, ignoring the metadata? A cryptographic hash of some kind would be the obvious choice, but that's not a hard requirement. If so, how do I use it?
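To make the goal concrete, here is a rough sketch of the kind of thing I mean. It is not a real solution: it assumes MP3 files whose only metadata is an ID3v2 tag at the start and/or an ID3v1 tag in the last 128 bytes, and it would miss APEv2 or Lyrics3 tags as well as other containers such as FLAC or MP4. The function names `audio_payload_hash` and `hash_file` are made up for illustration.

```python
import hashlib


def audio_payload_hash(data: bytes) -> str:
    """SHA-256 of the bytes left after skipping ID3v2 and ID3v1 tags.

    Assumption: the file is an MP3 with at most one ID3v2 tag at the
    start and one ID3v1 tag at the end. Anything else is hashed as-is.
    """
    start, end = 0, len(data)

    # ID3v2: 10-byte header = "ID3", 2 version bytes, 1 flags byte,
    # then a 4-byte "synchsafe" size (7 usable bits per byte).
    if len(data) >= 10 and data[:3] == b"ID3":
        size = 0
        for byte in data[6:10]:
            size = (size << 7) | (byte & 0x7F)
        start = 10 + size
        if data[5] & 0x10:  # footer-present flag adds 10 more bytes
            start += 10
        start = min(start, end)  # guard against a corrupt size field

    # ID3v1: fixed 128-byte block at the very end, starting with "TAG".
    if end - start >= 128 and data[end - 128:end - 125] == b"TAG":
        end -= 128

    return hashlib.sha256(data[start:end]).hexdigest()


def hash_file(path: str) -> str:
    """Convenience wrapper: read a whole file and hash its audio payload."""
    with open(path, "rb") as handle:
        return audio_payload_hash(handle.read())
```

Two copies of a track that differ only in those tags should then produce the same digest, so grouping files by `hash_file(path)` would surface the duplicates. What I'm hoping for is a library that does this properly across formats instead of my hand-rolled byte slicing.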