1

Ideally I want a way to uniquely identify files (audio files) where things like the ID3 tag or the file name can change and the hash remains the same.

Is there a better way I don't know of to uniquely identify files? Or would I have to change my record of the file every time an edit was made? Can I hash on other pieces of data or something?

Z2VvZ3Vp
  • Changing a single byte in a file should change its hash, and change it drastically (search for the "avalanche effect"). So clearly you did not change the file content. – Łukasz Rogalski May 02 '15 at 18:20
  • Yeah, you must have just missed my edit. I edited it because I didn't realize VLC makes you hit 'save metadata' to save the tag. Thanks. @ŁukaszR. – Z2VvZ3Vp May 02 '15 at 18:21

1 Answer

2

If you only count files which have EXACTLY the same audio information as "the same" (same bit depth, bitrate, compression, etc.), it's pretty easy: you would just hash the "audio" part of the file and skip the tag bytes. I'm not too familiar with the MPEG audio codec myself, though.
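
A minimal sketch of that idea, in Python for brevity (the same byte slicing could be done on an `ArrayBuffer` in a browser app). It assumes the only metadata is an ID3v2 tag at the start of the file and/or a 128-byte ID3v1 tag at the end; other containers such as APEv2 tags, and encoder info frames, are not handled. The function names are hypothetical.

```python
import hashlib


def synchsafe_to_int(b: bytes) -> int:
    """Decode a 4-byte synchsafe integer (7 useful bits per byte), as used by ID3v2."""
    return ((b[0] & 0x7F) << 21) | ((b[1] & 0x7F) << 14) | ((b[2] & 0x7F) << 7) | (b[3] & 0x7F)


def audio_payload(data: bytes) -> bytes:
    """Return the bytes between a leading ID3v2 tag and a trailing ID3v1 tag.

    Assumption: no other metadata blocks are present.
    """
    start, end = 0, len(data)

    # ID3v2: 10-byte header = "ID3", 2 version bytes, 1 flags byte, 4 size bytes.
    if len(data) >= 10 and data[:3] == b"ID3":
        size = synchsafe_to_int(data[6:10])
        footer = 10 if data[5] & 0x10 else 0  # optional 10-byte footer (ID3v2.4)
        start = min(10 + size + footer, end)

    # ID3v1: fixed 128-byte block at the very end, starting with "TAG".
    if end - start >= 128 and data[end - 128:end - 125] == b"TAG":
        end -= 128

    return data[start:end]


def audio_hash(data: bytes) -> str:
    """SHA-256 of the audio payload only, so tag edits do not change the result."""
    return hashlib.sha256(audio_payload(data)).hexdigest()
```

With this, `audio_hash(open(path, "rb").read())` should stay the same when only the ID3 tag or the file name changes, while any change to the audio bytes produces a different digest.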

reading material: http://en.wikipedia.org/wiki/MP3 http://mpgedit.org/mpgedit/mpeg_format/mpeghdr.htm

Max Kamps
  • More links: [link](http://blog.bjrn.se/2008/10/lets-build-mp3-decoder.html) [link](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.138.7547&rep=rep1&type=pdf) – Max Kamps May 02 '15 at 18:42
  • Thanks for the answer. That does seem like it would work, but I don't think I can do that sort of thing with the tools I'm trying to use (HTML5 Chrome app). I'm starting to think I could probably combine the size of the file (ignoring the last 2 or 3 digits) with the duration and something else, and hash that? – Z2VvZ3Vp May 02 '15 at 18:43
  • @Maximillian ooo thanks for those additional links, that'll keep me reading for a while! – Z2VvZ3Vp May 02 '15 at 18:45
  • @PixMach Hey, if you finish your project, be sure to post it here. I would love to see it. I'm just too frontend to understand file structures myself. – Max Kamps May 02 '15 at 19:00
  • @Maximillian I will. It will be some time off, but I'll remember this thread and revisit it. It's pretty ambitious actually so will take some time. – Z2VvZ3Vp May 02 '15 at 19:10
  • From a little experimentation it looks like I can decode the audio data to get 3 properties. One is duration, which looks like this: 332.50065759637187. That looks like it would be a pretty good indication of uniqueness if I could combine it with another field. It also gives a property called length, but I'm not sure what that is, or whether it can just be derived from duration. – Z2VvZ3Vp May 02 '15 at 19:42
  • Edit: I can also use the analyzer to get some interesting data like minDecibels and maxDecibels, so almost there maybe. – Z2VvZ3Vp May 02 '15 at 19:56
  • Easy in concept but in execution not so much! lol. – Casey Oct 24 '20 at 00:47
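
The alternative floated in the comments, combining a coarse file size with the decoded duration, could be sketched roughly as follows. This is only an illustration in Python: the function name, the bucket size of 1000 bytes (i.e. "ignoring the last 3 digits"), and the rounding of the duration to milliseconds are all assumptions, and the result is an approximate key rather than a true content hash.

```python
import hashlib


def coarse_fingerprint(size_bytes: int, duration_s: float, size_bucket: int = 1000) -> str:
    """Build an approximate identity key from file size and audio duration.

    Assumptions: tag edits change the file size by less than `size_bucket`
    bytes, and the decoder reports the same duration after a tag edit.
    """
    coarse_size = size_bytes // size_bucket      # drop the low-order digits
    key = f"{coarse_size}:{duration_s:.3f}"      # duration rounded to milliseconds
    return hashlib.sha256(key.encode("utf-8")).hexdigest()
```

Two sizes that fall into the same 1000-byte bucket give the same key, but sizes that straddle a bucket boundary (for example 5,320,999 and 5,321,000 bytes) do not, and unrelated files with the same size bucket and duration would collide, so this trades accuracy for simplicity compared with hashing the audio bytes.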