This is the code I use to compute a file checksum:

import hashlib

with open('file.mp3', 'rb') as f:  # binary mode; the file is closed on exit
    print(hashlib.md5(f.read()).hexdigest())

The resulting checksum covers both the metadata and the audio content. Is there a way to hash only the audio content and ignore the metadata?

simotod
  • 1
    Do you mean it is considering the mp3 metadata? This seems specific to mp3 files: couldn't you use a library like eyeD3 or Mutagen to clear all metadata, so all files you compare have empty (or at least the same) metadata? – Herman Schaaf Dec 14 '15 at 14:44
  • Hi @HermanSchaaf, I can't delete the metadata from the mp3 files because I need it – simotod Dec 14 '15 at 14:57
  • But you don't want it in the checksum. The only course of action is to remove the metadata temporarily (or isolate only the audio data) and then compute the checksum of the data without metadata. You can do this in a temporary file or even in memory; you don't need to replace the original file. – Herman Schaaf Dec 14 '15 at 15:33
  • 1
    Use the struct module to unpack the metadata and the data of the mp3 file, then just checksum the data – Netwave Dec 14 '15 at 15:34
  • 1
    The ID3 tags are (part of) the contents of the file. You will need to process the file contents and provide only the bytes from it that you are interested in to the md5 function. – dsh Dec 14 '15 at 15:35
  • maybe see also: http://stackoverflow.com/questions/13784993/how-do-i-uniquely-identify-the-content-of-a-media-file-in-python-not-the-metada – k-nut Dec 14 '15 at 15:36
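
The in-memory approach suggested in the comments could look roughly like the sketch below. It is a minimal illustration, not what any particular library does: the helper names (synchsafe_to_int, strip_id3, audio_md5) are made up for this example, it assumes Python 3, and it only handles a single leading ID3v2 tag and a trailing 128-byte ID3v1 tag. Other tag formats (APEv2, Lyrics3) would still end up in the hash.

```python
import hashlib


def synchsafe_to_int(b):
    # ID3v2 stores the tag size in 4 bytes with 7 significant bits each.
    return (b[0] << 21) | (b[1] << 14) | (b[2] << 7) | b[3]


def strip_id3(data):
    """Return data without a leading ID3v2 tag and a trailing ID3v1 tag."""
    start, end = 0, len(data)
    # ID3v2 header: b'ID3', 2 version bytes, 1 flags byte, 4 size bytes.
    if len(data) >= 10 and data[:3] == b'ID3':
        start = 10 + synchsafe_to_int(data[6:10])
        if data[5] & 0x10:  # footer flag (ID3v2.4) adds 10 more bytes
            start += 10
    # ID3v1: fixed 128-byte block at the end of the file, starting with b'TAG'.
    if end - start >= 128 and data[end - 128:end - 125] == b'TAG':
        end -= 128
    return data[start:end]


def audio_md5(path):
    # Reads the whole file into memory; the original file is never modified.
    with open(path, 'rb') as f:
        return hashlib.md5(strip_id3(f.read())).hexdigest()
```

Calling audio_md5('file.mp3') should then give the same digest before and after editing the ID3 tags, as long as the audio frames themselves are unchanged. The digest is not expected to match the output of other tools that use a different scheme.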

1 Answer

I solved the issue by installing the mp3hash library from https://pypi.python.org/pypi/mp3hash/.

from mp3hash import mp3hash
print(mp3hash('file.mp3'))
simotod