I'm building a small tool that scans a music collection, reads the ID3 info of each track, and stores it as long as that artist's song count hasn't gone past two. I'm planning on using Mutagen for reading the tags.
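The plan as described could be sketched roughly like this. It is a stdlib-only sketch that walks the collection with `os.walk` and keeps at most two paths per artist. `collect`, `iter_mp3s` and `read_artist` are hypothetical names; `read_artist` stands in for the tag lookup (with Mutagen it might be something like `EasyID3(path).get("artist", [None])[0]`, which raises an exception if the file has no ID3 header at all). Note that it still calls `read_artist` on every file, which is exactly the cost in question.

```python
import os
from typing import Callable, Dict, Iterator, List, Optional

MAX_PER_ARTIST = 2  # stop storing an artist's songs past this count


def iter_mp3s(root: str) -> Iterator[str]:
    """Yield every .mp3 path under root (case-insensitive extension check)."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(".mp3"):
                yield os.path.join(dirpath, name)


def collect(
    root: str,
    read_artist: Callable[[str], Optional[str]],
    limit: int = MAX_PER_ARTIST,
) -> Dict[str, List[str]]:
    """Map artist -> up to `limit` track paths. Every file is still opened."""
    stored: Dict[str, List[str]] = {}
    for path in iter_mp3s(root):
        artist = read_artist(path)  # hypothetical hook: the expensive tag read
        if artist is None:
            continue  # untagged file: nothing to count
        songs = stored.setdefault(artist, [])
        if len(songs) < limit:
            songs.append(path)
    return stored
```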
However, my music collection, like many other people's, is massive: over 20,000 songs. As far as I know, libraries like Mutagen have to open and close every song to get its ID3 info. Opening a single MP3 isn't terribly expensive, but that's a lot of songs. I'm already planning a minor optimization: keeping a per-artist song count and not storing any info once an artist's count exceeds 2. As far as I can tell, though, I still need to open every song to check its artist tag.
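For context on what "opening" a track involves: an ID3v2 tag sits at the start of the file, so a reader only needs the 10-byte tag header plus the tag body, not the audio data. The following is a deliberately minimal, stdlib-only illustration of that, reading just the artist (`TPE1`) frame. `read_id3v2_artist` is a hypothetical helper, and it assumes ID3v2.3 or 2.4 with no unsynchronisation, no extended header, and no compressed or per-frame-unsynchronised frames (it returns `None` or may misread otherwise). It is not a replacement for Mutagen, which handles those cases as well as ID3v1 tags.

```python
import struct
from typing import Optional


def _synchsafe(b: bytes) -> int:
    """Decode an ID3v2 'synchsafe' integer (7 significant bits per byte)."""
    value = 0
    for byte in b:
        value = (value << 7) | (byte & 0x7F)
    return value


def _decode_text(data: bytes) -> str:
    """Decode a text-frame body; its first byte selects the encoding."""
    if not data:
        return ""
    enc, payload = data[0], data[1:]
    codec = {0: "latin-1", 1: "utf-16", 2: "utf-16-be"}.get(enc, "utf-8")
    text = payload.decode(codec, errors="replace")
    return text.split("\x00")[0]  # drop terminator / extra null-separated values


def read_id3v2_artist(path: str) -> Optional[str]:
    """Return the TPE1 (artist) text of an ID3v2.3/2.4 tag, or None."""
    with open(path, "rb") as f:
        header = f.read(10)
        if len(header) < 10 or header[:3] != b"ID3":
            return None  # no ID3v2 tag at the start of the file
        major, flags = header[3], header[5]
        if major not in (3, 4) or flags & 0xC0:
            return None  # unsynchronisation (0x80) or extended header (0x40)
        tag = f.read(_synchsafe(header[6:10]))  # only the tag, never the audio
    pos = 0
    while pos + 10 <= len(tag):
        if tag[pos] == 0:
            break  # reached padding
        frame_id = tag[pos:pos + 4]
        raw_size = tag[pos + 4:pos + 8]
        # v2.4 frame sizes are synchsafe; v2.3 sizes are plain big-endian.
        size = _synchsafe(raw_size) if major == 4 else struct.unpack(">I", raw_size)[0]
        if frame_id == b"TPE1":
            return _decode_text(tag[pos + 10:pos + 10 + size])
        pos += 10 + size
    return None
```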
I toyed with the idea of using directories as a hint for the artist name and not reading any more info in a directory once that artist's song count exceeds 2, but not everyone has their music set up in neat Artist/Album/Songs directories.
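The directory-hint idea could look something like this hypothetical variant of a plain scan. It assumes a directory usually holds a single artist's songs, and as a small safeguard (an assumption, not part of the idea as stated) it only skips the rest of a directory when every tag read there so far had the same artist. Mixed directories can still lose songs: if the first few tracks in a directory are all by an already-capped artist, later tracks by other artists are never read. `collect_with_dir_hint` and `read_artist` are hypothetical names, and the function returns the number of tag reads so the savings can be measured.

```python
import os
from typing import Callable, Dict, List, Optional, Set, Tuple


def collect_with_dir_hint(
    root: str,
    read_artist: Callable[[str], Optional[str]],
    limit: int = 2,
) -> Tuple[Dict[str, List[str]], int]:
    """Scan root, possibly skipping the rest of a directory; return (stored, reads)."""
    stored: Dict[str, List[str]] = {}
    reads = 0
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # deterministic traversal order
        seen_here: Set[str] = set()  # artists read so far in this directory
        for name in sorted(filenames):
            if not name.lower().endswith(".mp3"):
                continue
            path = os.path.join(dirpath, name)
            artist = read_artist(path)  # hypothetical hook: the tag read
            reads += 1
            if artist is None:
                continue
            seen_here.add(artist)
            songs = stored.setdefault(artist, [])
            if len(songs) < limit:
                songs.append(path)
            elif len(seen_here) == 1:
                # Only one artist seen in this directory and it is already
                # capped: assume the rest of the directory is theirs too.
                break
    return stored, reads
```

In a tidy Artist/Album layout this reads `limit + 1` files per album directory instead of all of them; in a flat or mixed layout it degrades toward reading everything (or, in the worst case, skipping songs it should have stored).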
Does anyone have any other optimizations in mind that might cut down on the overhead of opening so many MP3s?