
I've been facing the following issue:

  • there is a directory where mp3 files ripped from an online radio station are stored

  • some of them are played and stored more than once, and the only difference is in the name: the 1st file has a unique name before the dot (something.mp3), the 2nd file has brackets before the dot (something(1).mp3), the 3rd file likewise (something(2).mp3), and so on.

I would like to delete the smaller files and keep only one of them. Therefore I started the following script:

    #!/usr/bin/python3

    import os
    import datetime
    import sys
    import glob
    from collections import Counter

    path = "/path_of_mp3_files/"

    dirs = os.listdir(path)
    list_of_files = []
    mp3files = []
    mp3_with_bracket = []
    mp3_without_bracket = []
    pending_files = []


    for file in dirs:
        if ")" not in file:
            mp3_without_bracket.append(file)
        else:
            mp3_with_bracket.append(file)
    print(mp3_with_bracket)
    print("-------------------------------------------------------------------------------------------------------------")
    print(mp3_without_bracket)
    mp3_without_bracket.sort()
    mp3_with_bracket.sort()
    

The logic is to build two lists, one for names with brackets and one for names without. But what now? Could you give me some advice on how to finish it? And is the logic good enough?

Jayvee
furumc
  • do the files have tags? – Jayvee Jan 06 '22 at 12:34
  • Unfortunately, they don't have tags: the files are reported as "does not have an ID3 1.x tag". I use streamripper, maybe it could add them? – furumc Jan 06 '22 at 12:42
  • The logic looks good: it separates the duplicates (file names with brackets) from the original ones. Now you need to delete the files from the directory if they match the file names in the list `mp3_without_bracket`, using `os.remove(filepath)`. If you want to find the file size, you can try `os.path.getsize(file)`. – Anand Gautam Jan 06 '22 at 13:32
  • Thank you! It was helpful! That was what I missed! :) – furumc Jan 07 '22 at 07:43

1 Answer


You can do something like this:

    import os
    import re

    path = "mp3_files"

    dirs = os.listdir(path)
    to_remove = []

    for i in range(len(dirs)):
        for j in range(i + 1, len(dirs)):
            # get names without (n); raw strings avoid invalid-escape warnings
            namei = re.sub(r'\(\d+\)', '', dirs[i])
            namej = re.sub(r'\(\d+\)', '', dirs[j])
            # get sizes
            sizei = os.path.getsize(os.path.join(path, dirs[i]))
            sizej = os.path.getsize(os.path.join(path, dirs[j]))

            # if the names match, mark the smaller file for removal
            # (on equal sizes the later entry, dirs[j], is marked)
            if namei == namej and sizei < sizej:
                to_remove.append(dirs[i])
            elif namei == namej and sizei >= sizej:
                to_remove.append(dirs[j])

    for file in set(to_remove):
        # os.remove(os.path.join(path, file))
        print(os.path.join(path, file))

This goes through the list of files and checks each one against the rest. If two names are the same once the parenthesized number is stripped, the smaller file is added to the list of files to remove. Finally, the files in that list are removed.

Please note that I commented out the actual deletion and added a print instead, so you can run it first and make sure it will delete what you want. Please make a backup before deleting! :)
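As a variation on the pairwise comparison above, the files could also be grouped by base name in a single pass, keeping only the largest file of each group. This is only a sketch under assumptions, not the method from the answer: the names `DUPLICATE_SUFFIX`, `base_name` and `find_smaller_duplicates` are made up for illustration, and the pattern assumes that duplicates differ only by a `(n)` marker, optionally preceded by whitespace, directly before the file extension.

```python
import os
import re
from collections import defaultdict

# Optional whitespace plus "(n)" right before the extension, e.g.
# "song(1).mp3" or "song (2).mp3". This naming scheme is an assumption.
DUPLICATE_SUFFIX = re.compile(r"\s*\(\d+\)(?=\.[^.]+$)")


def base_name(filename):
    """Return the file name with the "(n)" duplicate marker removed."""
    return DUPLICATE_SUFFIX.sub("", filename)


def find_smaller_duplicates(path):
    """Return the paths to delete: all but the largest file of each group."""
    groups = defaultdict(list)
    for name in os.listdir(path):
        full = os.path.join(path, name)
        if os.path.isfile(full):
            groups[base_name(name)].append(full)

    to_remove = []
    for files in groups.values():
        # Largest file first; on equal sizes prefer the shorter name,
        # which is the one without the "(n)" marker.
        files.sort(key=lambda f: (-os.path.getsize(f), len(f), f))
        to_remove.extend(files[1:])
    return sorted(to_remove)
```

Calling `find_smaller_duplicates("mp3_files")` only returns the candidate paths and deletes nothing, so the list can be printed and checked first; each entry could then be passed to `os.remove` once the result looks right.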

Jayvee
  • Thank you Jayvee! This will do the trick! I'm not an expert, so recognizing how to do it seemed difficult to me! :) – furumc Jan 07 '22 at 07:40
  • I needed to edit the re.sub part: `namei=re.sub('\s\(\d+\)','',dirs[i])` `namej=re.sub('\s\(\d+\)',"",dirs[j])` because there is a space character in the names where the bracket occurs. – furumc Jan 07 '22 at 08:08
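
To illustrate the adjustment described in the last comment, the sketch below compares the pattern from the answer with one that also accepts optional whitespace before the bracket. The sample file names are invented for illustration, and `\s*` (zero or more whitespace characters) is a suggestion that covers both spellings, not the exact pattern from the comment.

```python
import re

# Invented sample names: one original, one duplicate without a space,
# one duplicate with a space before the bracket.
names = ["something.mp3", "something(1).mp3", "something (2).mp3"]

strict = re.compile(r"\(\d+\)")      # pattern from the answer
relaxed = re.compile(r"\s*\(\d+\)")  # also removes whitespace before "("

strict_names = [strict.sub("", n) for n in names]
relaxed_names = [relaxed.sub("", n) for n in names]

print(strict_names)   # ['something.mp3', 'something.mp3', 'something .mp3']
print(relaxed_names)  # ['something.mp3', 'something.mp3', 'something.mp3']
```

With the strict pattern the third name keeps a stray space and no longer matches the other two, which is why the duplicate would be missed.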