I'm cleaning up mp3 tags that contain web links. I tried the following regular expression to strip the links

(\w+)*(\s|\-)(\w+\.(\w+))

with
$1

However, when I use the same regex on the file name, the extension is replaced as well. How do I treat the extension (here, .mp3) as an exception with the above regex?

I have tried using this, but the replace is slower.

dmachop
  • `cleaning up the tags of mp3s that contain the web links` What do you mean by this? What specifically are you trying to match out of what, and what are you replacing with? – Brad Mar 08 '15 at 01:56
  • If the filename is SongName - www.sitename.com.mp3 or any of the web address that is contained in the tag, then the result should be SongName.mp3. I'm replacing it with blank '' – dmachop Mar 08 '15 at 01:58
  • You want to remove things that look like domain names out of a file name? How are you going to reliably do that? Domains could be anything, especially these days. The standards don't exist anymore. – Brad Mar 08 '15 at 02:00

2 Answers

Based on your examples, use this pattern

\s-\s\S+(?=\.)

and replace with nothing

\s              # <whitespace character>
-               # "-"
\s              # <whitespace character>
\S              # <not a whitespace character>
+               # (one or more)(greedy)
(?=             # Look-Ahead
  \.            # "."
)               # End of Look-Ahead
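
As a minimal sketch of how the pattern above might be applied, here it is with Python's `re` module. The function name `strip_link` and the choice of Python are assumptions made for illustration; the answer itself does not name a language or tool.

```python
import re

# Pattern from the answer: " - ", then a run of non-whitespace characters
# that is still followed by a "." (so the final ".mp3" is left in place).
LINK_SUFFIX = re.compile(r"\s-\s\S+(?=\.)")


def strip_link(filename: str) -> str:
    """Remove a trailing ' - <link>' part while keeping the extension.

    Hypothetical helper; the name is not from the original answer.
    """
    return LINK_SUFFIX.sub("", filename)


print(strip_link("SongName - www.sitename.com.mp3"))
print(strip_link("my song name - www.snakemp3.com.mp3"))
```

Because `\S+` is greedy and the look-ahead only requires a following dot, the match backs off to the last dot, so only the extension survives. One caveat worth checking on real data: a name such as `Artist - Title.mp3` also fits the pattern and would be shortened to `Artist.mp3`.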

alpha bravo
If you replace with the first group only, the result will be just the name of the file, without the extension. Your regex does not actually capture the extension; it stops after the top-level domain (.com) of the website.

You should use:

(\w+)(\s\-\s)(\w+\.\w+\.\w+)(\.\w+)

and replace everything with groups 1 and 4. Remember that group 0 usually contains the whole string matched by the regular expression.

More details, using the example "MySong - www.mysite.com.mp3":

    (\w+) // 1. will match "MySong"; replace with ([\w\s]+) to match "My Song"
    (\s\-\s)  // 2. will match " - "
    (\w+\.\w+\.\w+)  // 3. will match "www.mysite.com". You may want to make "www." optional by replacing this with ((?:\w+\.)?\w+\.\w+)
    (\.\w+)  // 4. the ".mp3" extension
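
The group-based replacement described above could be sketched in Python as follows. This is an illustration under stated assumptions, not the answer's own code: the first group is widened to `[\w\s]+` so multi-word names match, the `www.` part is made optional with a non-capturing group, every literal dot is escaped, and the function name `keep_name_and_extension` is hypothetical.

```python
import re

# Four groups: 1 = song name, 2 = " - ", 3 = web address, 4 = extension.
# Assumed variations: [\w\s]+ for multi-word names, optional "www." prefix.
PATTERN = re.compile(r"([\w\s]+)(\s-\s)((?:\w+\.)?\w+\.\w+)(\.\w+)")


def keep_name_and_extension(filename: str) -> str:
    """Rebuild the name from groups 1 and 4, dropping groups 2 and 3.

    File names that do not match the pattern are returned unchanged.
    """
    return PATTERN.sub(r"\1\4", filename)


print(keep_name_and_extension("MySong - www.mysite.com.mp3"))
print(keep_name_and_extension("my song name - www.snakemp3.com.mp3"))
```

Keeping the extension in its own group means the replacement never has to hard-code `.mp3`, so the same sketch should carry over to other extensions.
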
Romain G
  • Thanks, but it does not work for `my song name - www.snakemp3.com.mp3`. The song name can be one or many words. In your case, it matches `name - ...` – dmachop Mar 08 '15 at 02:34
  • You should replace the first group by `([\w\s]+)` to match one or many words. – Romain G Mar 09 '15 at 15:08