1

I'm using Mp3tag's "Tools" feature to batch-run FFmpeg on Windows in order to extract the embedded lyrics (the USLT frame of the ID3v2 tag) from MP3 files. I know that with FFmpeg I can do something like:

-i "%_path%" -f ffmetadata "%_folderpath%\%_filename%.txt"

"%_path%" = full path of the MP3 file

"%_folderpath%\%_filename%.txt" = path and filename of the exported txt file.

The command above extracts all the metadata from the MP3 file and exports it to a txt file with, for example, the following content:

;FFMETADATA1
album=name of the album
artist=name of the artist
title=name of the title
lyrics-eng=[00:01.23]line1 of lyrics
\
[00:04.56]line2 of lyrics
\
[00:07.89]line3 of lyrics
\
[01:03.12]3rd last line of lyrics
\
[02:04.34]2nd last line of lyrics
\
[03:05.67]Last line of lyrics
\

date=2020
encoder=Lavf59.23.100

(The original lyrics use the Simple LRC format with a timestamp on each line; certain lines contain only the timestamp with empty lyrics.)

(There might or might not be additional metadata following the lyrics part, e.g. date and encoder in the example above.)

As seen above, a backslash "\" (which is not present in the original lyrics) is somehow added after each line of lyrics, between the CR (carriage return) and the LF (line feed), as seen in Notepad++ (the original lyrics use CRLF as EOL characters).
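A possible explanation, based on FFmpeg's documented metadata file format: keys and values containing '=', ';', '#', '\' or a newline are written with a backslash in front of the special character. Each LF in the lyrics would therefore get a backslash in front of it, which is exactly the position between CR and LF. The Python sketch below only models that escaping rule for illustration; escape_ffmetadata is a hypothetical helper name, not something FFmpeg provides.

```python
import re


def escape_ffmetadata(value):
    """Model of the ffmetadata escaping rule (illustrative, not FFmpeg's code).

    A backslash is put in front of '=', ';', '#', '\\' and LF.
    CR is not special, so it is left alone.
    """
    return re.sub(r"([=;#\\\n])", r"\\\1", value)


# Lyrics stored with CRLF line endings, as in the question.
original = "[00:01.23]line1 of lyrics\r\n[00:04.56]line2 of lyrics\r\n"
escaped = escape_ffmetadata(original)

# Each CRLF has become CR, backslash, LF.
print(repr(escaped))
```

If this model is right, removing one backslash in front of each escaped character should give back the original lyrics with their CRLF line endings.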

So how do I modify the given command line to export only the lyrics part (excluding all other metadata and the extra backslash "\")? An example of the expected text file content is shown below:

[00:01.23]line1 of lyrics
[00:04.56]line2 of lyrics
[00:07.89]line3 of lyrics
[01:03.12]3rd last line of lyrics
[02:04.34]2nd last line of lyrics
[03:05.67]Last line of lyrics

with the original EOL characters from the lyrics, such as CRLF.

Wonderer
  • 19
  • 5

3 Answers

0
  1. I suggest that you remove all the unwanted \ by searching for \s*\\\s* and replacing them with \n. (Test here: https://regex101.com/r/PEBWwm/1)
  2. Then search for (?<=lyrics-eng=)(?:[\w ]+\s)+ to capture all the lyrics without \ between them. (Test here: https://regex101.com/r/8ad6kI/1)
anotherGatsby
  • 1,568
  • 10
  • 21
  • 1
    Sorry, I should have made the question clearer: each line of lyrics begins with a timestamp [mm:ss.xx] (I've edited the original question and updated all screenshots, please give it a re-read). Perhaps it's possible to take advantage of the timestamps when locating the lyrics? Also, I know how to do regex in Notepad++, but how do I incorporate the regex operations into the original command line (i.e. how do I modify the example command line in the original question) so that it's automated? – Wonderer Apr 30 '22 at 09:09
  • @Wonderer I see. Sorry, I have no idea how to incorporate regex into the ffmpeg command line. But as you said, the timestamps can be used to select the lyrics in one step. – anotherGatsby Apr 30 '22 at 10:06
0

This adds to @anotherGatsby's answer:

AFAIK, FFmpeg itself cannot return only a particular metadata tag, much less modify the tag values. Your only option is to pipe the FFmpeg output to a regex-capable command (e.g., the sed command on Linux, or a Python/PowerShell script that handles the regex).

For example:

ffmpeg -i "%_path%" -f ffmetadata - | sed -n {regex_expr} > "%_folderpath%\%_filename%.txt"

Based on the text output path, it appears that you are in a Windows environment. If I were you, I'd learn PowerShell scripting and its regex support.
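As a rough sketch of the same piping idea in Python rather than sed or PowerShell: the script below captures the ffmetadata text that FFmpeg writes to stdout ("-"), keeps only one tag, removes the escaping backslashes, and writes the value without touching its line endings. It assumes the escaping rule from FFmpeg's metadata file format (a backslash in front of '=', ';', '#', '\' and newline) and takes the tag key lyrics-eng from the sample output in the question. parse_ffmetadata, unescape and export_lyrics are hypothetical names chosen for this example.

```python
import re
import subprocess
import sys

# A logical line runs until an LF that is NOT preceded by an escaping backslash.
LOGICAL_LINE = re.compile(r"(?:\\.|[^\\\n])+", re.DOTALL)
# key=value, split at the first '=' that is not escaped.
KEY_VALUE = re.compile(r"((?:\\.|[^\\=])*)=(.*)", re.DOTALL)


def unescape(text):
    """Drop the backslash in front of every escaped character."""
    return re.sub(r"\\(.)", r"\1", text, flags=re.DOTALL)


def parse_ffmetadata(text):
    """Return the global tags of an ffmetadata document as a dict."""
    tags = {}
    for line in LOGICAL_LINE.findall(text):
        if line.startswith((";", "#")):
            continue  # header or comment line
        if line.startswith("["):
            break  # section header such as [CHAPTER]; global tags end here
        match = KEY_VALUE.fullmatch(line)
        if match:
            tags[unescape(match.group(1))] = unescape(match.group(2))
    return tags


def export_lyrics(mp3_path, txt_path, tag="lyrics-eng"):
    """Run ffmpeg, pick one tag, and write it with its original EOLs."""
    result = subprocess.run(
        ["ffmpeg", "-v", "error", "-i", mp3_path, "-f", "ffmetadata", "-"],
        capture_output=True,
        check=True,
    )
    tags = parse_ffmetadata(result.stdout.decode("utf-8"))
    # newline="" stops Python from translating the CRLF pairs on write.
    with open(txt_path, "w", encoding="utf-8", newline="") as f:
        f.write(tags.get(tag, ""))


# Assumed command line: script.py <input.mp3> <output.txt>
if __name__ == "__main__" and len(sys.argv) == 3:
    export_lyrics(sys.argv[1], sys.argv[2])
```

In Mp3tag's tool definition, the program would then be python.exe and the parameter something like "C:\path\to\export_lyrics.py" "%_path%" "%_folderpath%\%_filename%.txt", where the script path is a placeholder.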

kesh
  • 4,515
  • 2
  • 12
  • 20
  • If FFmpeg can't do it, how about ffprobe? It seems to be designed more for this kind of application. – Wonderer May 03 '22 at 09:39
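On the ffprobe idea, a minimal sketch in Python, assuming ffprobe is on the PATH and that it reports the lyrics under the same lyrics-eng key that the ffmetadata output shows: ffprobe can print the container tags as JSON (-show_entries format_tags with -of json), and a JSON parser then returns the value with its line breaks decoded, so no backslash clean-up is needed. lyrics_from_ffprobe_json and read_lyrics are hypothetical helper names.

```python
import json
import subprocess


def lyrics_from_ffprobe_json(json_text, tag="lyrics-eng"):
    """Pick one tag out of ffprobe's JSON output ('' if it is missing)."""
    tags = json.loads(json_text).get("format", {}).get("tags", {})
    return tags.get(tag, "")


def read_lyrics(mp3_path, tag="lyrics-eng"):
    """Ask ffprobe for the format-level tags of one file as JSON."""
    result = subprocess.run(
        ["ffprobe", "-v", "error",
         "-show_entries", "format_tags",
         "-of", "json", mp3_path],
        capture_output=True,
        check=True,
    )
    return lyrics_from_ffprobe_json(result.stdout.decode("utf-8"), tag)
```

To keep the CRLF line endings when saving the result, the output file would need to be opened with newline="" so that Python does not translate them.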
0

The regex you're looking for is this:

(\[[0-9].*)

I have no clue how to do the editing while extracting the lyrics, or how to do it from the command prompt at all. If you can't find a better way and know a bit of Python, you can create a Python script with the code below, put it inside a folder that contains only the files you want to edit, and run it.

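import os
import re

# Matches from a "[<digit>" timestamp opener to the end of that line.
TIMESTAMP_LINE = re.compile(r"\[[0-9].*")


def main():
    for name in os.listdir():
        # Only touch the exported .txt files, so the script itself is not overwritten.
        if not name.lower().endswith(".txt"):
            continue
        # newline="" leaves CR and LF untouched while reading and writing.
        with open(name, "r+", encoding="utf-8", newline="") as f:
            lyrics = TIMESTAMP_LINE.findall(f.read())
            f.seek(0)
            f.truncate()
            for lyric in lyrics:
                # Each match ends with the CR that precedes the backslash;
                # strip it and write a clean CRLF instead.
                f.write(lyric.rstrip("\r") + "\r\n")


if __name__ == "__main__":
    main()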
Nik Owa
  • 86
  • 1
  • 4