I am trying to extract track information from MKV files using mkvinfo from a bash script. The output is a long series of lines whose repeating patterns act as delimiters for the various properties of the various track types. An example of a track is:
…
| + A track
| + Track number: 6 (track ID for mkvmerge & mkvextract: 5)
| + Track UID: 11555278830806058806
| + Track type: subtitles
| + (Unknown element: TrickTrackFlag; ID: 0xc6 size: 3)
| + Enabled: 1
| + Default flag: 0
| + Forced flag: 0
| + Lacing flag: 0
| + MinCache: 0
| + Timecode scale: 1
| + Name: Spanish
| + Language: spa
| + Codec ID: S_TEXT/UTF8
| + (Unknown element: TrackAttachmentLink; ID: 0x7446 size: 11)
| + Codec decode all: 1
| + A track
| + Track number: 7 (track ID for mkvmerge & mkvextract: 6)
…
There can be multiple instances of a given track type, and the number of lines per track varies. I need to extract certain properties from specific track types. For example, to find all instances of the subtitles track type and extract the Track number and the Codec ID, I can pipe the results through grep:
mkvinfo "file.mkv" | grep "subtitles" -B 2 | grep "Track number"
This outputs the lines containing the track numbers of all subtitle tracks. I then have to put the lines into an array and filter them down to the first number on each line, because that is the number mkvpropedit requires.
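The "filter to the first number" step does not need a loop: sed applies its script to every input line independently, so one substitution can reduce each "Track number" line to its first number, and bash's mapfile (bash 4+) can load the result straight into an array. A minimal sketch, assuming the lines look like the sample above; track_number_lines is a hypothetical stand-in for the grep pipeline, so the snippet runs without mkvinfo.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for:
#   mkvinfo "file.mkv" | grep "subtitles" -B 2 | grep "Track number"
track_number_lines() {
  printf '%s\n' \
    '| + Track number: 6 (track ID for mkvmerge & mkvextract: 5)' \
    '| + Track number: 7 (track ID for mkvmerge & mkvextract: 6)'
}

# sed runs the s command on every input line independently:
#   ^[^0-9]*          skips everything before the first digit
#   \([0-9][0-9]*\)   captures the first run of digits (the track number)
#   .*                matches the rest of the line, which is discarded
# mapfile -t reads one array element per output line, stripping the newline.
mapfile -t track_numbers < <(track_number_lines | sed 's/^[^0-9]*\([0-9][0-9]*\).*/\1/')

for n in "${track_numbers[@]}"; do
  echo "subtitle track number: $n"
done
```

Replacing track_number_lines with the real grep pipeline gives one array element per subtitle track, ready to pass to mkvpropedit.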
Similarly:
mkvinfo "file.mkv" | grep "subtitles" -A 10 | grep "Codec ID: " | sed 's/^.*: //'
outputs the codec IDs for all subtitle tracks.
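The substitution s/^.*: // works because .* is greedy: it matches as much of the line as it can, so the pattern deletes everything up to and including the last ": " on the line. That is fine for Codec ID lines, but a value that itself contains ": " would be cut short. The sketch below shows the difference; the Name value "Spanish: Latin America" is a hypothetical example chosen only to trigger the pitfall.

```bash
#!/usr/bin/env bash
codec_line='| + Codec ID: S_TEXT/UTF8'
# Hypothetical value containing ": ", used only to show the pitfall.
name_line='| + Name: Spanish: Latin America'

# .* is greedy, so '^.*: ' deletes through the LAST ": " on the line.
codec=$(printf '%s\n' "$codec_line" | sed 's/^.*: //')
name_greedy=$(printf '%s\n' "$name_line" | sed 's/^.*: //')

# Putting the label in the pattern deletes only up to "Name: ".
name_anchored=$(printf '%s\n' "$name_line" | sed 's/^.*Name: //')

echo "$codec"          # S_TEXT/UTF8
echo "$name_greedy"    # Latin America
echo "$name_anchored"  # Spanish: Latin America
```

Naming the label in the pattern (for example s/^.*Codec ID: //) keeps the substitution tied to the field you actually want.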
This works fine IF I know exactly how many lines there are before/after the line containing "subtitles". The problem is that the number of lines to include varies from file to file. So what I need is to output the entire block of lines between "| + A track" and the next line beginning with "|+" or "| +", or EOF. I also need to filter the block to extract the first Track number and the Codec ID.

I tried using | grep -Eo '[0-9]+' | head -1 to extract the first number of each track, but it only works on the first track found and then quits. If there is a way to make it work for all tracks in one line, that would be helpful. The second example I gave, using sed, works for the Codec ID.
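Variable-length blocks are easier to handle with a tool that keeps state between lines. A sketch using awk (a swapped-in technique, not something the grep/sed commands above already do): awk reads the output once, remembers the wanted fields of the current block, and prints them when the block ends, which happens at the next "A track" line, at a top-level "|+" line, or at EOF (the END rule). The sample text is a trimmed, illustrative stand-in for mkvinfo output, the function names sample_mkvinfo and tracks_of_type are hypothetical, and the patterns assume the "A track" header and "|+" prefix shown in the question; other mkvinfo versions may word or indent these differently.

```bash
#!/usr/bin/env bash
# Trimmed, illustrative stand-in for `mkvinfo "file.mkv"` (not real output).
sample_mkvinfo() {
  cat <<'EOF'
|+ Tracks
| + A track
| + Track number: 1 (track ID for mkvmerge & mkvextract: 0)
| + Track type: video
| + Codec ID: V_MPEG4/ISO/AVC
| + A track
| + Track number: 6 (track ID for mkvmerge & mkvextract: 5)
| + Track type: subtitles
| + Name: Spanish
| + Codec ID: S_TEXT/UTF8
| + A track
| + Track number: 7 (track ID for mkvmerge & mkvextract: 6)
| + Track type: subtitles
| + Codec ID: S_HDMV/PGS
|+ Chapters
EOF
}

# Prints "<track number> <codec ID>" for every track whose type equals $1.
# awk reads the input line by line and remembers the fields of the current block:
#   "A track"    starts a new block, so the previous block is flushed first
#   a "|+" line  is a new top-level element, which also ends the block
#   END          flushes the last block when the input ends (EOF)
tracks_of_type() {
  awk -v want="$1" '
    function flush() {
      if (in_track && type == want) print num, codec
      in_track = 0; num = ""; type = ""; codec = ""
    }
    /[+] A track$/   { flush(); in_track = 1; next }
    /^[|][+]/        { flush(); next }
    !in_track        { next }
    /Track number: / { sub(/.*Track number: /, ""); sub(/[^0-9].*/, ""); num = $0; next }
    /Track type: /   { sub(/.*Track type: /, ""); type = $0; next }
    /Codec ID: /     { sub(/.*Codec ID: /, ""); codec = $0; next }
    END              { flush() }
  '
}

sample_mkvinfo | tracks_of_type subtitles
```

On a real file, replace sample_mkvinfo with mkvinfo "file.mkv"; passing another type (for example audio) reuses the same filter. Because the number is trimmed inside each block, this also sidesteps the head -1 problem: head -1 keeps only the first line of its entire input, not the first line per track.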
The bottom line QUESTION is:
How can I extract specific properties of specific track types, such as the example given, and put them into an array or arrays for further processing?
I am hoping to meet the following criteria:
- I want to use existing utilities available from bash (GNU bash, version 4.3.30(1)-release (x86_64-apple-darwin12.5.0)), such as sed, awk, grep, …
- I don't want to have to create an 'intermediate file'
- I want to simply pipe the output of mkvinfo into the various utilities
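Getting per-track results into bash arrays without an intermediate file can be sketched with a while read loop fed by process substitution. This assumes some filter already prints one "<track number> <codec ID>" line per track; track_pairs is a hypothetical stand-in for that filter so the snippet runs on its own.

```bash
#!/usr/bin/env bash
# Hypothetical producer: stands in for a filter over `mkvinfo "file.mkv"` that
# prints "<track number> <codec ID>" once per matching track.
track_pairs() {
  printf '%s\n' '6 S_TEXT/UTF8' '7 S_HDMV/PGS'
}

numbers=()
codecs=()
# < <(...) is process substitution: the loop reads the producer's output directly,
# with no intermediate file. Because the loop is not on the right-hand side of a
# pipe, it runs in the current shell, so the arrays still exist after "done".
while read -r num codec; do
  numbers+=("$num")
  codecs+=("$codec")
done < <(track_pairs)

# Parallel arrays: index i in one corresponds to index i in the other.
for i in "${!numbers[@]}"; do
  echo "track ${numbers[i]}: ${codecs[i]}"
  # e.g. mkvpropedit "file.mkv" --edit "track:${numbers[i]}" ...   (not executed here)
done
```

Writing the same loop as track_pairs | while read ... would fill the arrays inside a subshell, and they would be empty again once the loop finishes; that is why the input is redirected instead.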
I found lots of threads that show how to use sed to find a block of text between two words, but I could not get the code to work with entire lines or strings containing spaces. Maybe there is a way to do that, but I don't know enough about sed to adapt the code to my situation.
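Spaces are not a problem for sed addresses as long as the whole script is in single quotes, which makes it a single argument to sed. A sketch of a /start/,/end/ range built from whole strings, using an illustrative trimmed stand-in for mkvinfo output (sample_mkvinfo is hypothetical). It assumes every subtitle block has a Codec ID line after its Track type line; if one did not, the range would run on into the next track. The range also cannot reach the Track number, which appears before Track type, so that part needs a tool that remembers earlier lines, such as awk.

```bash
#!/usr/bin/env bash
# Illustrative stand-in for `mkvinfo "file.mkv"` (trimmed, not real output).
sample_mkvinfo() {
  printf '%s\n' \
    '| + A track' \
    '| + Track number: 1 (track ID for mkvmerge & mkvextract: 0)' \
    '| + Track type: video' \
    '| + Codec ID: V_MPEG4/ISO/AVC' \
    '| + A track' \
    '| + Track number: 6 (track ID for mkvmerge & mkvextract: 5)' \
    '| + Track type: subtitles' \
    '| + Name: Spanish' \
    '| + Codec ID: S_TEXT/UTF8' \
    '| + A track' \
    '| + Track number: 7 (track ID for mkvmerge & mkvextract: 6)' \
    '| + Track type: subtitles' \
    '| + Codec ID: S_HDMV/PGS'
}

# Single quotes make the whole sed script one argument, so spaces inside /.../
# are ordinary characters. -n turns off automatic printing, and /start/,/end/
# selects every line from a start match through the next end match.
echo "--- whole ranges ---"
sample_mkvinfo | sed -n '/Track type: subtitles/,/Codec ID: /p'

# Same ranges, but only the Codec ID lines are printed, with the label removed.
codecs=$(sample_mkvinfo | sed -n '/Track type: subtitles/,/Codec ID: /{s/^.*Codec ID: //p;}')
echo "--- codec IDs ---"
echo "$codecs"
```

The video track's Codec ID line is never printed because no range is open when sed reaches it.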
Please explain in detail how your code works so that I can 'learn how to fish' and do it myself next time.