
I am trying to get the captions from a segment of a live feed. I am running the command:

ffmpeg -i seg-1077853030-v1-a1.ts

Output

`Input #0, mpegts, from 'seg-109853030-v1-a1.ts':
Duration: 00:00:06.01, start: 57867.901133, bitrate: 2649 kb/s
Program 1

Stream #0:0[0x100]: Video: h264 (High) ([27][0][0][0] / 0x001B), yuv420p(tv, bt709, progressive), 1280x720 [SAR 1:1 DAR 16:9], Closed Captions, 29.97 fps, 29.97 tbr, 90k tbn, 59.94 tbc
Stream #0:1[0x101]: Audio: aac (LC) ([15][0][0][0] / 0x000F), 48000 Hz, stereo, fltp, 98 kb/s
Stream #0:2[0x102]: Data: timed_id3 (ID3  / 0x20334449)`

My question: what command should I run to extract the captions together with the track metadata, including the label and language?

  • If it's EIA-608 it might work with `ffmpeg -f lavfi -i movie=input.ts[out+subcc] -map 0:1 output.srt` (see: https://stackoverflow.com/questions/3169910/can-ffmpeg-extract-closed-caption-data) – aergistal Nov 20 '20 at 08:52
  • How do I find this? #EXT-X-MEDIA:TYPE=CLOSED-CAPTIONS,GROUP-ID="CC",LANGUAGE="eng",NAME="English",INSTREAM-ID="CC1" – user1163234 Nov 20 '20 at 09:44
  • In-stream CC1 is EIA-608 so give the command a try. – aergistal Nov 20 '20 at 09:47
  • Thx, but I'm not sure what the command would look like... – user1163234 Nov 20 '20 at 11:22
  • Thx, that worked! But I don't see fields like "GROUP-ID", "LANGUAGE" and "INSTREAM-ID" in the srt file that was generated. What am I missing? – user1163234 Nov 20 '20 at 11:28

1 Answer


If your MPEG-TS file is an HLS segment, just parse the HLS master playlist to retrieve the values. If your input is captured from a live broadcast, read on.
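For the HLS case, a minimal sketch of reading those attributes from a master playlist could look like the following. The function name `parse_closed_caption_renditions` and the regular expression are illustrative assumptions, not part of any library, and the parser only handles the simple `KEY=value` / `KEY="value"` attribute forms.

```python
import re

# Matches KEY=unquoted or KEY="quoted, possibly with commas"
ATTR_RE = re.compile(r'([A-Z0-9-]+)=("[^"]*"|[^,]*)')
TAG = "#EXT-X-MEDIA:"


def parse_closed_caption_renditions(playlist_text):
    """Return one attribute dict per CLOSED-CAPTIONS rendition in a master playlist."""
    renditions = []
    for line in playlist_text.splitlines():
        line = line.strip()
        if not line.startswith(TAG):
            continue
        # Strip the surrounding quotes from quoted-string values
        attrs = {key: value.strip('"') for key, value in ATTR_RE.findall(line[len(TAG):])}
        if attrs.get("TYPE") == "CLOSED-CAPTIONS":
            renditions.append(attrs)
    return renditions
```

Each returned dict then carries `GROUP-ID`, `LANGUAGE`, `NAME` and `INSTREAM-ID` exactly as the playlist declares them.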

  1. GROUP-ID

It's up to you to set this value in the HLS playlist to indicate the rendition's group.

See: https://www.rfc-editor.org/rfc/rfc8216#section-4.3.4.1.1
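As a rough illustration of setting the value yourself, the sketch below formats an `EXT-X-MEDIA` tag and a variant stream line that refers to the same group through its `CLOSED-CAPTIONS` attribute. The helper names and the bandwidth figure are hypothetical; only the tag and attribute names come from RFC 8216.

```python
def closed_caption_media_tag(group_id, language, name, instream_id):
    """Format an EXT-X-MEDIA tag describing one closed-caption rendition."""
    attrs = [
        "TYPE=CLOSED-CAPTIONS",
        f'GROUP-ID="{group_id}"',
        f'LANGUAGE="{language}"',
        f'NAME="{name}"',
        f'INSTREAM-ID="{instream_id}"',
    ]
    return "#EXT-X-MEDIA:" + ",".join(attrs)


def stream_inf_tag(bandwidth, closed_captions_group):
    """Format an EXT-X-STREAM-INF tag whose CLOSED-CAPTIONS value names the group."""
    return f'#EXT-X-STREAM-INF:BANDWIDTH={bandwidth},CLOSED-CAPTIONS="{closed_captions_group}"'


if __name__ == "__main__":
    # The bandwidth is a placeholder value for illustration only.
    print(closed_caption_media_tag("CC", "eng", "English", "CC1"))
    print(stream_inf_tag(2649000, "CC"))
```

The group ID is free-form; what matters is that the variant stream's `CLOSED-CAPTIONS` value matches the rendition's `GROUP-ID`.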

  2. LANGUAGE

This is where things get a bit more complicated.


CEA-608 captions do not include the language code.

For CEA-708 and 608-over-708, the language is signaled by the caption service descriptor, which is part of the ATSC Program and System Information Protocol (PSIP) and should be present in the PMT and EIT.

Relevant fields of the Caption Service Descriptor:

  • cc_type - 0 for 608, 1 for 708
  • line21_field - when cc_type is 0: 0 for field 1 (which includes channels CC1 and CC2) and 1 for field 2 (which includes channels CC3 and CC4)
  • caption_service_number - the CEA-708 service number, present when cc_type is 1

  3. INSTREAM-ID

For CEA-608 this is one of CC1, CC2 (field 1), CC3 or CC4 (field 2), where CC1 and CC2 carry normal and easy-reader captions for the primary language and CC3 and CC4 do the same for the secondary language. For CEA-708 services it takes the form SERVICEn.

These should be advertised in the CSD (see above), if present.
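To make the mapping concrete, here is a sketch that decodes a raw caption service descriptor into language and candidate INSTREAM-ID values. The byte layout is an assumption based on the descriptor in ATSC A/65 (tag 0x86, a 5-bit service count, then 6 bytes per service), so check it against the spec before relying on it. The function name is hypothetical, and the field names follow the list above.

```python
def parse_caption_service_descriptor(data):
    """Decode a caption service descriptor (assumed ATSC A/65 layout)."""
    if len(data) < 3 or data[0] != 0x86:
        raise ValueError("not a caption service descriptor (expected tag 0x86)")
    body = data[2:2 + data[1]]           # data[1] is descriptor_length
    count = body[0] & 0x1F               # low 5 bits: number of services
    services = []
    for n in range(count):
        entry = body[1 + 6 * n:1 + 6 * (n + 1)]
        if len(entry) < 6:
            break                        # truncated descriptor
        service = {
            "language": entry[0:3].decode("ascii", errors="replace"),
            "cc_type": entry[3] >> 7,    # 0 for 608, 1 for 708
        }
        if service["cc_type"] == 0:
            field = entry[3] & 0x01      # line21_field
            service["line21_field"] = field
            # The descriptor only identifies the field, not the channel in it.
            service["instream_ids"] = ["CC1", "CC2"] if field == 0 else ["CC3", "CC4"]
        else:
            number = entry[3] & 0x3F     # caption_service_number
            service["caption_service_number"] = number
            service["instream_ids"] = [f"SERVICE{number}"]
        services.append(service)
    return services
```

For 608 services the result lists both channels of the signaled field, since the descriptor alone does not say which of the two is in use.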

I don't think FFmpeg extracts these by default, so you'll either need to extend it or write an MPEG-TS parser to retrieve the information. There are a few libraries for parsing MPEG-TS and for dealing with captions (e.g. libcaption by fellow StackOverflow user @szatmary).
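If you go the parser route, a bare-bones sketch of locating that descriptor in the PMT might look like this. It makes several simplifying assumptions: 188-byte packets aligned to the start of the buffer, each PAT/PMT section fitting in a single packet, and no CRC check. All function names are hypothetical.

```python
TS_PACKET_SIZE = 188


def iter_sections(ts, pid):
    """Yield PSI sections on `pid` that start and end inside one TS packet."""
    for off in range(0, len(ts) - TS_PACKET_SIZE + 1, TS_PACKET_SIZE):
        pkt = ts[off:off + TS_PACKET_SIZE]
        if pkt[0] != 0x47:
            continue                      # lost sync; a real parser would resynchronise
        if (((pkt[1] & 0x1F) << 8) | pkt[2]) != pid:
            continue
        afc = (pkt[3] >> 4) & 0x03        # adaptation_field_control
        if not (pkt[1] & 0x40) or not (afc & 0x01):
            continue                      # no section start or no payload
        pos = 4
        if afc & 0x02:
            pos += 1 + pkt[4]             # skip the adaptation field
        if pos >= TS_PACKET_SIZE:
            continue
        pos += 1 + pkt[pos]               # skip pointer_field and stuffing
        if pos + 3 > TS_PACKET_SIZE:
            continue
        length = ((pkt[pos + 1] & 0x0F) << 8) | pkt[pos + 2]
        section = pkt[pos:pos + 3 + length]
        if len(section) == 3 + length:
            yield section


def pmt_pids(ts):
    """Read the PAT (PID 0) and return the PMT PIDs it lists."""
    pids = set()
    for sec in iter_sections(ts, 0x0000):
        if sec[0] != 0x00:
            continue
        end = len(sec) - 4                # drop CRC_32
        for i in range(8, end - 3, 4):
            if (sec[i] << 8) | sec[i + 1]:    # program_number 0 is the network PID
                pids.add(((sec[i + 2] & 0x1F) << 8) | sec[i + 3])
    return pids


def iter_descriptors(buf):
    """Yield (tag, raw_bytes) for each descriptor in a descriptor loop."""
    i = 0
    while i + 2 <= len(buf):
        yield buf[i], buf[i:i + 2 + buf[i + 1]]
        i += 2 + buf[i + 1]


def find_pmt_descriptors(ts, wanted_tag=0x86):
    """Collect distinct descriptors with `wanted_tag` from every PMT."""
    found = []
    for pid in pmt_pids(ts):
        for sec in iter_sections(ts, pid):
            if sec[0] != 0x02:
                continue
            end = len(sec) - 4
            prog_len = ((sec[10] & 0x0F) << 8) | sec[11]
            loops = [sec[12:12 + prog_len]]       # program-level descriptors
            i = 12 + prog_len
            while i + 5 <= end:                   # elementary stream loop
                es_len = ((sec[i + 3] & 0x0F) << 8) | sec[i + 4]
                loops.append(sec[i + 5:i + 5 + es_len])
                i += 5 + es_len
            for loop in loops:
                for tag, raw in iter_descriptors(loop):
                    if tag == wanted_tag and raw not in found:
                        found.append(raw)
    return found
```

Calling `find_pmt_descriptors(open("segment.ts", "rb").read())` (with a placeholder file name) returns the raw descriptor bytes, or an empty list if the stream does not carry one. The EIT is not covered here.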

If you just want to extract the captions, use FFmpeg or CCExtractor.
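For the FFmpeg route, the command from the comments can be wrapped in a small script. This is only a sketch: the function names are hypothetical, it assumes `ffmpeg` is on the PATH, and it assumes the input path contains no characters that need escaping inside a lavfi filtergraph.

```python
import subprocess


def build_cc_extract_command(input_path, output_path):
    """Build the argument list for: ffmpeg -f lavfi -i movie=IN[out+subcc] -map 0:1 OUT"""
    return [
        "ffmpeg",
        "-f", "lavfi",
        "-i", f"movie={input_path}[out+subcc]",
        "-map", "0:1",
        output_path,
    ]


def extract_captions(input_path, output_path):
    """Run FFmpeg and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_cc_extract_command(input_path, output_path), check=True)


if __name__ == "__main__":
    # Placeholder file names for illustration.
    extract_captions("input.ts", "output.srt")
```

Passing the arguments as a list avoids shell interpretation of the square brackets in the `movie=` source.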

If you want to inspect the stream manually, a tool such as DVBInspector can display the PSI contents, including the caption service descriptor (CSD).
