
I'm writing a script that burns subtitles into video files to prepare them for a personal stream I'm hosting. I'm having a hard time finding out which type of subtitle a file uses. I use ffprobe to get the files' information, and I can get details like the codec type, but is there a way to determine whether a subtitle track is image-based or text-based? The only approach I can think of is to get a list of all possible codecs and match the codec type against it, but it would be very useful to have a field somewhere that tells me "OK, this is an image-based subtitle track", because ffmpeg needs different filters to burn image-based and text-based subtitles.

karel
Shex

2 Answers


Well, it partly depends on what OS you're using, since the commands below rely on grep. On Linux, you can run the following command to get a list of all subtitle codecs supported by your version of ffmpeg:

ffmpeg -codecs | grep "^...S"

To narrow it down to which subtitle codecs your ffmpeg build is capable of encoding:

ffmpeg -codecs | grep "^..ES"

It sounds like you'll be interested foremost in which subtitles ffmpeg can decode:

ffmpeg -codecs | grep "^.D.S"

On my ffmpeg build (git-2020-08-31-4a11a6f), the command above displays the following result:

 DES... ass                  ASS (Advanced SSA) subtitle (decoders: ssa ass ) (encoders: ssa ass )
 DES... dvb_subtitle         DVB subtitles (decoders: dvbsub ) (encoders: dvbsub )
 DES... dvd_subtitle         DVD subtitles (decoders: dvdsub ) (encoders: dvdsub )
 D.S... eia_608              EIA-608 closed captions (decoders: cc_dec )
 D.S... hdmv_pgs_subtitle    HDMV Presentation Graphic Stream subtitles (decoders: pgssub )
 D.S... jacosub              JACOsub subtitle
 D.S... microdvd             MicroDVD subtitle
 DES... mov_text             MOV text
 D.S... mpl2                 MPL2 subtitle
 D.S... pjs                  PJS (Phoenix Japanimation Society) subtitle
 D.S... realtext             RealText subtitle
 D.S... sami                 SAMI subtitle
 D.S... stl                  Spruce subtitle format
 DES... subrip               SubRip subtitle (decoders: srt subrip ) (encoders: srt subrip )
 D.S... subviewer            SubViewer subtitle
 D.S... subviewer1           SubViewer v1 subtitle
 DES... text                 raw UTF-8 text
 D.S... vplayer              VPlayer subtitle
 DES... webvtt               WebVTT subtitle
 DES... xsub                 XSUB
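If you only want the codec names (the second column) for use in a script, the same flag test can be done with awk. This is a minimal sketch, assuming the column layout shown above; the function name `decodable_subtitle_codecs` is hypothetical. Because awk splits on whitespace, `$1` is the flag column without its leading space, so the decode flag is position 1 and the subtitle flag is position 3.

```bash
#!/usr/bin/env bash
# Print the names of subtitle codecs this ffmpeg build can decode.
# Reads `ffmpeg -codecs` output on stdin. awk strips the leading space,
# so $1 is the flag column (e.g. "DES...") and $2 is the codec name.
decodable_subtitle_codecs() {
  # Flag position 1 = D (decoding supported), position 3 = S (subtitle codec).
  awk '$1 ~ /^D.S/ { print $2 }'
}
```

Usage: `ffmpeg -hide_banner -codecs | decodable_subtitle_codecs` prints one codec name per line, which is the form ffprobe uses in its `codec_name` field.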

Which of these are graphics-based/non-text? Most are text-based. Note that "text" can mean raw text (e.g. ASCII or UTF-8), XML, or HTML.

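Image-based subtitle codecs in ffmpeg (decoder name, with the codec name that ffprobe reports in parentheses)

  • dvbsub (dvb_subtitle)
  • dvdsub (dvd_subtitle)
  • pgssub (hdmv_pgs_subtitle)
  • xsub (xsub)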

Text-based subtitle codecs in ffmpeg

  • ssa, ass
  • jacosub
  • microdvd
  • mov_text
  • mpl2
  • pjs
  • realtext
  • sami
  • stl
  • subrip
  • subviewer
  • subviewer1
  • text
  • vplayer
  • webvtt
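The two lists above can be turned into a lookup for a burn script. This is a sketch under a few assumptions: the function names `subtitle_kind` and `subtitle_streams` are hypothetical, the image-based entries use the codec names ffprobe reports (e.g. `hdmv_pgs_subtitle` rather than the decoder name `pgssub`), and anything not in either list is reported as `unknown` rather than guessed.

```bash
#!/usr/bin/env bash
# Classify a subtitle codec name, as ffprobe reports it in codec_name,
# as "image", "text", or "unknown" (not in either list above).
subtitle_kind() {
  case "$1" in
    dvb_subtitle|dvd_subtitle|hdmv_pgs_subtitle|xsub)
      echo image ;;
    ass|ssa|jacosub|microdvd|mov_text|mpl2|pjs|realtext|sami|stl|subrip|subviewer|subviewer1|text|vplayer|webvtt)
      echo text ;;
    *)
      echo unknown ;;
  esac
}

# Print "index,codec_name,kind" for every subtitle stream in the file $1.
# Requires ffprobe on PATH.
subtitle_streams() {
  ffprobe -v error -select_streams s \
    -show_entries stream=index,codec_name -of csv=p=0 "$1" |
  while IFS=, read -r index codec; do
    printf '%s,%s,%s\n' "$index" "$codec" "$(subtitle_kind "$codec")"
  done
}
```

Running `subtitle_streams input.mkv` prints one line per subtitle stream. A `text` result would then go to the `subtitles` filter and an `image` result to the `overlay` filter. Returning `unknown` instead of defaulting to one kind means a codec missing from the lists shows up immediately instead of being burned with the wrong filter.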

EIA Closed Captions

EIA-608 is a closed caption format, and it seems to be a bit of a bear to manage properly with ffmpeg.

eia_608              EIA-608 closed captions (decoders: cc_dec )

This Stack Overflow post offers one of the better explanations of how they function and how to manage them if you know they exist in a file: "Can ffmpeg extract closed caption data"

MrPotatoHead

I don't see a simple, direct method of determining text vs image based subtitles with ffprobe.

mediainfo will output more info in this case. This example file has a dvd_subtitle stream and a subrip stream.

Text #1
ID                                       : 1
Format                                   : VobSub
Codec ID                                 : S_VOBSUB
Codec ID/Info                            : Picture based subtitle format used on DVDs
Duration                                 : 14 min 57 s
Default                                  : Yes
Forced                                   : No

Text #2
ID                                       : 2
Format                                   : UTF-8
Codec ID                                 : S_TEXT/UTF8
Codec ID/Info                            : UTF-8 Plain Text
Duration                                 : 5 s 0 ms
Default                                  : Yes
Forced                                   : No
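To avoid parsing the free-form "Codec ID/Info" description, a script could key on the "Codec ID" value instead. This is a minimal sketch that assumes Matroska input: the IDs below are Matroska codec IDs (such as `S_VOBSUB` and `S_TEXT/UTF8` above), and other containers will report different values. The function name `classify_mkv_codec_ids` is hypothetical, and the mediainfo template in the usage note is an assumption; check `mediainfo --Info-Parameters` for the exact field names in your version.

```bash
#!/usr/bin/env bash
# Read Matroska subtitle codec IDs (one per line, e.g. the "Codec ID"
# values mediainfo prints) on stdin and print "<codec_id> <kind>".
classify_mkv_codec_ids() {
  local id kind
  while IFS= read -r id; do
    case "$id" in
      "")
        continue ;;                    # skip blank lines
      S_TEXT/*|S_HDMV/TEXTST)
        kind=text ;;                   # UTF-8, SSA/ASS, WebVTT, Blu-ray text
      S_VOBSUB|S_HDMV/PGS|S_DVBSUB|S_IMAGE/*)
        kind=image ;;                  # DVD, Blu-ray PGS, DVB, bitmap subs
      *)
        kind=unknown ;;
    esac
    printf '%s %s\n' "$id" "$kind"
  done
}
```

Assuming a template along the lines of `mediainfo --Output="Text;%CodecID%\n" input.mkv`, its output could be piped straight into `classify_mkv_codec_ids`.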
llogan
  • Ok thanks! But if I understand correctly, I have to parse a description for certain keywords (like "Picture based") to know whether the subs are picture-based or not. That might be problematic if mediainfo doesn't word its description the way it does in this example. Still, thanks for the insight. – Shex Nov 13 '19 at 19:20
  • @Shex I don't know `mediainfo` well enough to know how consistent the `Codec ID/Info` is or if it will even show up for all formats (I use `ffprobe` almost exclusively). I only tried it on one MKV file. It's something to look into at least; you'll have to try some tests with your specific inputs. If it doesn't work out, your original format list idea is worth pursuing, as there aren't a huge number of subtitle formats compared to video and audio. If it's helpful, you can output specific fields with the `--Output` option. – llogan Nov 13 '19 at 19:34
  • That's what I'm planning to do. If I output the list of all subtitle codecs ffmpeg supports, I can see that there aren't many picture-based codecs. I was mainly wondering if there was a way to know (like a flag variable) whether the subtitle codec is image-based or text-based. Still, thanks again for taking the time to respond! – Shex Nov 14 '19 at 20:31
  • I notice that picture-based subtitle streams have a `width` attribute in ffprobe, while subrip subtitles don't. – nicbou Aug 23 '21 at 12:28
  • @nicbou Good observation, but ffprobe only shows `width=N/A` for dvd_subtitle in VOB. – llogan Aug 23 '21 at 16:14
  • You are right. I forgot to edit my comment later, but I realised the same with another subtitle codec. It looks like a codec-by-codec approach is the way to go, if you want to stick to ffprobe. – nicbou Aug 29 '21 at 17:55