If your MPEG-TS file is an HLS segment, just parse the HLS master playlist to retrieve the values.
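As a rough illustration of the playlist route, here is a minimal sketch that pulls the closed-caption renditions out of a master playlist. The function name `parse_closed_caption_renditions` and the sample playlist are made up for this example, and the attribute parsing is deliberately simplistic (it assumes one tag per line and no escaped quotes).

```python
import re

# Matches KEY=VALUE pairs where VALUE is a quoted string or an unquoted token.
_ATTR_RE = re.compile(r'([A-Z0-9-]+)=("[^"]*"|[^,]*)')
_TAG = "#EXT-X-MEDIA:"


def parse_closed_caption_renditions(playlist_text):
    """Return one attribute dict per EXT-X-MEDIA tag with TYPE=CLOSED-CAPTIONS."""
    renditions = []
    for line in playlist_text.splitlines():
        line = line.strip()
        if not line.startswith(_TAG):
            continue
        attrs = {key: value.strip('"')
                 for key, value in _ATTR_RE.findall(line[len(_TAG):])}
        if attrs.get("TYPE") == "CLOSED-CAPTIONS":
            renditions.append(attrs)
    return renditions


# Hypothetical master playlist, for illustration only.
SAMPLE = """#EXTM3U
#EXT-X-MEDIA:TYPE=CLOSED-CAPTIONS,GROUP-ID="cc",NAME="English",LANGUAGE="en",INSTREAM-ID="CC1"
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="aud",NAME="English",URI="audio.m3u8"
#EXT-X-STREAM-INF:BANDWIDTH=1280000,CLOSED-CAPTIONS="cc"
video.m3u8
"""

for rendition in parse_closed_caption_renditions(SAMPLE):
    print(rendition["GROUP-ID"], rendition["LANGUAGE"], rendition["INSTREAM-ID"])
```

For anything beyond a quick check you would probably want a proper M3U8 parser instead of a regex.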
If your input is captured from a live broadcast then read on.
GROUP-ID
It's up to you to set this value in the HLS playlist to indicate the rendition's group.
See: https://www.rfc-editor.org/rfc/rfc8216#section-4.3.4.1.1
LANGUAGE
This is where things get a bit more complicated.

CEA-608 captions do not include the language code.
For CEA-708 and 608-over-708 the language is signalled in the caption service descriptor (CSD), part of the ATSC Program and System Information Protocol (PSIP), which should be present in the PMT and EIT.

The relevant CSD fields are:
cc_type - 0 for 608, 1 for 708
line21_field - when cc_type is 0: 0 for field 1 (which includes channels CC1 and CC2) and 1 for field 2 (which includes channels CC3 and CC4)
caption_service_number - when cc_type is 1
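To make the field layout more concrete, here is a minimal sketch of decoding the body of a caption service descriptor. It assumes the layout as I understand it from ATSC A/65: a 5-bit service count, then 6 bytes per service (3-byte language code, one flags byte, two trailing bytes), with the descriptor itself carried under tag 0x86. The function name and the sample bytes are invented for this example, so check the bit positions against the spec before relying on them.

```python
def parse_caption_service_descriptor(payload):
    """Decode the descriptor body (the bytes after descriptor_tag and descriptor_length)."""
    number_of_services = payload[0] & 0x1F  # low 5 bits; top 3 bits are reserved
    services = []
    for i in range(number_of_services):
        entry = payload[1 + 6 * i: 7 + 6 * i]
        if len(entry) < 6:
            break  # truncated descriptor: stop rather than guess
        flags = entry[3]
        cc_type = flags >> 7  # 0 = 608 (line 21), 1 = 708
        service = {
            "language": entry[0:3].decode("ascii", errors="replace"),
            "cc_type": cc_type,
            "easy_reader": entry[4] >> 7,
            "wide_aspect_ratio": (entry[4] >> 6) & 1,
        }
        if cc_type == 0:
            service["line21_field"] = flags & 0x01  # 0 = field 1, 1 = field 2
        else:
            service["caption_service_number"] = flags & 0x3F
        services.append(service)
    return services


# Hand-built sample: one 708 service ("eng", service 1) and one 608 service ("spa", field 2).
sample = (bytes([0xE2])
          + b"eng" + bytes([0xC1, 0x3F, 0xFF])
          + b"spa" + bytes([0x7F, 0x3F, 0xFF]))

for service in parse_caption_service_descriptor(sample):
    print(service)
```

Note that the language here is a three-letter code, so you may need to map it to whatever form you want in the playlist's LANGUAGE attribute.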
INSTREAM-ID
This can be either CC1, CC2 (field 1), CC3, CC4 (field 2) for CEA-608 - where CC1 and CC2 carry normal and easy-reader captions for the primary language and CC3 and CC4 for the secondary language - or in the form SERVICEn for CEA-708 services.
These should be advertised in the CSD (see above), if present.
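Putting the pieces together, here is a minimal sketch of turning the descriptor fields into an INSTREAM-ID and an EXT-X-MEDIA line. Both helper names are made up for this example. One assumption to be aware of: line21_field only identifies the field, not the channel within it, so the sketch simply picks CC1 for field 1 and CC3 for field 2; adjust that if your stream uses CC2 or CC4.

```python
def instream_id(cc_type, line21_field=0, caption_service_number=0):
    """Map caption service descriptor fields to an HLS INSTREAM-ID value."""
    if cc_type == 1:
        return f"SERVICE{caption_service_number}"
    # Assumption: use the first channel of the signalled field.
    return "CC1" if line21_field == 0 else "CC3"


def ext_x_media(group_id, name, language, instream):
    """Build an EXT-X-MEDIA line for a closed-captions rendition (no URI attribute)."""
    return ('#EXT-X-MEDIA:TYPE=CLOSED-CAPTIONS,'
            f'GROUP-ID="{group_id}",NAME="{name}",'
            f'LANGUAGE="{language}",INSTREAM-ID="{instream}"')


# Example values are placeholders, not taken from a real stream.
print(ext_x_media("cc", "English", "en", instream_id(cc_type=0, line21_field=0)))
print(ext_x_media("cc", "Spanish", "es", instream_id(cc_type=1, caption_service_number=2)))
```

The GROUP-ID you choose here is the same value the variant streams refer to through their CLOSED-CAPTIONS attribute.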
I don't think FFmpeg extracts these by default, so you'll either need to extend it or write an MPEG-TS parser to retrieve the information. There are a few libraries for parsing MPEG-TS and for dealing with captions (e.g. libcaption by fellow StackOverflow user @szatmary).
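If you do go down the write-your-own-parser road, the first step is splitting the stream into 188-byte transport packets and reading the PID from each header. The sketch below covers only that step, assuming plain 188-byte packets that are already aligned on the sync byte; `iter_ts_packets` is a name invented for this example. From there you would reassemble the PSI sections, read the PAT on PID 0 to find the PMT PID, and walk the PMT descriptors.

```python
TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47


def iter_ts_packets(data):
    """Yield (pid, payload_unit_start, payload) for each aligned 188-byte packet."""
    for offset in range(0, len(data) - TS_PACKET_SIZE + 1, TS_PACKET_SIZE):
        packet = data[offset:offset + TS_PACKET_SIZE]
        if packet[0] != SYNC_BYTE:
            continue  # lost sync; a real parser would resynchronise here
        pid = ((packet[1] & 0x1F) << 8) | packet[2]
        payload_unit_start = bool(packet[1] & 0x40)
        adaptation_field_control = (packet[3] >> 4) & 0x03
        if not (adaptation_field_control & 0x01):
            continue  # packet carries no payload
        start = 4
        if adaptation_field_control & 0x02:
            start += 1 + packet[4]  # skip the length byte plus the adaptation field
        yield pid, payload_unit_start, packet[start:]


# Two synthetic packets: PID 0 with payload only, PID 0x100 with a 2-byte adaptation field.
pat_packet = bytes([0x47, 0x40, 0x00, 0x10]) + bytes(184)
other_packet = bytes([0x47, 0x01, 0x00, 0x30, 0x02, 0x00, 0x00]) + bytes(181)

for pid, pusi, payload in iter_ts_packets(pat_packet + other_packet):
    print(hex(pid), pusi, len(payload))
```

This ignores continuity counters, section reassembly across packets, and CRC checks, all of which a real parser needs.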
If you just want to extract the captions, use FFmpeg or CCExtractor.
If you want to inspect the stream manually, you could use a tool like DVBInspector to see the PSI contents.
