4

I want to parse an HLS master m3u8 file and get the bandwidth, resolution, and file name of each stream from it. Currently I am using plain string parsing: I search the string for certain patterns and take substrings to get the values.

Example File:

#EXTM3U
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=476416,RESOLUTION=416x234
Stream1/index.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=763319,RESOLUTION=480x270
Stream2/index.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=1050224,RESOLUTION=640x360
Stream3/index.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=1910937,RESOLUTION=640x360
Stream4/index.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=3775816,RESOLUTION=1280x720
Stream5/index.m3u8

But I found that it can be parsed using regular expressions, as mentioned in this question: Problem matching regex pattern in Android

I don't have any experience with regular expressions, so can someone please guide me on how to parse this with one?

Or can someone help me write a regexp for extracting the BANDWIDTH and RESOLUTION values from the string below?

#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=476416,RESOLUTION=416x234
User7723337

4 Answers

10

You could try something like this:

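    // Uses java.util.regex.Pattern and java.util.regex.Matcher.
    // Group 1 captures the BANDWIDTH digits, group 2 the RESOLUTION value.
    final Pattern pattern = Pattern.compile("^#EXT-X-STREAM-INF:.*BANDWIDTH=(\\d+).*RESOLUTION=([\\dx]+).*");

    Matcher matcher = pattern.matcher("#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=476416,RESOLUTION=416x234");
    String bandwidth = "";
    String resolution = "";

    // find() returns true only if the line matches the pattern.
    if (matcher.find()) {
        bandwidth = matcher.group(1);
        resolution = matcher.group(2);
    }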

This sets bandwidth and resolution to the matched (String) values.

I haven't tried this on an Android device or emulator, but judging from the link you sent and the Android API, it should work the same as the plain old Java above.

The regex matches strings that start with #EXT-X-STREAM-INF: and contain BANDWIDTH and RESOLUTION followed by values in the expected format. The two values are captured in capturing groups 1 and 2 so we can extract them.

Edit:

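If RESOLUTION isn't always present, you can make that portion optional. The .* in front of RESOLUTION has to go inside the optional group; if it is left outside, the greedy .* consumes the rest of the line and the optional group always matches nothing:

"^#EXT-X-STREAM-INF:.*BANDWIDTH=(\\d+)(?:.*RESOLUTION=([\\dx]+))?"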

The resolution string would be null in cases where only BANDWIDTH is present.
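A detail worth checking when a group is made optional is where the greedy .* sits relative to it. The sketch below compares two placements on the sample line. The class and method names are made up for illustration; only java.util.regex is assumed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OptionalResolutionDemo {
    // Greedy ".*" sits in front of the optional group: it consumes the rest of
    // the line, so the optional group is left with nothing to match.
    static final Pattern WILDCARD_BEFORE_GROUP = Pattern.compile(
            "^#EXT-X-STREAM-INF:.*BANDWIDTH=(\\d+).*(?:RESOLUTION=([\\dx]+))?.*");

    // ".*" sits inside the optional group: the engine first tries to reach
    // RESOLUTION= and only skips the group when that fails.
    static final Pattern WILDCARD_INSIDE_GROUP = Pattern.compile(
            "^#EXT-X-STREAM-INF:.*BANDWIDTH=(\\d+)(?:.*RESOLUTION=([\\dx]+))?");

    /** Returns the RESOLUTION capture, or null if the line or the group did not match. */
    static String resolution(Pattern pattern, String line) {
        Matcher m = pattern.matcher(line);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        String both = "#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=476416,RESOLUTION=416x234";
        String bandwidthOnly = "#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=476416";

        System.out.println(resolution(WILDCARD_BEFORE_GROUP, both));          // null
        System.out.println(resolution(WILDCARD_INSIDE_GROUP, both));          // 416x234
        System.out.println(resolution(WILDCARD_INSIDE_GROUP, bandwidthOnly)); // null
    }
}
```

In both cases find() succeeds, so a true return value alone does not show that RESOLUTION was captured; group(2) has to be checked for null.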

Edit2:

? makes things optional, and (?:___) means a non-capturing group (as opposed to a capturing group, (___)). So it's basically an optional non-capturing group, and yes, anything inside it will be optional.

A . matches a single character, and a * means the preceding item is repeated zero or more times. So .* will match zero or more characters. The reason we need this is to consume anything between the parts we are matching, e.g. anything between #EXT-X-STREAM-INF: and BANDWIDTH. There are many ways of doing this, but .* is the most generic/broad one.

\d is basically a set of characters that represent the digits (0-9), but since we define the regex as a Java string literal, we need the double \\. Otherwise the Java compiler will fail because \d is not a valid escape sequence in a Java string. Instead it turns \\ into \ so that we get \d in the final string passed to Pattern.compile.

[\dx]+ means one or more characters (+) out of the characters 0-9 and x. [\dx\d] would be a single character (no +) out of the same set of characters.
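The points above about escaping and character classes can be checked directly. This is a small sketch with hypothetical helper names; the stricter \d+x\d+ pattern is an alternative added here for comparison, not something the original regex uses.

```java
import java.util.regex.Pattern;

class RegexBasicsDemo {
    /** Loose check: any non-empty mix of digits and 'x'. */
    static boolean looksLikeResolutionLoose(String s) {
        return Pattern.matches("[\\dx]+", s);
    }

    /** Stricter check: digits, a single 'x', digits. */
    static boolean looksLikeResolutionStrict(String s) {
        return Pattern.matches("\\d+x\\d+", s);
    }

    public static void main(String[] args) {
        // The Java literal "\\d+" holds three characters: backslash, d, plus.
        String regex = "\\d+";
        System.out.println(regex.length()); // 3
        System.out.println(regex);          // \d+

        System.out.println(looksLikeResolutionLoose("416x234"));  // true
        System.out.println(looksLikeResolutionLoose("xx4x"));     // true
        System.out.println(looksLikeResolutionStrict("416x234")); // true
        System.out.println(looksLikeResolutionStrict("xx4x"));    // false
    }
}
```

For well-formed playlists both patterns capture the same value; the stricter one only matters if you want malformed RESOLUTION values to be rejected.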

If you are interested in regex, you could check out regular-expressions.info and/or regexone.com; there you will find much more in-depth answers to all your questions.
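To also get the file name the question asks for, the regex can be applied line by line, pairing each #EXT-X-STREAM-INF tag with the URI line that follows it. The following is only a sketch under a few assumptions: the class, the Variant type, and the method names are invented for illustration, and it uses two small patterns instead of one combined regex so that the order of the attributes does not matter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MasterPlaylistSketch {
    /** One variant stream: the tag's attributes plus the URI line after it. */
    static final class Variant {
        final long bandwidth;
        final String resolution; // null when the tag has no RESOLUTION
        final String uri;

        Variant(long bandwidth, String resolution, String uri) {
            this.bandwidth = bandwidth;
            this.resolution = resolution;
            this.uri = uri;
        }
    }

    // The leading [:,] keeps BANDWIDTH from matching inside a longer
    // attribute name such as AVERAGE-BANDWIDTH.
    private static final Pattern BANDWIDTH = Pattern.compile("[:,]BANDWIDTH=(\\d+)");
    private static final Pattern RESOLUTION = Pattern.compile("[:,]RESOLUTION=(\\d+x\\d+)");

    static List<Variant> parse(String playlist) {
        List<Variant> variants = new ArrayList<>();
        String pendingTag = null; // last STREAM-INF tag still waiting for its URI
        for (String raw : playlist.split("\\r?\\n")) {
            String line = raw.trim();
            if (line.startsWith("#EXT-X-STREAM-INF:")) {
                pendingTag = line;
            } else if (pendingTag != null && !line.isEmpty() && !line.startsWith("#")) {
                Matcher bw = BANDWIDTH.matcher(pendingTag);
                Matcher res = RESOLUTION.matcher(pendingTag);
                if (bw.find()) {
                    String resolution = res.find() ? res.group(1) : null;
                    variants.add(new Variant(Long.parseLong(bw.group(1)), resolution, line));
                }
                pendingTag = null;
            }
        }
        return variants;
    }

    public static void main(String[] args) {
        String playlist = "#EXTM3U\n"
                + "#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=476416,RESOLUTION=416x234\n"
                + "Stream1/index.m3u8\n"
                + "#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=763319\n"
                + "Stream2/index.m3u8\n";
        for (Variant v : parse(playlist)) {
            System.out.println(v.uri + " " + v.bandwidth + " " + v.resolution);
        }
        // Stream1/index.m3u8 476416 416x234
        // Stream2/index.m3u8 763319 null
    }
}
```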

rvalvik
  • Thanks for the reply, I will try your code. I have a question about pattern matching: when we call `pattern.matcher`, what exactly does it return in `matcher`? Is it the string with the given pattern removed? And after calling `pattern.matcher`, why do we call `find`? – User7723337 Mar 07 '13 at 08:58
  • 1
    The matcher is an object that you use to perform matching operations on the given string based on the pattern. When you call `find()` it will try to find the next match in the given string, if it finds one it returns true and we can extract the result. You could have a look at [the documentation](http://docs.oracle.com/javase/6/docs/api/java/util/regex/Matcher.html) for more info (that is the Java documentation, but it should behave the same way on Android, the android doc didn't have much details in it). – rvalvik Mar 07 '13 at 09:13
  • Thanks for the explanation! I tried your code and it is working, but what if the string does not have RESOLUTION in it, just BANDWIDTH? I tried it, and `find` fails in that case. Is it possible to check for RESOLUTION in either case, using it if found and ignoring it otherwise, or do we need two separate expressions for parsing bandwidth and resolution, passing the same string to both and calling `find` on each? – User7723337 Mar 07 '13 at 09:34
  • 1
    See my edit :) As long as only RESOLUTION is optional it's straight forward, if you have instances without BANDWIDTH but with RESOLUTION then it gets a bit trickier, but it should still be do-able. – rvalvik Mar 07 '13 at 09:45
  • Thanks for the update, it is working. Anything inside `(?:___)?` will be optional, right? Also, what is `.*` used for, why the "\\", and why is it `[\\dx]+` and not `[\\dx\\d]`? Sorry for asking so many questions :) – User7723337 Mar 07 '13 at 11:33
  • 1
    Updated my answer to include your latest questions. – rvalvik Mar 07 '13 at 15:00
  • Thanks for the wonderful explanation, it cannot get better than this. Using your inputs I tried to write a regexp for parsing the number out of this string: `#EXTINF:10, no desc`. My regexp is `^#EXTINF:(\\d+),$`, but `matcher.find()` always returns false in this case. Is my regexp right? – User7723337 Mar 08 '13 at 07:57
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/25831/discussion-between-a-user-and-rvalvik) – User7723337 Mar 08 '13 at 08:37
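On the `#EXTINF` pattern from the last question: `$` only matches at the end of the input, so a pattern that ends in `,$` cannot match a line where text follows the comma. A minimal sketch of one way around that, with a hypothetical class name and a pattern that also allows a decimal duration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExtInfDemo {
    // The pattern from the comment: the line must end right after the comma.
    static final Pattern ORIGINAL = Pattern.compile("^#EXTINF:(\\d+),$");

    // Duration (integer or decimal), a comma, then an optional free-text title.
    static final Pattern EXTINF = Pattern.compile("^#EXTINF:(\\d+(?:\\.\\d+)?),(.*)$");

    /** Returns the duration text, or null if the line is not an EXTINF tag. */
    static String duration(String line) {
        Matcher m = EXTINF.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String line = "#EXTINF:10, no desc";
        System.out.println(ORIGINAL.matcher(line).find()); // false
        System.out.println(duration(line));                // 10
        System.out.println(duration("#EXTINF:9.5,"));      // 9.5
    }
}
```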
1

You could just split the string. Here's what I mean, in Python:

fu = "#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=476416,RESOLUTION=416x234"

for chunk in fu.split(':')[1].split(','):
    if chunk.startswith('BANDWIDTH'):
        bandwidth = int(chunk.split('=')[1])
    if chunk.startswith('RESOLUTION'):
        resolution = chunk.split('=')[1]

for Jorr-el

>>> fu = '#EXT-X-STREAM-INF:BANDWIDTH=5857392,RESOLUTION=1980x1080,CODECS="avc1.42c02a,mp4a.40.2"'
>>> for chunk in fu.split(':')[1].split(','):
...     if chunk.startswith('BANDWIDTH'):
...         bandwidth = int(chunk.split('=')[1])
...     if chunk.startswith('RESOLUTION'):
...         resolution = chunk.split('=')[1]
...
>>> bandwidth
5857392
>>> resolution
'1980x1080'
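Splitting on commas works for BANDWIDTH and RESOLUTION here, but it would cut a quoted value such as CODECS="avc1.42c02a,mp4a.40.2" in two. If the quoted attributes are needed as well, one option is to match NAME=value pairs and treat quoted strings as a unit. The sketch below is in Java, the language of the question; the class and method names are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AttributeListSketch {
    // NAME=value, where value is either a "quoted string" (commas allowed
    // inside) or everything up to the next comma.
    private static final Pattern ATTRIBUTE =
            Pattern.compile("([A-Z0-9-]+)=(\"[^\"]*\"|[^,]*)");

    /** Parses the attribute list after the first ':' into name/value pairs. */
    static Map<String, String> parse(String tagLine) {
        Map<String, String> attrs = new LinkedHashMap<>();
        String list = tagLine.substring(tagLine.indexOf(':') + 1);
        Matcher m = ATTRIBUTE.matcher(list);
        while (m.find()) {
            String value = m.group(2);
            // Strip the surrounding quotes from quoted values.
            if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                value = value.substring(1, value.length() - 1);
            }
            attrs.put(m.group(1), value);
        }
        return attrs;
    }

    public static void main(String[] args) {
        String line = "#EXT-X-STREAM-INF:BANDWIDTH=5857392,RESOLUTION=1980x1080,"
                + "CODECS=\"avc1.42c02a,mp4a.40.2\"";
        Map<String, String> attrs = parse(line);
        System.out.println(attrs.get("BANDWIDTH"));  // 5857392
        System.out.println(attrs.get("RESOLUTION")); // 1980x1080
        System.out.println(attrs.get("CODECS"));     // avc1.42c02a,mp4a.40.2
    }
}
```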
Leroy Scandal
0

I found this one, which might help:
http://sourceforge.net/projects/m3u8parser/
(License: LGPLv3)

Johnny
0

You can also use the Python m3u8 parser library.

Example below:

import m3u8

playlist = """
#EXTM3U
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=476416,RESOLUTION=416x234
Stream1/index.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=763319,RESOLUTION=480x270
Stream2/index.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=1050224,RESOLUTION=640x360
Stream3/index.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=1910937,RESOLUTION=640x360
Stream4/index.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=3775816,RESOLUTION=1280x720
Stream5/index.m3u8
"""
# Parse the playlist text and take its list of variant playlists.
variants = m3u8.loads(playlist).playlists

for item in variants:
    item_uri = item.uri
    resolution = item.stream_info.resolution
    bandwidth = item.stream_info.bandwidth
    print(item_uri, resolution, bandwidth)

The result will be:

Stream1/index.m3u8 (416, 234) 476416
Stream2/index.m3u8 (480, 270) 763319
Stream3/index.m3u8 (640, 360) 1050224
Stream4/index.m3u8 (640, 360) 1910937
Stream5/index.m3u8 (1280, 720) 3775816
Cornea Valentin