Python 2.7 Regex not matching desired pattern

Question

I am parsing all the rows of an .m3u file containing my IPTV playlist data. I want to isolate and print the sections of the file that have the format:

tvg-logo="http://somelinkwithapicture.png"

within a string that looks like:

#EXTINF:-1 catchup="default" catchup-source="http://someprovider.tv/play/dvr/${start}/2480.m3u8?token=%^%=&duration=3600" catchup-days=5 tvg-name="Sky Sports Action HD" tvg-id="SkySportsAction.uk" tvg-logo="http://someprovider.tv/logos/sky%20sports%20action%20hd.png" group-title="Sports",Sky Sports Action HD
http://someprovider.tv/play/2480.m3u8?token=465454=

My class looks like this:

import re

class iptv_cleanup():

    filepath = 'C:\\Users\\cg371\\Downloads\\vget.m3u'

    with open(filepath, "r") as text_file:
        a = text_file.read()
        b = re.search(r'tvg-logo="(.*?)"', a)
        c = b.group()
        print c

    text_file.close

iptv_cleanup()

All I get back, though, is this:

tvg-logo=""

I am a bit rusty with regexes, but I cannot see anything obviously wrong with this.

Can anyone assist?

Thanks

`c = b.group()` should be `c = b.group(1)`, but even with `group()` you should have received `tvg-logo="http://someprovider.tv/logos/sky%20sports%20action%20hd.png"` – Wiktor Stribiżew, Sep 19 '18 at 22:24
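To illustrate the comment above, here is a small sketch run against a shortened copy of the sample line from the question (the catchup attributes are trimmed for illustration). `group()` (the same as `group(0)`) returns the whole match, while `group(1)` returns only the text captured by the parentheses:

```python
import re

# Shortened copy of the sample #EXTINF line from the question (trimmed for illustration)
line = ('#EXTINF:-1 tvg-name="Sky Sports Action HD" '
        'tvg-logo="http://someprovider.tv/logos/sky%20sports%20action%20hd.png" '
        'group-title="Sports",Sky Sports Action HD')

m = re.search(r'tvg-logo="(.*?)"', line)

whole = m.group()       # entire match, including the tvg-logo="..." wrapper
captured = m.group(1)   # only the text inside the capturing group

print(whole)
print(captured)
```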

lucas_7_94 · score 0 · Answer 1 · 2018-09-19T22:26:16.530

Check (?:tvg-logo=\")[\w\W]*(?<=.png)

import re
reg = '(?:tvg-logo=\")[\w\W]*(?<=.png)'

string = '#EXTINF:-1 catchup="default" catchup-source="http://someprovider.tv/play/dvr/${start}/2480.m3u8?token=%^%=&duration=3600" catchup-days=5 tvg-name="Sky Sports Action HD" tvg-id="SkySportsAction.uk" tvg-logo="http://someprovider.tv/logos/sky%20sports%20action%20hd.png" group-title="Sports",Sky Sports Action HD http://someprovider.tv/play/2480.m3u8?token=465454='

print re.findall(reg,string, re.DOTALL)[0]

$ python main.py
tvg-logo="http://someprovider.tv/logos/sky%20sports%20action%20hd.png

answered Sep 19 '18 at 22:10 by lucas_7_94 · edited Sep 19 '18 at 22:26

That is not printing the desired substring; it is printing the entire string. – gdogg371 Sep 19 '18 at 22:17
I had to remove the 'g' on the end, though, as it was throwing an error at execution. – gdogg371 Sep 19 '18 at 22:19
So, to be clear, you want to get 'tvg-logo="http//somelinkwithapicture.png" '? – lucas_7_94 Sep 19 '18 at 22:23
Yes, each instance of it in the text file. I am then going to replace it with an empty string. – gdogg371 Sep 19 '18 at 22:24
Check the edit now. Sorry for not asking before; I'm not yet able to comment, only add answers :/ Thanks for not downvoting. – lucas_7_94 Sep 19 '18 at 22:26
Your regex works against the isolated string in your answer above, but when parsing my .m3u file it returns the full file rather than the required subset. – gdogg371 Sep 19 '18 at 22:42
I used to be sh*t hot with regexes as well... I have not used them much for a good while and have forgotten most of what I knew. – gdogg371 Sep 19 '18 at 22:56
The non-capturing group `(?:tvg-logo=\")` is pointless; with `re.DOTALL` enabled you may as well replace `[\w\W]` with a dot `.`; it is better to use the closing double quote as the delimiter rather than a positive look-behind `(?<=.png)`; and the dot in that look-behind needs to be escaped. – Borodin Sep 19 '18 at 23:23

score 0 · Answer 2 · answered Sep 19 '18 at 22:48

This worked in the end:

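import re

class iptv_cleanup():

    filepath = 'C:\\Users\\cg371\\Downloads\\vget.m3u'

    # The with-block closes the file on exit, so no explicit close() is needed
    with open(filepath, "r") as text_file:
        a = text_file.read()

    # findall returns group 1 (the logo URL) for every tvg-logo match in the file
    b = re.findall(r'tvg-logo="(.*?)"', a)

    for i in b:
        print i

iptv_cleanup()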
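The key difference from the code in the question is that `re.search` stops at the first match, whereas `re.findall` returns the captured group for every match. That would also explain the `tvg-logo=""` output in the question if the first entry in the file happens to have an empty logo; this is an assumption, since the file itself is not shown. A small sketch with made-up entries:

```python
import re

# Hypothetical playlist text: the FIRST entry has an empty logo attribute
playlist = (
    '#EXTINF:-1 tvg-name="Channel A" tvg-logo="" group-title="News",Channel A\n'
    'http://example.tv/play/1.m3u8\n'
    '#EXTINF:-1 tvg-name="Channel B" tvg-logo="http://example.tv/b.png" '
    'group-title="Sports",Channel B\n'
    'http://example.tv/play/2.m3u8\n'
)

pattern = r'tvg-logo="(.*?)"'

# re.search only ever returns the first match in the text
first = re.search(pattern, playlist).group()

# re.findall returns group 1 for every match, in order of appearance
logos = re.findall(pattern, playlist)

print(first)
for logo in logos:
    print(repr(logo))
```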

Thank you all for your input.
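The comments on the first answer say the end goal is to replace each `tvg-logo="..."` attribute with an empty string. A minimal sketch of that with `re.sub`, assuming logo URLs never contain a literal double quote; the `strip_logos` helper is hypothetical, not something from the thread:

```python
import re

# Spaces/tabs before the attribute are removed too, so no double space is left;
# newlines are not matched, so the line structure of the playlist is preserved
LOGO_ATTR = re.compile(r'[ \t]*tvg-logo="[^"]*"')


def strip_logos(text):
    """Return text with every tvg-logo="..." attribute removed (hypothetical helper)."""
    return LOGO_ATTR.sub('', text)


sample = ('#EXTINF:-1 tvg-id="SkySportsAction.uk" '
          'tvg-logo="http://someprovider.tv/logos/sky%20sports%20action%20hd.png" '
          'group-title="Sports",Sky Sports Action HD')

print(strip_logos(sample))
```

To apply it to the playlist, read the file as in the answer above, pass the contents through `strip_logos`, and write the result to a new file rather than overwriting the original.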
