Note: I'm using Python 3.

I've been searching everywhere and asking on IRC, and I have found nothing complete. I need a regular expression that removes ALL IRC colour control codes. I can't find a complete solution anywhere.

The control codes are Bold, Italics, Underline, Reverse, Colour, and Plain text. Their character numbers are 2, 29, 31, 22, 3, and 15 respectively.
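
For reference, here is a small sketch of those six code points as Python values. The dictionary name IRC_CODES and its keys are made up for illustration and do not come from any library. It also shows a common pitfall: Python string escapes are hex or octal, so decimal 31 is written '\x1f' or '\037', not '\031'.

```python
# Hypothetical lookup table: the decimal code point of each formatting
# character listed above. The name and keys are illustrative only.
IRC_CODES = {
    'bold': 2,
    'italics': 29,
    'underline': 31,
    'reverse': 22,
    'colour': 3,
    'plain': 15,
}

for name, number in IRC_CODES.items():
    # repr() shows the escape sequence Python itself uses for the character.
    print(f'{name:<10} {number:>2}  {chr(number)!r}')

# Escapes are hex or octal, never decimal: '\031' is octal for 25, not 31.
assert chr(31) == '\x1f' == '\037'
assert '\031' == chr(25)
```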

Edited:

I just found a \x0f character being used as well (that is character 15, Plain text, from the list above).

The Colour character (3) may be followed by at most 2 digits, then optionally a comma and at most 2 more digits; it may also appear alone with no digits at all. If a comma is followed by plain text rather than digits, the comma should be left in the string.

Please help I am stuck in the mud.

Example:

'\003' + '12,4' + 'Red and blue' + '\003' + ', \037Underline\037'

The 12 is blue and the 4 is red, used with character 3. Underline is character 31, which is '\037' as a Python octal escape.

The expected output is just "Red and blue, Underline" plain text, no colour codes. That way I can use:

line = 'Red and blue, Underline'

line.split(' ')[0] == 'Red'
JosEduSol
baudsmoke
  • an example along with expected output would be better. – Avinash Raj Mar 25 '15 at 04:52
  • An example is 12,4red and blue, underline. The control characters don't display in a browser. Maybe '\003' + '12,4' + 'Red and blue' + '\003' + ', \037Underline\037' – baudsmoke Mar 25 '15 at 04:58
  • Please use code blocks to display code; control characters can display in a browser if a code block is used. You can edit your question to update that – arkoak Mar 25 '15 at 07:15
  • Maybe some information from this page - http://www.ircbeginner.com/ircinfo/colors.html - should be helpful. Judging by the question, it seems you are not going to *remove*, but rather *replace* these codes with words denoting them. Then, you might even need no regex solution... – Wiktor Stribiżew Mar 25 '15 at 07:45

3 Answers

1

I put together some working code. I noticed a bug in the previous post of similar code that caused the app to crash, then realised that code probably did not work at all, so I revised it to the version here. This code SHOULD work as intended. It is not extensively tested, but I did get a positive result while writing it. The code below strips all mIRC colour and format codes from the text, properly this time. :/

> def colourstrip(text_with_msl_colour):
>     digits = '0123456789'
>     text = text_with_msl_colour
>     find = text.find('\x03')
>     while find > -1:
>         end = find + 1  # index just past the \x03 character
>         # Skip up to two foreground digits.
>         while end < len(text) and end - find <= 2 and text[end] in digits:
>             end += 1
>         has_foreground = end > find + 1
>         # Skip ',' plus up to two background digits, but only after a
>         # foreground number and only when a digit follows the comma;
>         # otherwise the comma is plain text and stays in the string.
>         if has_foreground and end + 1 < len(text) and text[end] == ',' and text[end + 1] in digits:
>             end += 2
>             if end < len(text) and text[end] in digits:
>                 end += 1
>         text = text[:find] + text[end:]
>         find = text.find('\x03', find)
>     # Remove bold, italics, underline, reverse and plain-text characters.
>     for char in '\x02\x1d\x1f\x16\x0f':
>         text = text.replace(char, '')
>     return text

I could not get a regex to do this, so I did it with the code above.

Kenggi Peters
0

I know I asked for a regex solution, but I finally got around to coding a working non-regex solution.

I updated the code to be more tolerant of colour codes: it now accepts colour numbers of any length. IRC clients wrap the colour list around, so after the last colour the list starts again from the first colour (0, which is white), and so on forever. colourstrip() therefore strips the whole colour number, instead of demanding a maximum of 2 digits as the old code did, which is pointless anyway.

def colourstrip(data):
    digits = '0123456789'
    find = data.find('\x03')
    while find > -1:
        end = find + 1  # index just past the \x03 character
        # Skip every foreground digit; there is deliberately no 2-digit limit.
        while end < len(data) and data[end] in digits:
            end += 1
        # Skip ',' and every background digit, but only when a digit follows
        # the comma; a comma followed by plain text stays in the string.
        if end + 1 < len(data) and data[end] == ',' and data[end + 1] in digits:
            end += 2
            while end < len(data) and data[end] in digits:
                end += 1
        data = data[:find] + data[end:]
        find = data.find('\x03', find)

    # Remove bold, italics, underline, reverse and plain-text characters.
    for char in '\x02\x1d\x1f\x16\x0f':
        data = data.replace(char, '')
    return data

datastring = '\x03123434,27384This is coolour \x032689,34344This is too\x03'
print(colourstrip(datastring))

Thank you for the help everyone.

baudsmoke
-1
[\x02\x0F\x16\x1D\x1F]|\x03(\d{,2}(,\d{,2})?)?

This will match all IRC formatting codes you have mentioned. In the case of color codes, it will even catch malformed ones like \x03,11, \x034, and \x03,. I realize this may or may not be ideal depending on how you wish to handle malformed codes like those, but you could easily tweak it to do what you want. If need be, you can explain how you'd like those handled and I can update the answer to reflect that.

As for what to do, one solution is:

import re

pattern = r'[\x02\x0F\x16\x1D\x1F]|\x03(\d{,2}(,\d{,2})?)?'
text = '\x0312,4Text\x03'
stripped = re.sub(pattern, '', text)  # 'Text'

See also Section 6.2 of the Python docs, which covers the re module.
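
To see what the pattern above does on edge cases, here is a small check. The sample strings are made up for illustration. Note that a comma directly after a colour code is consumed together with the code, even when plain text follows it.

```python
import re

# The pattern from the answer, unchanged.
pattern = re.compile(r'[\x02\x0F\x16\x1D\x1F]|\x03(\d{,2}(,\d{,2})?)?')

samples = [
    '\x0312,4Red and blue\x03, \x1fUnderline\x1f',  # example from the question
    '\x03,11 text',   # no foreground number
    '\x0312, text',   # comma followed by plain text
    '\x03123 text',   # only the first two digits belong to the code
]

for sample in samples:
    print(repr(pattern.sub('', sample)))

# Prints:
# 'Red and blue Underline'
# ' text'
# ' text'
# '3 text'
```

The first result shows why the comma handling matters: the comma after the closing colour character is removed along with it.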

ZeroKnight
  • With \x03,11 \x034, and \x03, the comma should be left in as plain text. Also, how do we use that regular expression? text = '\x0312,4Text\x03'; striptext = regex_that_returns_plain_text(text) – baudsmoke Mar 30 '15 at 21:45
  • Why would you want to leave the comma? You'd be handling malformed color codes by leaving behind a comma that shouldn't be there in the first place. The whole thing should be stripped. As for how to apply it, I edited the answer (you could have found this out yourself by searching for a few seconds though). – ZeroKnight Apr 01 '15 at 02:27
  • ZeroKnight The comma stays because if you are colouring a comma or finishing the colour at a comma, the comma stays. It's not malformed, it's like that by design. So with \x03, Text: is it in error if Text is a number, because of the space after the control char? In all cases the comma is where the text starts; it is not stripped out. – baudsmoke Apr 08 '15 at 12:36
  • ZeroKnight There is no removing of text. '\x0312, Text' is a blue comma followed by Text. That is how all the IRC clients view the text. They don't try to fix broken colour attempts, they just end the colour sequence. – baudsmoke Apr 08 '15 at 12:45
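
The rule described in the comments above (a comma that is not followed by a digit is plain text) can be sketched as a stricter pattern. This is an assumed variant, not part of the answer, and the names STRICT_PATTERN and strip_formatting are made up here. It keeps the limit of two digits per colour number.

```python
import re

# Assumed variant, not from the answer: a comma is consumed only when it sits
# between a foreground number and at least one background digit.
STRICT_PATTERN = re.compile(
    r'[\x02\x0f\x16\x1d\x1f]'                   # bold, plain, reverse, italics, underline
    r'|\x03(?:[0-9]{1,2}(?:,[0-9]{1,2})?)?'     # colour: \x03[fg[,bg]]
)


def strip_formatting(text):
    """Remove formatting characters and colour codes, leaving stray commas."""
    return STRICT_PATTERN.sub('', text)


print(strip_formatting('\x0312,4Red and blue\x03, \x1fUnderline\x1f'))
# Red and blue, Underline
print(repr(strip_formatting('\x0312, Text')))
# ', Text'
```

With this variant the example from the question gives 'Red and blue, Underline', so line.split(' ')[0] == 'Red' holds.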