Recently, I have been creating an editor in Python 3.7.6 (using tkinter). I wrote the following regexes to highlight single-, double-, and triple-quoted strings, but I want to exclude all the characters inside the curly brackets of an f-string. I tried using [^\{(.*)\}] as a negated set, but then realized it wouldn't work. I searched the internet, but none of the solutions I found fit my regex.

This is the regex part of the code:

def regex_groups(self, name, alternates):
    return "(?P<%s>" % name + "|".join(alternates) + ")"

stringprefix = r"(\bB|b|br|Br|bR|BR|rb|rB|Rb|RB|r|u|R|U|f|F|fr|Fr|fR|FR|rf|rF|Rf|RF)?"
sqstring = stringprefix + r"'[^'\\\n]*(\\.[^'\\\n]*)*'?"
dqstring = stringprefix + r'"[^"\\\n]*(\\.[^"\\\n]*)*"?'
sqqqstring = stringprefix + r"'''[^'\\]*((\\.|'(?!''))[^'\\]*)*(''')?"
dqqqstring = stringprefix + r'"""[^"\\]*((\\.|"(?!""))[^"\\]*)*(""")?'
string = self.regex_groups("STRING", [sqqqstring, dqqqstring, sqstring, dqstring])

What I tried was to split stringprefix into two patterns, r"(f|F|fr|Fr|fR|FR|rf|rF|Rf|RF)?" and r"(B|b|br|Br|bR|BR|rb|rB|Rb|RB|r|u|R|U)?", and then use each of them separately with sqstring, dqstring, sqqqstring and dqqqstring, but that wasn't successful.

Here is part of my regex testing: [image]

Please help! Any help is appreciated. :)

prerakl123
  • What do you expect for the third string, the one with the curly braces? – The fourth bird Feb 14 '21 at 14:02
  • Pairs of opening/closing characters are not regular. You cannot accurately match them using regular expressions. – MisterMiyagi Feb 14 '21 at 14:03
  • I expect it to highlight `f'This is an {`, leave what is in between unhighlighted (in this case `f_string`), and then highlight `}'`. – prerakl123 Feb 14 '21 at 14:06
  • @MisterMiyagi When I try a regex that excludes the text between any kind of braces, it works without quotes, but with quotes it doesn't. – prerakl123 Feb 14 '21 at 14:08
  • What result do you expect for ``f"This is an {{plain string}}"`` and ``f"This is a set: { {1, 2, 3}}"``? – MisterMiyagi Feb 14 '21 at 14:09
  • For these, the first and the last curly bracket should be included in the string, and `{plain string}` and `{1, 2, 3}` should be excluded from the regex match. – prerakl123 Feb 14 '21 at 14:17
  • Could this be related to delimiters? I don't know much, because I'm new to regexes and this is the first complex one I have ever created. – prerakl123 Feb 14 '21 at 14:26
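
The point raised in the comments, that nested brace pairs are not regular, suggests handling the braces in a second pass over each string the regex has already matched. Below is a minimal sketch of that idea, not a definitive implementation. `split_fstring_body` is a hypothetical helper name; it assumes it receives only the text between the quotes of an f-string, treats `{{` and `}}` as escaped literal braces (as Python does), and does not try to handle braces that appear inside nested string literals or format specs.

```python
def split_fstring_body(body):
    """Split an f-string body into (kind, start, end) spans.

    kind is "literal" for text that would get the string color and
    "field" for the expression inside a replacement field. The braces
    that delimit a field are reported as part of the literal spans.
    """
    spans = []
    i, n = 0, len(body)
    lit_start = 0
    while i < n:
        ch = body[i]
        if ch in "{}" and body[i + 1:i + 2] == ch:
            # "{{" or "}}" is an escaped brace: plain literal text.
            i += 2
        elif ch == "{":
            # Start of a replacement field; the "{" stays literal.
            i += 1
            spans.append(("literal", lit_start, i))
            field_start = i
            depth = 1
            # Count braces until the field's own "}" is reached.
            while i < n and depth:
                if body[i] == "{":
                    depth += 1
                elif body[i] == "}":
                    depth -= 1
                i += 1
            # If the field was closed, exclude its "}" from the field.
            field_end = i - 1 if depth == 0 else i
            spans.append(("field", field_start, field_end))
            lit_start = field_end
        else:
            i += 1
    if lit_start < n:
        spans.append(("literal", lit_start, n))
    return spans


def describe(body):
    """Return (kind, text) pairs, which are easier to read than offsets."""
    return [(kind, body[s:e]) for kind, s, e in split_fstring_body(body)]


simple = describe("This is an {f_string}")
nested = describe("a set: { {1, 2, 3}}")
escaped = describe("{{plain}}")
```

In an editor, the offsets of the "field" spans could be used to remove the string tag from those ranges after the regex pass has tagged the whole string.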

1 Answer

I don't know if regexes are the way to go here. You could just use the tokenize module, which is part of the standard library, to parse and tokenize your Python source code. Depending on each token's type, you choose a different color. For example:

import tokenize
from io import BytesIO

src = """
def foo(bar):
    print(bar, "hi there")
"""

tokens = tokenize.tokenize(BytesIO(src.encode("utf-8")).readline)

openers = ("class", "def", "for", "while", "if", "try", "except")

for token in tokens:
    color = ""
    line = token.start[0]
    start = token.start[1]
    end = token.end[1]
    if token.exact_type == tokenize.NAME and token.string in openers:
        color = "orange"
    elif token.exact_type == tokenize.NAME:
        color = "blue"
    elif token.exact_type == tokenize.STRING:
        color = "green"

    if color:
        print(f"token '{token.string}' (line: {line}, col: {start} - {end}) should be {color}")

Output:

token 'def' (line: 2, col: 0 - 3) should be orange
token 'foo' (line: 2, col: 4 - 7) should be blue
token 'bar' (line: 2, col: 8 - 11) should be blue
token 'print' (line: 3, col: 4 - 9) should be blue
token 'bar' (line: 3, col: 10 - 13) should be blue
token '"hi there"' (line: 3, col: 15 - 25) should be green

A lookup table (dictionary) to map token types to colors would be more appropriate than a big chunk of if-statements, but you get the idea.
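
As a rough sketch of that lookup-table idea: the color names and the `COLORS_BY_TYPE`, `OPENERS`, `color_for` and `colorize` names below are illustrative assumptions, not anything the editor in the question defines. Keywords still need one special case, because `tokenize` reports them as ordinary NAME tokens.

```python
import tokenize
from io import BytesIO

# Hypothetical color choices; use whatever tag names your editor defines.
COLORS_BY_TYPE = {
    tokenize.NAME: "blue",
    tokenize.STRING: "green",
    tokenize.NUMBER: "purple",
    tokenize.COMMENT: "gray",
}

# Block-opening keywords get their own color, as in the example above.
OPENERS = {"class", "def", "for", "while", "if", "try", "except"}


def color_for(token):
    """Return a color name for a token, or "" if it stays uncolored."""
    if token.type == tokenize.NAME and token.string in OPENERS:
        return "orange"
    return COLORS_BY_TYPE.get(token.type, "")


def colorize(src):
    """Return (text, start, end, color) for every token that gets a color."""
    readline = BytesIO(src.encode("utf-8")).readline
    colored = []
    for token in tokenize.tokenize(readline):
        color = color_for(token)
        if color:
            colored.append((token.string, token.start, token.end, color))
    return colored


result = colorize('def foo(bar):\n    print(bar, "hi there")  # greet\n')
for text, start, end, color in result:
    print(f"{text!r} from {start} to {end} should be {color}")
```

Adding a new token category then only means adding one entry to the dictionary.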

Paul M.
  • You are right, but the editor has already been written, and it works just like the built-in IDLE but with more features (such as auto-completion of brackets and of statements like `if __name__ == '__main__'` and `def __init__(self):`). Rewriting the whole thing might not be a good idea for this one, but thanks for the answer anyway. I can use it in another editor (for example, a second version of the one I made). For now, though, my only hope for the f-string feature is regex (if possible). – prerakl123 Feb 14 '21 at 15:52