0

When writing re.compile calls, I have used the r prefix many times (re.compile(r'(xyx)')). However, I have now seen re.compile(f'(xyx)') for the first time and I am not sure what it is doing. The output does not make any sense to me either. Can someone please explain what this f is doing here?

import re, string
re_tok = re.compile(f'([{string.punctuation}“”¨«»®´·º½¾¿¡§£₤‘’])')
def tokenize(s): 
    return re_tok.sub(r' \1 ', s).split()


>>> tokenize('˚∆˙©∆©˙¬ ldgkl slgh lshsg ieh 954n bvery590oerfdb o3pg')
learner

4 Answers

2

A formatted string literal or f-string is a string literal that is prefixed with 'f' or 'F'.

These strings may contain replacement fields, which are expressions delimited by curly braces {}.

While other string literals always have a constant value, formatted strings are really expressions evaluated at run time.

IN THIS CASE:

The curly brackets around `string.punctuation` are a replacement field, i.e. the string is to be formatted with `string.punctuation`, which, in Python, is a 'string of ASCII characters which are considered punctuation marks in the `C` locale'.
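To make this concrete, here is a minimal sketch showing that the f-string is turned into an ordinary string before re.compile ever sees it. It leaves out the extra non-ASCII characters from the question's pattern for brevity, and the sample sentence is made up for illustration.

```python
import re
import string

# The f-string is evaluated first; re.compile only receives the resulting str.
pattern = f'([{string.punctuation}])'
same_pattern = '([' + string.punctuation + '])'  # equivalent plain concatenation
print(pattern == same_pattern)  # True

re_tok = re.compile(pattern)

def tokenize(s):
    # Surround each punctuation character with spaces, then split on whitespace.
    return re_tok.sub(r' \1 ', s).split()

tokens = tokenize('Hello, world!')  # made-up sample input
print(tokens)  # ['Hello', ',', 'world', '!']
```

So the f does nothing regex-specific: it only builds the pattern text, and the regex engine then treats that text like any other pattern.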

To find out more, check out these Python docs and string.punctuation references :-)

Adi219
1

Those are prefixes that modify string literal behaviour: r means raw string and f is for string interpolation.
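As a rough illustration of the difference, the sketch below puts the two prefixes side by side and also combines them as rf, which is handy for regexes. The variable n is a hypothetical value chosen only for the example. Note that literal braces in an f-string have to be doubled.

```python
import re

n = 3  # hypothetical repeat count, only for illustration

raw = r'\d+'              # r: the backslash stays a backslash, nothing is substituted
formatted = f'{n} items'  # f: {n} is replaced by the value of n
both = rf'\d{{{n}}}'      # rf: raw and formatted; {{ and }} produce literal braces

print(raw)        # \d+
print(formatted)  # 3 items
print(both)       # \d{3}
print(bool(re.fullmatch(both, '123')))  # True
```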

See explanation from PEP:

F-strings provide a way to embed expressions inside string literals, using a minimal syntax. It should be noted that an f-string is really an expression evaluated at run time, not a constant value. In Python source code, an f-string is a literal string, prefixed with 'f', which contains expressions inside braces. The expressions are replaced with their values. Some examples are:

>>> import datetime
>>> name = 'Fred'
>>> age = 50
>>> anniversary = datetime.date(1991, 10, 12)
>>> f'My name is {name}, my age next year is {age+1}, my anniversary is {anniversary:%A, %B %d, %Y}.'
'My name is Fred, my age next year is 51, my anniversary is Saturday, October 12, 1991.'
>>> f'He said his name is {name!r}.'
"He said his name is 'Fred'."

https://www.python.org/dev/peps/pep-0498/

And the Python docs:

Regarding r

Both string and bytes literals may optionally be prefixed with a letter 'r' or 'R'; such strings are called raw strings and treat backslashes as literal characters. As a result, in string literals, '\U' and '\u' escapes in raw strings are not treated specially. Given that Python 2.x’s raw unicode literals behave differently than Python 3.x’s the 'ur' syntax is not supported.

https://docs.python.org/3/reference/lexical_analysis.html#string-and-bytes-literals

Regarding f

A formatted string literal or f-string is a string literal that is prefixed with 'f' or 'F'. These strings may contain replacement fields, which are expressions delimited by curly braces {}. While other string literals always have a constant value, formatted strings are really expressions evaluated at run time.

https://docs.python.org/3/reference/lexical_analysis.html#f-strings

mrzasa
1

As per the Python documentation:

2.4.3. Formatted string literals

New in version 3.6.

A formatted string literal or f-string is a string literal that is prefixed with 'f' or 'F'. These strings may contain replacement fields, which are expressions delimited by curly braces {}. While other string literals always have a constant value, formatted strings are really expressions evaluated at run time.

There are multiple examples in the documentation, so I'll post a few of them and explain:

name = "Fred"
f"He said his name is {name!r}."
# "He said his name is 'Fred'."

Here the ! introduces a conversion field; !r calls repr() on the value before it is formatted.

The result is then formatted using the format() protocol. The format specifier is passed to the __format__() method of the expression or conversion result. An empty string is passed when the format specifier is omitted. The formatted result is then included in the final value of the whole string.
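A small sketch of that protocol, using a hypothetical class (SpecEcho is not part of any library) whose __format__ method simply reports the format specifier it receives:

```python
class SpecEcho:
    """Hypothetical class that reports the format spec it is given."""

    def __format__(self, spec):
        return f'<spec={spec!r}>'

obj = SpecEcho()

no_spec = f'{obj}'                # an empty spec is passed when none is given
with_spec = f'{obj:>10}'          # the text after the colon is passed to __format__
via_builtin = format(obj, '>10')  # the same call made explicitly

print(no_spec)                   # <spec=''>
print(with_spec)                 # <spec='>10'>
print(with_spec == via_builtin)  # True
```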

Since it's formatted using the format() protocol, the following are other use-cases:

import decimal

width = 10
precision = 4
value = decimal.Decimal("12.34567")
f"result: {value:{width}.{precision}}"
# result:      12.35

Even datetime objects:

from datetime import datetime

today = datetime(year=2017, month=1, day=27)
f"{today:%B %d, %Y}"
# January 27, 2017

Taking the information above, let's apply it to your code:

f'([{string.punctuation}“”¨«»®´·º½¾¿¡§£₤‘’])'

The line above is inserting string.punctuation into the string at that location.
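One caveat worth noting: the inserted text becomes part of the regex, so any regex metacharacters in it keep their special meaning. Inside the character class, the `\]` in string.punctuation is read as an escaped `]`, so the backslash itself does not end up in the set. A minimal sketch of a more defensive variant using re.escape follows; the names naive and escaped are illustrative only, and the extra non-ASCII characters from the original pattern are omitted for brevity.

```python
import re
import string

# Pattern built as in the question: punctuation is inserted unescaped.
naive = re.compile(f'([{string.punctuation}])')

# Same idea, but every special character is escaped before insertion.
escaped = re.compile(f'([{re.escape(string.punctuation)}])')

backslash = '\\'  # a single backslash character

print(naive.search(backslash))          # None
print(bool(escaped.search(backslash)))  # True
print(bool(naive.search(']')), bool(escaped.search(']')))  # True True
```

Whether this matters depends on whether backslashes can appear in the input; for text without them, both patterns behave the same on ASCII punctuation.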

According to the docs, string.punctuation is:

String of ASCII characters which are considered punctuation characters in the C locale.

If you really want to dig deeper into this: What's the C locale?

The C standard defines the locale as a program-wide property that may be relatively expensive to change. On top of that, some implementations are broken in such a way that frequent locale changes may cause core dumps. This makes the locale somewhat painful to use correctly.

Initially, when a program is started, the locale is the C locale, no matter what the user’s preferred locale is.

ctwheels
0

This is just Python's new literal string interpolation (f-strings), available as of Python 3.6.

Gerrat