
In a single line: Given a string, how can we get to a raw string representation of it?


I am generating a file that contains regexes, such as '[ \\n\\t\\r\\f\\v]', which I want to render as raw strings. How can I do this?

P.S.: I also intend to represent strings with double quotes where needed, so the string '\'' should be rendered as "'". I need help with that too.

By "raw string" I mean the kind of string literal commonly used for regexes:

>>> r"[ \r\n\f\v\t]"
'[ \\r\\n\\f\\v\\t]'
# assuming to_raw is the function
>>> print to_raw(r"[ \r\n\f\v\t]")
r'[ \r\n\f\v\t]'
>>> print to_raw("\\\\")
r'\\'
>>> print to_raw("'")
r"'"
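A naive sketch of the quote choice from the P.S., assuming the same preference Python's `repr` uses (single quotes unless the text contains a single quote and no double quote). `pick_quote` and `naive_raw` are hypothetical names, and `naive_raw` does not check that its result is a valid raw literal, so inputs such as `'a\\'` will come out broken.

```python
def pick_quote(s):
    # Mirror repr(): prefer ', switch to " only if s contains ' but no ".
    if "'" in s and '"' not in s:
        return '"'
    return "'"


def naive_raw(s):
    # Hypothetical helper: wrap s verbatim in r'...' or r"..." without
    # validating that the result parses back to s.
    quote = pick_quote(s)
    return "r" + quote + s + quote


print(naive_raw(r"[ \r\n\f\v\t]"))  # r'[ \r\n\f\v\t]'
print(naive_raw("\\\\"))            # r'\\'
print(naive_raw("'"))               # r"'"
```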
BenMorel
pradyunsg
  • Any attempt at it yet? – Jerry Aug 29 '13 at 06:07
  • @Jerry Yes. I tried replacing `repr`, but it doesn't render Unicode well (it prints the Unicode escapes, which I don't want). The same problem occurs with double quotes. – pradyunsg Aug 29 '13 at 06:09
  • 1. `b'\ra\\w'.decode('raw-unicode-escape')` => `'\ra\\w'` 2. What does "raw string" mean? – User Aug 29 '13 at 09:35
  • @User 1. No!!! 2. Check the update. – pradyunsg Aug 29 '13 at 10:56
  • Please provide some sample input and output; it is still not quite clear what you are asking for. – Ibrahim Najjar Aug 29 '13 at 11:01
  • Do you have to do this with a regex, or can you use something else? – Ibrahim Najjar Aug 29 '13 at 11:16
  • *Why* do you want to render them as raw strings? – Ignacio Vazquez-Abrams Aug 29 '13 at 11:17
  • @IgnacioVazquez-Abrams For readability, since those regexes are pretty long, and also because it seemed worth trying. I gave up, hence this question. – pradyunsg Aug 29 '13 at 11:24
  • @Sniffer Anything will do, really. – pradyunsg Aug 29 '13 at 11:24
  • Not every string can be represented as a raw string. For example, `'a\\'` can't be represented, because a raw string can't end in an odd number of backslashes (even in a raw string, a backslash before the closing quote stops it from ending the literal). Other problematic strings are those containing control characters, or strings containing all varieties of quote characters. – interjay Aug 29 '13 at 11:28
  • Doing this with a regular expression can be hard; instead, take a look at this [question](http://stackoverflow.com/q/13778571/439667), which solves your problem. – Ibrahim Najjar Aug 29 '13 at 11:28
  • @interjay I know that, but the strings I am trying to render come from valid Python raw strings. – pradyunsg Aug 29 '13 at 11:33
  • @Sniffer The only problem with that is that it doesn't escape quotes, so it breaks if a string contains single (or double) quotes. – pradyunsg Sep 05 '13 at 17:41
  • Sorry, I don't use Python and didn't know about that problem, but doing this with a regular expression (at least with just one) could be pretty hard. – Ibrahim Najjar Sep 05 '13 at 19:59
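A rough sketch of the rule behind interjay's comment, written as a regex since that was asked about. Inside `r'...'` a backslash still stops the next character from closing the literal (both characters are kept), so the body must be a run of either ordinary characters or backslash-plus-one-character pairs; that is why a body ending in an odd number of backslashes cannot be written. `fits_single_line_raw` is a hypothetical name; it covers only the `r'...'` and `r"..."` forms (not triple quotes), and it conservatively rejects carriage returns and NUL characters rather than modelling how the tokenizer treats them.

```python
import re


def fits_single_line_raw(s, quote="'"):
    # Hypothetical helper (conservative): can s sit between r<quote>...<quote>?
    # Each unit is either an ordinary character (not a backslash, the quote,
    # a newline, a carriage return or a NUL) or a backslash followed by one
    # more character, which a raw literal keeps verbatim.
    body = r"(?:[^\\\n\r\x00" + quote + r"]|\\[^\r\x00])*"
    return re.fullmatch(body, s) is not None


print(fits_single_line_raw(r"[ \r\n\f\v\t]"))  # True
print(fits_single_line_raw("a\\"))             # False: odd trailing backslash
print(fits_single_line_raw("'"), fits_single_line_raw("'", quote='"'))  # False True
```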

2 Answers


If your string came from a raw string literal, it appeared in your source code inside r'', r"", r'''''', or r"""""" with no other special escaping, so at least one of those four delimiters will work:

import ast

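def rawstringify(s):
    for template in ["r'{}'", 'r"{}"', "r'''{}'''", 'r"""{}"""']:  # the four raw-literal delimiters
        rawstring = template.format(s)
        try:
            reparsed = ast.literal_eval(rawstring)  # parses the literal without executing code
            if reparsed == s:
                return rawstring
        except (SyntaxError, ValueError):  # unterminated or non-literal candidate, e.g. s = "' + '"
            pass
    raise ValueError('rawstringify received an invalid raw string')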

Demo:

>>> print rawstringify(r'')
r''
>>> print rawstringify(r'\n\r\b\t')
r'\n\r\b\t'
>>> print rawstringify(r"'")
r"'"
>>> print rawstringify(r'\
... ')
r'\
'
>>> print rawstringify(r'''asdf
... ''')
r'''asdf
'''
>>> print rawstringify('\\')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 10, in rawstringify
ValueError: rawstringify received an invalid raw string
wim
user2357112
  • @wim: You seem to have applied an extra layer of `repr` there. [The output is fine in my tests.](https://ideone.com/TJEsGb) – user2357112 Oct 11 '18 at 04:22
  • Pretty clever, actually. It could be modified to handle Unicode better; e.g. `ur'\U0001f4a9'` is a valid raw string but will crash this code. – wim Oct 11 '18 at 04:54
  • @wim: `ur` strings are really weird. They're only semi-raw in Python 2; Unicode escapes still get processed. `ur` isn't allowed in Python 3 at all. I don't remember whether I made a conscious decision to exclude `ur` strings five years ago, but looking at it now, I don't think I want to try to add support. – user2357112 Oct 11 '18 at 05:14
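In Python 3, where the `ur` prefix no longer exists and every `str` is Unicode, a plain `r'...'` literal can hold non-ASCII characters such as U+1F4A9 as-is, so a reparse check like the one in the answer above has nothing special to handle. A minimal Python 3 sketch; `reparses_as_raw` is a hypothetical helper that tests a single delimiter.

```python
import ast


def reparses_as_raw(s, quote="'"):
    # Hypothetical helper: does r<quote>s<quote> parse back to exactly s?
    literal = "r" + quote + s + quote
    try:
        return ast.literal_eval(literal) == s
    except (SyntaxError, ValueError):
        return False


# Non-ASCII text needs no escaping inside a Python 3 raw literal.
print(reparses_as_raw("\U0001f4a9 \\d+"))  # True
print(reparses_as_raw("a\\"))              # False: odd trailing backslash
```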

In a single line: Given a string, how can we get to a raw string representation of it?

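In a raw string literal, backslashes are kept as literal characters instead of starting escape sequences. In Python 2, the `string_escape` codec produces the same escaped text from an ordinary string: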

>>> s
'This is some string \n that is not \raw'
>>> print(s)
This is some string
awhat is not
>>> i = s.encode('string_escape')
>>> i
'This is some string \\n that is not \\raw'
>>> print(i)
This is some string \n that is not \raw
>>> i == r'This is some string \n that is not \raw'
True
>>> i == s
False
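The `string_escape` codec used above exists only in Python 2. A rough Python 3 counterpart, sketched here under that assumption, is the `unicode_escape` codec. Note that it also turns non-ASCII characters into `\x`/`\u` escapes (the Unicode behaviour the asker wanted to avoid), and it does not escape quote characters.

```python
# Python 3 stand-in for Python 2's s.encode('string_escape').
s = 'This is some string \n that is not \raw'
escaped = s.encode('unicode_escape').decode('ascii')
print(escaped)  # This is some string \n that is not \raw
print(escaped == r'This is some string \n that is not \raw')  # True

# Non-ASCII characters are escaped as well.
print('caf\u00e9'.encode('unicode_escape').decode('ascii'))  # caf\xe9
```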
Burhan Khalid
  • What I want to do is from a string like `i` get to `r'This is some string \n that is not \raw'`. – pradyunsg Aug 29 '13 at 11:51
  • It is exactly the same string as what you typed in, as the comparison shows. What's the actual problem you have? – Burhan Khalid Aug 29 '13 at 11:56
  • What I want to do is get from a string like `"I'm \\raw"` to `r"I'm \raw"`, so that a regex is rendered as a raw string when printed. I'll respond in more detail tomorrow; it's late right now (I have to go to school) :-) – pradyunsg Sep 05 '13 at 17:35