
I have the following text.

"\*hello* * . [ }"

It should be escaped like this:

"\*hello\\* \* \\. \\[ \\}"

How to do this with python regex?

Every special character (the special characters are: _, *, [, ], (, ), ~, `, >, #, +, -, =, |, {, }, ., !) must be escaped with a preceding \ character.

I tried this, but then every special character is escaped:

import re
def escape_all(text):
    escape_chars = r'_*[]()~`>#+-=|{}.!'
    return re.sub(f'([{re.escape(escape_chars)}])', r'\\\1', text)

Then the text is unformatted like this:

\*hello\* \* \. \[ \}

But it should be like this:

**hello** \* \. \[ \}

Some examples:

In \* \* \* only the middle one should be escaped.

In \{ \{ \} only the middle one should be escaped.

I need this for text formatting with Telegram's MarkdownV2 style: https://core.telegram.org/bots/api#markdownv2-style

ChrisGPT was on strike
a14stoner
  • I have tried to fix the formatting of your post, but it isn't always clear what you meant to type. Please review my edits and fix any mistakes. – ChrisGPT was on strike Aug 08 '21 at 15:40
  • I'm still confused by your escapes, but why is `*hello*` supposed to turn into `**hello**`? That doesn't make any sense and has nothing to do with escaping. And why in the last two examples should "only the middle one" be escaped? Earlier you said that "every special character" must be escaped. – ChrisGPT was on strike Aug 08 '21 at 15:42

1 Answer


Since you tagged python-telegram-bot, I'm going to point you to the escape_markdown helper function. Its source code is available in the python-telegram-bot repository.

Maybe this helps you. However, I have to agree with Chris: It's not clear to me what you actually want to achieve.
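To make the idea concrete, here is a minimal standard-library sketch of what such an escaping helper does. It is not the library's implementation: the name escape_markdown_v2 is hypothetical, and the character list is simply the one quoted in the question.

```python
import re

# Reserved MarkdownV2 characters, copied from the list in the question.
RESERVED = r'_*[]()~`>#+-=|{}.!'
_PATTERN = re.compile(f'([{re.escape(RESERVED)}])')


def escape_markdown_v2(text: str) -> str:
    """Prefix every reserved character in `text` with a backslash.

    Hypothetical stand-in for the library helper, for illustration only.
    """
    return _PATTERN.sub(r'\\\1', text)


# Escape only the dynamic part, then add the markup around it yourself.
name = escape_markdown_v2('foo_bar.')
message = f'*Hello {name}*'
print(message)
```

The point of the sketch is the order of operations: the escaping is applied to the plain text only, and the markup characters (here the two asterisks) are added afterwards, so they are never escaped.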

EDIT:

The use case seems to be that users should be allowed to set some kind of template message, which can have dynamic input. OP did not (yet) explain what exactly those templates look like, so I'll just make up an example. Let's say the user wants to specify a welcome message of the format

Hello_there, {username}!

Where Hello_there is italic and {username} is replaced with the corresponding string at runtime and should be displayed bold, including the !.

I see two ways to approach this.

  1. The user sends the message as formatted text (i.e. the bot receives a message "Hello_there, {username}!"). In this case, you can store the template simply by storing update.effective_message.text_markdown(_v2) or text_html (see Message.text_html). Then at runtime, all you need to do is send_message(template.format(username=escaped_username), parse_mode=...). Note that escaped_username here is a string containing the username with special characters escaped. This can be achieved with escape_markdown for Markdown formatting, or with html.escape from the standard library for HTML formatting.

  2. The user sends the text with markup characters. Sticking to Markdown formatting for the example, the bot would receive a message saying _Hello_there_, *{username}!*. To convert this to a template, you'd have to somehow escape the relevant characters. In this case the result should be _Hello\_there_, *escaped_username\!* at runtime. In this scenario I don't see a safe way to decide what to escape and what not to. While you can do some regexing to e.g. convert *{username}!* to *{username}\!*, how would you know if the user wants "Hello there_" or "Hello_there"?

I therefore highly recommend the first approach.
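A minimal sketch of the first approach, using the HTML variant because html.escape ships with the standard library. The template string is written by hand here as an assumption about what a stored text_html value might look like, and render_welcome is a hypothetical helper name.

```python
import html

# Hypothetical stored template: "Hello_there" in italics, the username
# and the trailing "!" in bold, with a {username} placeholder.
template = '<i>Hello_there</i>, <b>{username}!</b>'


def render_welcome(template: str, username: str) -> str:
    """Fill the template with an HTML-escaped username."""
    return template.format(username=html.escape(username))


print(render_welcome(template, 'Tom & <Jerry>'))
# prints: <i>Hello_there</i>, <b>Tom &amp; &lt;Jerry&gt;!</b>
```

The rendered string would then be passed to send_message with an HTML parse mode. Only the username is escaped, so the formatting stored in the template stays intact.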


Disclaimer: I'm currently the maintainer of python-telegram-bot

CallMeStag
  • I have a text: "Hello Name, welcome to the group." The "Hello Name" should be bold so I have to add \* -> \*Hello Name\* ... when testing this only with \*Hello Name\* it works and the text is bold. But when there is a single character like _, *, [, ], (, ), ~, `, >, #, +, -, =, |, {, }, ., ! in the text (e.g. my example above) I get the following error: Can't parse entities: character '.' is reserved and must be escaped with the preceding '\' – a14stoner Aug 08 '21 at 19:52
  • I have to add that every message can be different. – a14stoner Aug 08 '21 at 21:30
  • Then you need to escape those characters. This can be done e.g. with `escape_markdown`. Note that you must escape only the characters that are not part of the markup. E.g. to write `Hello foo_bar` in bold, you'd have to escape only the `_`, not the `*`s. This can be achieved e.g. with `f'*Hello {escape_markdown("foo_bar", version=2)}*'`. – CallMeStag Aug 09 '21 at 06:42
  • Thanks @CallMeStag. And what if every kind of formatting should be possible? \*bold \_italic \__underline__ ~strikethrough~ \*bold _italic bold ~italic bold strikethrough~ \__underline italic bold___ bold* \[inline URL](http://www.example.com/) \[inline mention of a user](tg://user?id=123456789) \`inline fixed-width code` – a14stoner Aug 09 '21 at 10:04
  • I don't see what your question is here. You just need to escape all characters that are not part of the markup and that TG tells you to escape. `escape_markdown` is just a shortcut for that. – CallMeStag Aug 09 '21 at 10:32
  • I have a Telegram bot that manages groups. Anyone can use this bot to set welcome messages, goodbye messages and so on. When I use escape_markdown for these welcome and goodbye messages with entity_type set to pre, code or text_link, it does not escape a '.', for example. If a customer has a '.' in the welcome message, I get: Can't parse entities: character '.' is reserved and must be escaped with the preceding '\'. If I set no entity_type, every special character is escaped, and then the MarkdownV2 parse_mode does not format anything. Sorry for not describing my problem well. – a14stoner Aug 09 '21 at 11:21
  • I could use the HTML parse_mode, but the MarkdownV2 parse_mode is much sweeter imo. – a14stoner Aug 09 '21 at 11:24
  • Thanks, that worked. I can't upvote, though: I don't have enough reputation. – a14stoner Aug 12 '21 at 11:48
  • You can mark my answer as accepted, though ;) – CallMeStag Aug 12 '21 at 12:53
  • Maybe you can upvote my question, then I can give you an upvote, if one upvote is enough to get from 9 rep to 15. – a14stoner Aug 17 '21 at 16:37