I'm looking to write the perfect code for translating messages formatted for WhatsApp into messages formatted for Telegram.

WhatsApp uses * and _ for bold and italic, but if you dive into it you'll discover that the parsing rules are quite elaborate: https://faq.whatsapp.com/general/chats/how-to-format-your-messages/?lang=en

Telegram uses HTML tags: https://sendpulse.com/knowledge-base/chatbot/format-text

I based the following code on this answer. It works in simple cases but converts incorrectly in others, e.g. when the text contains a solitary *.

function whatsapp2Telegram(WAtxt) {
  let TelegramText = WAtxt;

  const htmlFormat = [{
      symbol: '*',
      tag: 'b'
    },
    {
      symbol: '_',
      tag: 'i'
    },
    {
      symbol: '~',
      tag: 's' // WhatsApp's ~ is strikethrough, which Telegram renders with <s>
    },
    {
      symbol: '```',
      tag: 'code'
    },
  ];

  htmlFormat.forEach(({
    symbol,
    tag
  }) => {
    if (!TelegramText) return;

    const regex = new RegExp(`\\${symbol}([^${symbol}]*)\\${symbol}`, 'gm');
    const match = TelegramText.match(regex);
    if (!match) return;

    match.forEach(m => {
      let formatted = m;
      for (let i = 0; i < 2; i++) {
        formatted = formatted.replace(symbol, `<${i > 0 ? '/' : ''}${tag}>`);
      }
      TelegramText = TelegramText.replace(m, formatted);
    });
  });

  return TelegramText;
}
bad_coder
  • I do believe Markdown, as well as HTML, belongs to the languages described by [context-sensitive](https://en.wikipedia.org/wiki/Context-sensitive_grammar) grammars, so it can't be parsed by _pure_ regular expressions. However, _lookahead_ and _lookbehind_ could be used to tackle the problem. – xamgore Jul 25 '21 at 09:18
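
The lookahead/lookbehind idea from the comment above could be sketched roughly as follows. This is only an approximation, not WhatsApp's actual parser: it assumes that an opening marker must not follow a letter or digit and must be followed by a non-space character, that a closing marker must follow a non-space character and must not be followed by a letter or digit, and that inline formatting never spans a line break. The names `whatsappToTelegramHtml`, `convertInline`, `INLINE_RULES`, and `FENCE` are hypothetical. Mapping ~ to `<s>` and triple backticks to `<pre>` is also a choice made for this sketch.

```javascript
// Sketch only: the boundary rules below are assumptions, not WhatsApp's documented grammar.
const FENCE = '`'.repeat(3); // WhatsApp's monospace marker (three backticks)

const INLINE_RULES = [
  { symbol: '*', tag: 'b' },
  { symbol: '_', tag: 'i' },
  { symbol: '~', tag: 's' },
];

// Convert *bold*, _italic_ and ~strikethrough~ in a segment that contains no monospace block.
function convertInline(segment) {
  return INLINE_RULES.reduce((acc, { symbol, tag }) => {
    const s = '\\' + symbol; // escape the marker for use inside the regex
    const re = new RegExp(
      `(?<![A-Za-z0-9${s}])` +     // opening marker must not follow a letter, digit or marker
      `${s}(?=\\S)` +              // opening marker must be followed by a non-space character
      `([^${s}\\n]*?[^${s}\\s])` + // content: one line, ending in a non-space character
      `${s}(?![A-Za-z0-9${s}])`,   // closing marker must not precede a letter, digit or marker
      'g'
    );
    return acc.replace(re, `<${tag}>$1</${tag}>`);
  }, segment);
}

function whatsappToTelegramHtml(text) {
  // Telegram's HTML parse mode expects &, < and > to be escaped in plain text.
  const escaped = text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');

  // Split on monospace blocks; the capturing group keeps them at odd indexes,
  // so markers inside a block are never reinterpreted as bold or italic.
  const blockRe = new RegExp('(' + FENCE + '[\\s\\S]+?' + FENCE + ')');
  return escaped
    .split(blockRe)
    .map((part, i) =>
      i % 2 === 1
        ? `<pre>${part.slice(FENCE.length, -FENCE.length)}</pre>`
        : convertInline(part)
    )
    .join('');
}

console.log(whatsappToTelegramHtml('*bold* and 2 * 3'));
// prints "<b>bold</b> and 2 * 3"
```

With these rules a solitary * (as in `2 * 3`) and underscores inside identifiers such as `snake_case_name` are left untouched, because neither satisfies the boundary conditions. Lookbehind requires a reasonably recent JavaScript engine.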
