
I'm writing a small Python script using the Telethon library. One of the functions I'm writing returns the word usage frequency of a specific user. The frequencies are formatted into a string that is sent back to the user in Telegram.

The code for this function is shown below.

@bot.on(events.NewMessage(pattern='/wordsUsage'))
async def start(event):
    """Returns the word usage frequency of a specific user."""
    messagesHistory = await client.get_messages(chat_id, None, from_user=event.message.from_id.user_id)
    messagesHistory = [i.message for i in messagesHistory if isinstance(i.message, str)]
    # I know this line is ugly, leave me alone c:
    listWords = ' '.join(messagesHistory).replace('\n', ' ').split(' ')
    countWord = dict()
    for word in listWords:
        if word not in countWord:
            countWord[word] = 1
        else:
            countWord[word] += 1
    countWord = sorted(countWord.items(), key=lambda item: item[1])
    await event.respond(pprint.pformat(countWord, indent=4))
    raise events.StopPropagation

When this function is called, I get the following error:

telethon.errors.rpcerrorlist.MessageEmptyError: Empty or invalid UTF-8 message was sent (caused by SendMessageRequest)

I don't understand what I'm doing wrong, since Python 3 strings are UTF-8 strings.

beepmep
  • What version are you using? What is the result of `pprint.pformat`? Is it a non-empty string? Are you sure the error comes from this function? – Lonami Sep 24 '21 at 14:26
  • @Lonami I'm using the 1.23.0 version. This is a sample of the result of pprint.pformat : `('une', 31), ('ca', 33), ('̬̝̮̱̫̖̓͋͠Ⓘ̶̪̬͔̰̇̒Ⓝ̶̤͕̥͎͓̐ͩ͛̚☠️☠️Ⓓ̲̠̺͉̯͓͑ͧ̾͜Ⓤ͍̺̭̺̹̾̏̏͂́', 33), ('̜̲͍̼̭̈́̓̂̀ͅⓅ̶͚̖̘̫̖͙̮̏Ⓐͨ', 34), ('que', 35), ('et', 35),` – beepmep Sep 24 '21 at 14:35
  • The emoji seem broken, so that's probably why. You might want to remove bad characters from the output before using it in `respond` (for instance, encoding to ASCII and back). – Lonami Sep 24 '21 at 14:37
  • Yes, I saw that, but since Python 3 strings are UTF-8 strings I thought it would be OK. I'll sanitize the output :). Thanks for your help. – beepmep Sep 24 '21 at 14:39
  • Python 3 strings are UTF-8, but nothing prevents them from having invalid UTF-8 (in contrast to, say, Rust, which would panic at runtime). – Lonami Sep 24 '21 at 14:40
  • @Lonami Thanks for the answer, you were right! :) – beepmep Oct 12 '21 at 12:26
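
For anyone landing here with the same error, the sanitizing step suggested in the comments could look roughly like the sketch below. It is only an illustration, not the code that was actually used: the helper names `sanitize_for_telegram` and `to_ascii` are hypothetical, and it assumes the problem comes from characters that cannot be encoded as UTF-8 (such as lone surrogates) or from heavy combining-mark decoration like the sample above. Only the standard library is used.

```python
import unicodedata


def sanitize_for_telegram(text: str) -> str:
    """Hypothetical helper: make a string safe to encode as UTF-8."""
    # A Python str can hold lone surrogates, which cannot be encoded as
    # UTF-8. errors="ignore" drops them instead of raising.
    encodable = text.encode("utf-8", errors="ignore").decode("utf-8")
    # Optionally strip combining marks (Unicode categories Mn, Mc, Me),
    # which make up the stacked decoration seen in the sample output.
    # Note: this also removes accents written as separate combining marks.
    return "".join(
        ch for ch in encodable
        if not unicodedata.category(ch).startswith("M")
    )


def to_ascii(text: str) -> str:
    """Hypothetical helper: the blunt 'encode to ASCII and back' approach."""
    # Drops every non-ASCII character, including legitimate accented letters.
    return text.encode("ascii", errors="ignore").decode("ascii")
```

In the handler, the formatted string would be passed through one of these before `event.respond`. If the sanitized result ends up empty, it is probably worth skipping the send, since the error message also mentions empty messages.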
