
I would like to delete all the dates, usernames and emoticons from a WhatsApp chat.txt file. The file looks like this:

10/4/19, 7:18 PM - user1: example chat
10/4/19, 7:18 PM - user2: 
10/4/19, 7:18 PM - user3: example chat
10/4/19, 7:18 PM - user1: example chat
10/4/19, 7:18 PM - user2: 
10/4/19, 7:18 PM - user3: example chat

Is it possible to write a Python script that recognizes the usernames and dates and deletes them, leaving only the chat text? I imagine I should use a regular expression and maybe read the whole file into a single string.

Please help

meowulf

5 Answers


There is a similar question about regex and WhatsApp logs in Python:

Regex to match whatsapp chat log

Code from the first answer there:


^
(?P<datetime>\d{2}/\d{2}/\d{4}[^-]+)\s+-\s+
(?P<name>[^:]+):\s+
(?P<message>[\s\S]+?)
(?=^\d{2}|\Z)
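The pattern above needs the `re.VERBOSE` and `re.MULTILINE` flags to work, and it expects two-digit days and months and a four-digit year (`\d{2}/\d{2}/\d{4}`), so it does not match dates like `10/4/19`. A minimal sketch of an adapted version, assuming every header looks like the question's `M/D/YY, h:mm AM - name: ` (exports from phones with other locale settings may use a different timestamp); `WHATSAPP_LINE` is just an illustrative name:

```python
import re

# Adapted (assumption) to the question's "M/D/YY, h:mm AM - name: " headers
# instead of the DD/MM/YYYY dates the original pattern expects.
WHATSAPP_LINE = re.compile(
    r"""
    ^
    (?P<datetime>\d{1,2}/\d{1,2}/\d{2,4},\s\d{1,2}:\d{2}\s[AP]M)\s+-\s+
    (?P<name>[^:]+):[ \t]*
    (?P<message>[\s\S]*?)               # lazy, may be empty or span lines
    (?=^\d{1,2}/\d{1,2}/\d{2,4},|\Z)    # stop at the next header or the end
    """,
    re.VERBOSE | re.MULTILINE,
)

text = """10/4/19, 7:18 PM - user1: example chat
10/4/19, 7:18 PM - user2: 
10/4/19, 7:18 PM - user3: example chat
that continues on a second line"""

messages = [m.group("message").strip() for m in WHATSAPP_LINE.finditer(text)]
print(messages)
```

This prints `['example chat', '', 'example chat\nthat continues on a second line']`. Because the lookahead only stops at the next header line, a message may run over several lines.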

Joe Thor

A super simple way here would be to iterate line by line and split on `:`. If we can assume that every line follows the `date, time - username: message` format, we can grab everything after the second `:` (the first one is the colon in the time).

text = '''10/4/19, 7:18 PM - user1: example chat
10/4/19, 7:18 PM - user2: 
10/4/19, 7:18 PM - user3: example chat
10/4/19, 7:18 PM - user1: example chat
10/4/19, 7:18 PM - user2: 
10/4/19, 7:18 PM - user3: example chat'''

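for message in text.split('\n'):
    # maxsplit=2 splits only on the colon in the time and the one after the
    # username, so any colons inside the message itself are kept
    print(message.split(':', 2)[2])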

Outputs

 example chat
 
 example chat
 example chat
 
 example chat
PacketLoss
  • With a list comprehension: `[message.split(':', 2)[2].strip() for message in text.split('\n')]` – meowulf May 13 '21 at 10:54
  • @user_na As stated in the answer, this will work as long as every line follows the pattern shown in the example. I find it highly unlikely that the original pattern `date, time - username: message` will change in the file. – PacketLoss May 13 '21 at 11:18
  • It also crops messages with `\n` inside. – user_na May 13 '21 at 12:35
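Regarding messages that contain `\n`: a sketch (not part of the answer above) that still works line by line but treats any line without a `date, time - name:` header as the continuation of the previous message. The header regex assumes the timestamp format shown in the question, and `HEADER` and `extract_messages` are hypothetical names:

```python
import re

# Assumed header format from the question: "M/D/YY, h:mm AM - name: "
HEADER = re.compile(r"^\d{1,2}/\d{1,2}/\d{2,4}, \d{1,2}:\d{2} [AP]M - [^:]+: ?")


def extract_messages(lines):
    """Return message texts, gluing continuation lines onto the previous message."""
    messages = []
    for line in lines:
        line = line.rstrip("\r\n")
        match = HEADER.match(line)
        if match:
            # New message: keep everything after "name: ", colons included.
            messages.append(line[match.end():])
        elif messages:
            # No header: this line continues the previous (multi-line) message.
            messages[-1] += "\n" + line
    return messages


sample = [
    "10/4/19, 7:18 PM - user1: meet at 8:30: ok?\n",
    "10/4/19, 7:18 PM - user2: \n",
    "10/4/19, 7:18 PM - user3: first line\n",
    "second line\n",
]
print(extract_messages(sample))
```

For the sample this returns `['meet at 8:30: ok?', '', 'first line\nsecond line']`. Since it accepts any iterable of lines, an open file can be passed directly, e.g. `with open('chat.txt', encoding='utf-8') as f: messages = extract_messages(f)` (UTF-8 is an assumption about the export).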

Another way is to build a regular expression for this. The emoji regex is taken from here.

import re

str_in = """10/4/19, 7:18 PM - user1: example chat
10/4/19, 7:18 PM - user2: 
10/4/19, 7:18 PM - user3: example chat
10/4/19, 7:18 PM - user1: example chat
10/4/19, 7:18 PM - user2: 
10/4/19, 7:18 PM - user3: example chat"""

# Remove the "date, time - name: " prefix; [ \d\w]+ also allows names with spaces
dates_filtered = re.sub(r'\d+/\d+/\d+, \d+:\d+ [AP]M - [ \d\w]+: ?', '', str_in)

# Remove characters from the common emoji ranges
emoji_pattern = re.compile("["
        "\U0001F600-\U0001F64F"  # emoticons
        "\U0001F300-\U0001F5FF"  # symbols & pictographs
        "\U0001F680-\U0001F6FF"  # transport & map symbols
        "\U0001F1E0-\U0001F1FF"  # flags (iOS)
        "]+", flags=re.UNICODE)
emoji_filtered = emoji_pattern.sub('', dates_filtered)

# Collapse the empty lines left behind by empty messages
blank_lines_filtered = re.sub(r'\n\s*\n', '\n', emoji_filtered)

print(str_in)
print('---------')
print(dates_filtered)
print('---------')
print(emoji_filtered)
print('---------')
print(blank_lines_filtered)

This prints:

10/4/19, 7:18 PM - user1: example chat
10/4/19, 7:18 PM - user2: 
10/4/19, 7:18 PM - user3: example chat
10/4/19, 7:18 PM - user1: example chat
10/4/19, 7:18 PM - user2: 
10/4/19, 7:18 PM - user3: example chat
---------
example chat

example chat
example chat

example chat
---------
example chat

example chat
example chat

example chat
---------
example chat
example chat
example chat
example chat
user_na
  • This one works pretty well, except that sometimes user names are more than one word. For example: 10/4/19, 7:18 PM - Arnold Pence: 10/4/19, 7:18 PM - Mike: example chat 10/4/19, 7:18 PM - Mariah: 10/4/19, 7:18 PM - David sons: example chat. How can I match all the cases? – Riccardo Cataldi May 15 '21 at 14:06
  • fixed by changing `[\d\w]+` to `[ \d\w]+` – user_na May 15 '21 at 15:26
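Building on the multi-word user name comment, a variant sketch under these assumptions: the question's timestamp format, a name that contains no colon (`[^:\n]+`, so any name works without listing allowed characters), and an emoji class that is still not a complete definition of emoji. It anchors the header to the start of a line with `re.MULTILINE` and adds the Supplemental Symbols and Pictographs block (U+1F900 to U+1F9FF), the Miscellaneous Symbols and Dingbats blocks (U+2600 to U+27BF) and the variation selector U+FE0F. `clean_chat` is a hypothetical helper name:

```python
import re

# Header only at the start of a line; [^:\n]+ accepts multi-word names.
HEADER = re.compile(r"^\d{1,2}/\d{1,2}/\d{2,4}, \d{1,2}:\d{2} [AP]M - [^:\n]+: ?", re.MULTILINE)

# The four ranges from the answer plus a few more common blocks (not exhaustive).
EMOJI = re.compile(
    "["
    "\U0001F600-\U0001F64F"  # emoticons
    "\U0001F300-\U0001F5FF"  # symbols & pictographs
    "\U0001F680-\U0001F6FF"  # transport & map symbols
    "\U0001F1E0-\U0001F1FF"  # regional indicators (flags)
    "\U0001F900-\U0001F9FF"  # supplemental symbols & pictographs
    "\u2600-\u27BF"          # miscellaneous symbols and dingbats
    "\uFE0F"                 # variation selector-16 that follows some emoji
    "]+"
)


def clean_chat(text):
    text = HEADER.sub("", text)
    text = EMOJI.sub("", text)
    # Strip each line and drop the ones left empty by the substitutions.
    lines = [line.strip() for line in text.splitlines()]
    return "\n".join(line for line in lines if line)


chat = (
    "10/4/19, 7:18 PM - Arnold Pence: \U0001F602\n"
    "10/4/19, 7:18 PM - Mike: example chat\n"
    "10/4/19, 7:18 PM - David sons: meet at 7:30 \U0001F44D\n"
)
print(clean_chat(chat))
```

For this sample it prints `example chat` and `meet at 7:30` on two lines: the emoji-only message disappears entirely and the colon in `7:30` survives.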

You can also use a list comprehension:

print([message.split(':', 2)[2] for message in text.split('\n')])
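The comprehension assumes one message per line: indexing the split result raises an `IndexError` on a line with fewer than two colons, such as the second line of a multi-line message. A hedged sketch that applies the same idea to the file from the question and writes the result to a new file, copying such lines unchanged; `strip_prefixes` and the output file name are hypothetical:

```python
def strip_prefixes(src_path, dst_path):
    """Write only the message part of each line of src_path to dst_path.

    Assumes one message per line in the "date, time - name: message" format;
    lines with fewer than two colons are copied unchanged, empty messages dropped.
    """
    with open(src_path, encoding="utf-8") as src:
        lines = src.read().splitlines()
    # split(':', 2) splits at most twice, so colons inside the message survive
    messages = [line.split(":", 2)[2].strip() if line.count(":") >= 2 else line
                for line in lines]
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write("\n".join(m for m in messages if m) + "\n")
```

Usage would be something like `strip_prefixes('chat.txt', 'chat_clean.txt')`. Writing to a separate file keeps the original export intact in case the format assumption turns out to be wrong.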
Synthase

Here is a version using `re.findall`:

import re

sentence = '10/4/19, 7:18 PM - user1: example chat 10/4/19, 7:18 PM - user2: 10/4/19, 7:18 PM - user3: example chat 10/4/19, 7:18 PM - user1: example chat 10/4/19, 7:18 PM - user2: 10/4/19, 7:18 PM - user3: example chat'

# Capture everything after "- name: " up to the next date (e.g. "10/4/19,") or the end
chat = re.findall(r'- [^:]+: ?(.*?) ?(?=\d{1,2}/\d{1,2}/\d{2,4},|$)', sentence)

print(chat)

output:

['example chat', '', 'example chat', 'example chat', '', 'example chat']
saty035