I'm looking for emails whose subject states how many Bitcoin I received. Because the amount is part of the subject, I want a way to find emails where that amount is greater than or equal to a given number.

For example, I have an email with the subject "You received 0.000666703 BTC", and I want to find emails with this subject or with a larger amount. I want to find "You received 0.002719281 BTC", but not "You received 0.000028181 BTC", because that amount is smaller. In other words, I want to match amounts greater than or equal to the one in the first subject. This is my code:

import imaplib
import credentials
import email
from bs4 import BeautifulSoup
imap_ssl_host = 'imap.gmail.com'
imap_ssl_port = 993
username = "myemail"
password = "mypass"
server = imaplib.IMAP4_SSL(imap_ssl_host, imap_ssl_port)
server.login(username, password)
server.select('INBOX')
typ, data = server.search(None, '(FROM "no-reply@coinbase.com" SUBJECT "You received 0,00066703 BTC" SINCE "24-Sep-2021")')
for num in data[0].split():
    typ, data = server.fetch(num, '(RFC822)')
    msg = email.message_from_bytes(data[0][1])
    print(msg.get_payload(decode=True))

The subject always begins with "You received", followed by the amount and "BTC", as in my example above. How can I extract only the number?

The console output is HTML content. I just want to know whether an email with such a subject (as explained above) exists so I can do the rest. Is there a more efficient way to do this?

tripleee
Dkns
  • You obviously only need to fetch the subject if all you care about is the subject. It's not clear from your example whether you want to search for messages with exactly that subject (in which case obviously all the messages returned by the search are matches) or something more generic. – tripleee Jan 10 '22 at 11:16
  • The console output is whatever the payload is, not necessarily HTML (though if all the messages come from the same sender and they always send HTML, that could be the result). – tripleee Jan 10 '22 at 11:17
  • @tripleee I see, well, my question is how to find numbers greater than 0 for example, in the title of the email, you know? – Dkns Jan 10 '22 at 11:18
  • If you are not getting any unrelated messages from this address, just find all messages from this sender and inspect their subjects. If you have stricter criteria, again, please [edit] your question to clarify what the actual question is. – tripleee Jan 10 '22 at 11:18
  • @tripleee I just want to know if the title contains a number greater than 0 eg the HTML content doesn't matter, I just need to know if there is an email like that – Dkns Jan 10 '22 at 11:19
  • But you refuse to clarify whether the suggestions here in the comments are applicable? If you need to check the subject, maybe check if it starts with the static part and then extract the number. – tripleee Jan 10 '22 at 11:27
  • https://www.example-code.com/python/imap_search.asp has some tips for what the IMAP search syntax looks like from Python. – tripleee Jan 10 '22 at 11:34
  • @tripleee The beginning of the subject will always be "You received" but after that there are numbers, and letters that will be the amount of btc and "BTC" as well as my example in the question, but how can I extract only the numbers? – Dkns Jan 10 '22 at 11:34

1 Answer

If you only care about the subject, only fetch the subject.

import imaplib
from email.parser import HeaderParser
from email.policy import default  # use Python >= 3.6 EmailMessage API

... 

parser = HeaderParser(policy=default)

server.select('INBOX')
typ, data = server.search(None, '(FROM "no-reply@coinbase.com" SUBJECT "You received" SINCE "24-Sep-2021")')
if typ == 'OK':  # imaplib reports the status in upper case
    for num in data[0].split():
        ok, fetched = server.fetch(num, '(BODY.PEEK[HEADER.FIELDS (SUBJECT)])')
        if ok != 'OK':
            continue
        # parsestr() returns a message object, not a string; with
        # policy=default, msg['Subject'] is the decoded subject text
        msg = parser.parsestr(fetched[0][1].decode('us-ascii'))
        subj = str(msg['Subject'] or '')
        if not subj.startswith('You received'):
            continue
        try:
            amount = float(subj.split()[2])
        except (IndexError, ValueError):
            continue
        if amount >= 0.000666703:
            print('Message %s: %s' % (num.decode(), subj))
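
As a rough sketch of just the "extract the number and compare" step, the parsing can be pulled out into a small helper that runs without a mail server. The names SUBJECT_RE, btc_amount and is_at_least are made up for this illustration, and the pattern assumes the subject always looks like "You received <amount> BTC". Decimal is used instead of float so that the equality case compares exactly; accepting a decimal comma is an extra assumption, in case some subjects use one.

```python
import re
from decimal import Decimal

# Assumed subject layout: "You received <amount> BTC"
SUBJECT_RE = re.compile(r'^You received\s+(\d+(?:[.,]\d+)?)\s+BTC\b')


def btc_amount(subject):
    """Return the amount in the subject as a Decimal, or None if it does not match."""
    match = SUBJECT_RE.match(subject)
    if match is None:
        return None
    # Accept a decimal comma as well as a decimal point
    return Decimal(match.group(1).replace(',', '.'))


def is_at_least(subject, threshold='0.000666703'):
    """True if the subject carries an amount greater than or equal to threshold."""
    amount = btc_amount(subject)
    return amount is not None and amount >= Decimal(threshold)
```

With the subjects from the question, is_at_least('You received 0.002719281 BTC') is true and is_at_least('You received 0.000028181 BTC') is false; a subject that does not match the pattern is simply skipped.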

The Subject: header arrives as a bytes string, which at a minimum you have to decode. However, there may also be a MIME wrapping (like maybe Subject: =?UTF-8?B?WW91IHJlY2VpdmVkIDAuMTIzIEJUQw==?=) which you need to decode using the email.parser.HeaderParser methods or something similar. The interface is a bit messy: HeaderParser only accepts a string, so the bytes have to be decoded separately first (email.parser.BytesHeaderParser accepts bytes directly).
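
To illustrate the MIME decoding in isolation, here is a minimal sketch that feeds a hand-written header to email.parser.BytesHeaderParser with the default policy. The raw bytes are invented for the example (they stand in for what a header-only fetch might return), and decoded_subject is a hypothetical helper name.

```python
from email.parser import BytesHeaderParser
from email.policy import default

# Made-up sample of a MIME-wrapped (RFC 2047 encoded-word) Subject header
raw = b'Subject: =?UTF-8?B?WW91IHJlY2VpdmVkIDAuMTIzIEJUQw==?=\r\n\r\n'


def decoded_subject(raw_header):
    """Parse raw header bytes and return the decoded Subject, or '' if missing."""
    msg = BytesHeaderParser(policy=default).parsebytes(raw_header)
    subject = msg['Subject']
    return '' if subject is None else str(subject)


subject = decoded_subject(raw)  # 'You received 0.123 BTC'
```

A plain ASCII subject passes through unchanged, so the same helper should work whether or not the sender wraps the header.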

Fetching with BODY.PEEK does not modify the message's flags (whereas plain BODY would set the \Seen flag, i.e. mark the message as read).

Some IMAP servers support more complex search syntax (perhaps even regex) but this should be reasonably portable and robust, I hope.

tripleee