0

I'm trying to automate a process by retrieving and downloading a csv attachment from a Gmail mail message, before that running a process on that csv file. I'm finding a lot of the documentation is either difficult to follow or is not applicable to the more recent versions of Python.

I'using the following code to retrieve the mail Ids. This currently compiles with no errors in PyCharm and does return some results when executed in Terminal using iPython:

#!/usr/bin/env python3

import pickle
import os.path
import email
import base64
from googleapiclient.discovery import build
from google_auth_oauthlib.flow import InstalledAppFlow
from google.auth.transport.requests import Request
from apiclient import errors



# If modifying these scopes, delete the file token.pickle.
SCOPES = ['https://www.googleapis.com/auth/gmail.readonly']
store_dir = os.getcwd()

# USE MESSAGE ID TO GET EMAIL
def get_message(service, user_id, msg_id):

    try:
        message = service.users().messages().get(userId=user_id, id=msg_id, format='raw').execute()

        msg_raw = base64.urlsafe_b64decode(message['raw'].encode('ASCII'))

        msg_str = email.message_from_bytes(msg_raw)

        content_types = msg_str.get_content_maintypes()

        if content_types == 'multipart':
            #part1 is plain text, part2 is HTML
            part1, part2 = msg_str.get_payload()
            print("This is the message body: ")
            print(part1.get_payload())
            return part1.get_payload()
        else:
            return msg_str.get_payload()

    except (errors.HttpError, error):
        print("An error occurred: %s") % error



# SEARCH FOR MESSAGE ID
def search_mail(service, user_id, search_string):
    try:

        search_id = service.users().messages().list(userId=user_id, maxResults=20,labelIds = ['INBOX'], q=search_string).execute()

        num_results = search_id['resultSizeEstimate']

        result_list()
        if number_results>0:
           message_ids = search_id['messages']

        for ids in message_ids:
            num_results.append(ids['id'])

            return result_list
        else:
            print('There were 0 results for your search string. Returns empty')
            return ""

    except (errors.HttpError, error):
        print("An error occurred: %s") % error

def get_service():
    creds = None
    # The file token.pickle stores the user's access and refresh tokens, and is
    # created automatically when the authorization flow completes for the first
    # time.
    if os.path.exists('token.pickle'):
        with open('token.pickle', 'rb') as token:
            creds = pickle.load(token)
    # If there are no (valid) credentials available, let the user log in.
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(
                'credentials.json', SCOPES)
            creds = flow.run_local_server(port=0)
        # Save the credentials for the next run
        with open('token.pickle', 'wb') as token:
            pickle.dump(creds, token)

    service = build('gmail', 'v1', credentials=creds)

    return service


def GetAttachments(service, user_id, msg_id): #store_dir):
  """Get and store attachment from Message with given id.

  Args:
    service: Authorized Gmail API service instance.
    user_id: User's email address. The special value "me"
    can be used to indicate the authenticated user.
    msg_id: ID of Message containing attachment.
    store_dir: The directory used to store attachments.
  """
  try:
      message = service.users().messages().get(userId=user_id, id=msg_id).execute()

      for part in message['payload']['parts']:
          if part['filename']:
              attachment = service.users().messages().attachments().get(userId='me', messageId=message['id'],
                                                                        id=part['body']['attachmentId']).execute()
              file_data = base64.urlsafe_b64decode(attachment['data'].encode('UTF-8'))

              path = ''.join([store_dir, part['filename']])

              f = open(path, 'wb')
              f.write(file_data)
              f.close()


  except errors.HttpError as error:
      print(f'An error occurred: {error}')

My issues:

  1. It's not clear to me where I can enter my search criteria/string. The email arrives everyday and the filename is always the same, so I need of way of specifying the filename of the required attachment such that only the applicable message is retrieved. At present, I'm just retrieving that last 20 messages which is not prescriptive enough.

  2. I'm using store_dir = os.getcwd() to manually declare the destination file path for the attachment (which is simply the working directory of this project), however, unsurprisingly, the file is not currently being saved in the specified location.

  3. I need for the file to imported, readable and processed by a pre-existing script which is saved in another .py file. This process was written in a jupyter notebook where I was able to verify that it works when the csv is imported manually.

Any guidance appreciated.

jimiclapton
  • 775
  • 3
  • 14
  • 42

1 Answers1

0

Considering that:

  • The name of the attached file is always the same.
  • The email is received every day.

Instead of listing the last 20 messages, you could use the search query (q) to find the desired message, using the following search operators, which work the same way as with Gmail UI:

search_query = "newer_than:1d filename:{your-attachment-name}"
search_id = service.users().messages().list(userId=user_id, maxResults=20,labelIds = ['INBOX'], q=search_query).execute()

About downloading the attachment, I'd just follow this.

Reference:

Iamblichus
  • 18,540
  • 2
  • 11
  • 27