
I'm trying to export a csv-file from the Gmail-API via the Users.messages.attachments: get method (1) and convert it to a pandas data-frame. I already have the ID of the attachment and the ID of the message, and everything is working fine. The documentation says that the body data of the attachment is "part as a base64url encoded string" (2), and now I would like to convert the csv-file to a pandas data-frame. Following this post, I tried the following:

[...]

```python
# get the attachment
file = service.users().messages().attachments().get(userId='me', messageId=message_id, id=attachmentId).execute()

# convert the file to a pandas data-frame
data = file['data']
str_csv = base64.urlsafe_b64decode(data)
df = pd.read_csv(StringIO(str_csv))
```

This is what print(data) looks like:

__5CAGUAcgBpAGMAaAB0ACAAegB1ACAAQQB1AGsAdABpAG8AbgBzAGQAYQB0AGUAbgAgAGYA_AByACAASwBhAG0AcABhAGcAbgBlAG4ACgAxAC4AIABKAGEAbgB1AGEAcgAgADIAMAAyADAAIAAtACAAMwAxAC4AIABKAGEAbgB1AGEAcgAgADIAMAAyADAACgBHAGUAcgDkAHQACQBEAG8AbQBhAGkAbgAgAGQAZQByACAAYQBuAGcAZQB6AGUAaQBnAHQAZQBuACAAVQBSAEwACQBBAG4AdABlAGkAbAAgAGEAbgAgAG0A9gBnAGwAaQBjAGgAZQBuACAASQBtAHAAcgBlAHMAcwBpAG8AbgBlAG4ACQDcAGIAZQByAHMAYwBoAG4AZQBpAGQAdQBuAGcAcwByAGEAdABlAAkAUgBhAHQAZQAgAGQAZQBy and so on...

Unfortunately, I get the following error message:

error-message

Does anyone have an idea how I can fix this or can explain why it isn't working?

(1) see https://developers.google.com/gmail/api/v1/reference/users/messages/attachments/get

(2) see https://developers.google.com/gmail/api/v1/reference/users/messages/attachments

Moritz Wolff

1 Answer


The immediate cause of the error is that base64.urlsafe_b64decode returns bytes, not a str, while StringIO only accepts str. You must first decode the bytes into a true string before passing it to StringIO:

```python
str_csv = base64.urlsafe_b64decode(data).decode('UTF16')
```
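A minimal sketch of the bytes/str distinction, using only the first eight characters of the `data` string from the question (`__5CAGUA`) so it runs without the Gmail API. The decoded bytes start with `FF FE`, the UTF-16 little-endian byte order mark, which suggests why UTF-16 is the right codec here; `"utf-16"` is just another spelling of the `'UTF16'` used above.

```python
import base64
from io import StringIO

# First 8 characters of the base64url string shown in the question.
prefix = "__5CAGUA"

raw = base64.urlsafe_b64decode(prefix)
print(type(raw))  # <class 'bytes'>
print(raw)        # b'\xff\xfeB\x00e\x00'  (FF FE is the UTF-16 little-endian BOM)

# StringIO only accepts str, so passing the raw bytes raises TypeError.
try:
    StringIO(raw)
except TypeError as exc:
    print("TypeError:", exc)

# Decoding as UTF-16 consumes the BOM and yields ordinary text.
text = raw.decode("utf-16")
print(text)  # Be
buffer = StringIO(text)
```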

But beware: the encoded string you show decodes to:

'Bericht zu Auktionsdaten für Kampagnen\n1. Januar 2020 - 31. Januar 2020\nGerät\tDomain der angezeigten URL\tAnteil an möglichen Impressionen\tÜberschneidungsrate\tRate der'

that is:

  • 2 heading lines
  • 1 header line whose column names are separated by tabs

I think it will need further processing before being fed to read_csv (at least skipping the two heading lines and setting the separator to a tab...)
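Following that advice, here is a minimal sketch of the whole conversion. It assumes, based only on the decoded sample above, that the attachment is UTF-16 text with two title lines followed by a tab-separated header. `attachment_to_dataframe` is a hypothetical helper name, and the sample reuses the first four column names from the decoded output plus one made-up data row.

```python
import base64
from io import StringIO

import pandas as pd


def attachment_to_dataframe(data, header_lines=2, sep="\t"):
    """Turn the base64url 'data' field of a Gmail attachment into a DataFrame.

    Assumption: the file is UTF-16 text with `header_lines` title lines
    before a `sep`-separated column header (as in the decoded sample).
    """
    raw = base64.urlsafe_b64decode(data)  # bytes
    text = raw.decode("utf-16")           # str; the BOM selects the byte order
    return pd.read_csv(StringIO(text), sep=sep, skiprows=header_lines)


# Hypothetical attachment: the first three lines mirror the decoded sample
# (header cut to its first four columns); the data row is made up.
sample = (
    "Bericht zu Auktionsdaten für Kampagnen\n"
    "1. Januar 2020 - 31. Januar 2020\n"
    "Gerät\tDomain der angezeigten URL\tAnteil an möglichen Impressionen\tÜberschneidungsrate\n"
    "Computer\texample.com\t0.5\t0.2\n"
)
# Encode the way Gmail delivers it: UTF-16 bytes, then base64url.
encoded = base64.urlsafe_b64encode(sample.encode("utf-16")).decode("ascii")
df = attachment_to_dataframe(encoded)
print(df)
```

Decoding explicitly and passing a str to StringIO keeps the two steps visible; passing the raw bytes through `io.BytesIO` together with `encoding="utf-16"` in `read_csv` is an alternative worth trying.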

Serge Ballesta
  • If I process str_csv as you suggested, pass it to StringIO and name that object str_csv_IO, Python throws an error when I pass str_csv_IO to pd.read_csv. Why is that? This is my first time working with the base64 and StringIO modules, so sorry if the questions are too obvious. – Moritz Wolff Feb 25 '20 at 17:38
  • @MoritzWolff: The beginning of the data that you have shown is a hint that the attachment is not a plain, clean CSV file. You should print `str_csv` (at least 10 lines) and tell how you would like it to feed your dataframe. I don't have enough information to guess more... – Serge Ballesta Feb 25 '20 at 18:23
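To act on that suggestion, a small sketch like the following prints the first lines with `repr()`, so tabs and other invisible characters become visible. `preview_lines` is a hypothetical helper, and the `str_csv` value here is a short stand-in built from the decoded sample, not the real attachment.

```python
def preview_lines(text, n=10):
    """Return the first n lines of text as repr() strings, so tabs and
    other invisible characters show up when printed."""
    return [repr(line) for line in text.splitlines()[:n]]


# Hypothetical stand-in for the decoded attachment text (str_csv in the thread).
str_csv = (
    "Bericht zu Auktionsdaten für Kampagnen\n"
    "1. Januar 2020 - 31. Januar 2020\n"
    "Gerät\tDomain der angezeigten URL\n"
)
for line in preview_lines(str_csv):
    print(line)
```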