
Using Google Apps Script (http://script.google.com), I know from the docs how to send, forward, or trash messages, etc., but I can't find how to remove a file attachment from an email, i.e.:

  1. keep the text content (either in HTML or just plain text would be fine)
  2. keep the original sender, keep the recipient
  3. keep the original message date/hour (important!)
  4. remove the attachment

If it's not possible via the API, is there a way to resend the message to myself, while keeping 1, 2 and 3?


Note: the GmailAttachment class looks interesting and lets me list a message's attachments:

var threads = GmailApp.getInboxThreads(0, 10);
var msgs = GmailApp.getMessagesForThreads(threads);
for (var i = 0; i < msgs.length; i++) {
  for (var j = 0; j < msgs[i].length; j++) {
    var attachments = msgs[i][j].getAttachments();
    for (var k = 0; k < attachments.length; k++) {
      Logger.log('Message "%s" contains the attachment "%s" (%s bytes)',
                 msgs[i][j].getSubject(), attachments[k].getName(), attachments[k].getSize());
    }
  }
}

but I can't find how to remove an attachment.

Note: I've already looked at many other ways of doing this and read nearly every article about it (dedicated web services, local clients like Thunderbird with the Attachment Extractor plugin, etc.), but none of them is really satisfying. That's why I'm looking for a way to do it myself with Google Apps Script.

Basj

1 Answer


It looks like the message will have to be re-created, more or less:

Messages are immutable: they can only be created and deleted. No message properties can be changed other than the labels applied to a given message.

Using the Advanced Gmail Service, you can work around this with the Gmail API's insert() method: Gmail.Users.Messages.insert(resource, userId)

This advanced service must be enabled before use.

Example (replace EMAIL_ID with a message ID, or fetch the message in whatever way you prefer):

function removeAttachments () {
  // Get the `raw` email
  var email = GmailApp.getMessageById("EMAIL_ID").getRawContent();

  // Find the boundary line that precedes the html (or, failing that, plain-text) part
  var re_html = /(-*\w*)(\r)*(\n)*(?=Content-Type: text\/html;)/.exec(email);
  var re = re_html || /(-*\w*)(\r)*(\n)*(?=Content-Type: text\/plain;)/.exec(email);
  if (!re) {
    throw new Error("No text/html or text/plain part found in the message");
  }

  // Find the index of the end of message boundary
  var start = re[1].length + re.index;
  var boundary = email.indexOf(re[1], start);
  if (boundary === -1) {
    throw new Error("No closing boundary found after the text part");
  }

  // Remove the attachments & Encode the attachment-free RFC 2822 formatted email string
  var base64_encoded_email = Utilities.base64EncodeWebSafe(email.substring(0, boundary));
  // Set the base64Encoded string to the `raw` required property
  var resource = {'raw': base64_encoded_email};

  // Re-insert the email into the user gmail account with the insert time
  /* var response = Gmail.Users.Messages.insert(resource, 'me'); */

  // Re-insert the email with the original date/time
  var response = Gmail.Users.Messages.insert(resource, 'me',
                      null, {'internalDateSource': 'dateHeader'});

  Logger.log("The inserted email id is: %s", response.id);
}

This will remove the attachments from the email and re-insert it into your mailbox.

Edit/update: new RegExp that works with HTML-only and plain-text-only emails; it should now also work with multiple boundary strings.
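To see what the truncation step does without touching a mailbox, the same logic can be pulled out into a plain function and run on a hand-written message. This is only a sketch: `stripAttachments` is a hypothetical helper name, the sample message is made up, and the `i` flag plus the two "return the message unchanged" guards are additions that are not in the function above.

```javascript
// Hypothetical standalone version of the truncation step (no Gmail calls).
function stripAttachments(raw) {
  // Boundary line right before the HTML part, else before the plain-text part.
  // The `i` flag is an addition so that `Content-type:` is matched as well.
  var match = /(-*\w*)(\r)*(\n)*(?=Content-Type: text\/html;)/i.exec(raw) ||
              /(-*\w*)(\r)*(\n)*(?=Content-Type: text\/plain;)/i.exec(raw);
  if (!match || !match[1]) {
    return raw; // no boundary found in front of a text part: leave as-is
  }
  // The next occurrence of that boundary marks the end of the text part.
  var end = raw.indexOf(match[1], match.index + match[1].length);
  return end === -1 ? raw : raw.substring(0, end);
}

// Made-up sample: plain + HTML alternatives followed by one PDF attachment.
var sample = [
  'MIME-Version: 1.0',
  'Content-Type: multipart/mixed; boundary="outer"',
  '',
  '--outer',
  'Content-Type: multipart/alternative; boundary="inner"',
  '',
  '--inner',
  'Content-Type: text/plain; charset="UTF-8"',
  '',
  'Hello',
  '--inner',
  'Content-Type: text/html; charset="UTF-8"',
  '',
  '<p>Hello</p>',
  '--inner--',
  '--outer',
  'Content-Type: application/pdf; name="a.pdf"',
  'Content-Transfer-Encoding: base64',
  '',
  'JVBERi0=',
  '--outer--'
].join('\r\n');

// Everything from the closing `--inner--` line onward is dropped,
// so the application/pdf part is gone while both text parts remain.
var stripped = stripAttachments(sample);
```

Note that the closing `--inner--` and `--outer--` delimiters are cut off together with the attachment, so the result is not strictly well-formed MIME; the approach relies on Gmail accepting it on insert.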

random-parts
  • Yes, in the `All Mail` folder it will show the email and time when the insert happened - once the message is opened it will basically mirror the original (time/date/sender/etc), minus the attachments. The example is fully functional, the API needs to be [enabled](https://developers.google.com/apps-script/guides/services/advanced) from the script and an `email_id` in the place holder to test it. `me` is a special value used to indicate the authenticated user – random-parts Oct 05 '17 at 19:35
  • It takes the original email in its `raw` format; Removes the attachments information and puts it back in the gmail-box. On insert, a one line `Received: ...` value is added to the first index of the email header. That line holds the new `insert` metadata - where it was received from `gmailapi.google.com with HTTPREST` and date/time. The rest of the message/email is exactly the same as the original (minus the attachments) - Best way to see if it is what you are looking for is to test it. Send an email - wait some minutes (so you can see the timestamps) - run the example on it – random-parts Oct 05 '17 at 20:25
  • I removed all my obsolete comments. Datetime problem 100% solved :) I edited your answer. – Basj Oct 05 '17 at 22:33
  • Last problem: the boundaries regex `--\w+--` isn't accurate for my messages. See https://stackoverflow.com/questions/46596020/how-to-remove-attachment-from-email-raw-content – Basj Oct 05 '17 at 23:04
  • @Basj the updated regex should now work for most, if not all, email boundary strings - including your example – random-parts Oct 07 '17 at 21:16
  • Thanks! Do you think it would work for both sample emails [here](https://stackoverflow.com/questions/46596020/how-to-remove-attachment-from-email-raw-content)? – Basj Oct 09 '17 at 10:58
  • Yes, it was your email example I used. Then tested again with gmail emails – random-parts Oct 09 '17 at 17:50
  • Good idea @random-parts about "Conversation Mode off" but I can't find a way to [loop on messages (not threads) of a specific label](https://stackoverflow.com/questions/46670565/looping-on-messages-not-threads-of-a-specific-gmail-label). – Basj Oct 10 '17 at 15:30
  • My addition: if you need to attach the new message to the same thread, then instead of `var resource = {'raw': base64_encoded_email}` use `var resource = {'raw': base64_encoded_email, 'threadId': 'THREAD_ID'}` – vukis May 25 '21 at 12:25
  • Got another e-mail example where "Content-type: text/plain <..>" notice word type starts with lowercase t, instead of uppercase T... – vukis May 26 '21 at 11:18
  • @vukis What do you mean about t vs. T? Also, can you post an answer with your updated code, ready to use? It would be useful. (See your other comment about the thread id) – Basj Aug 19 '22 at 06:13
  • @Basj I think I meant that this case answer was only considering "Content-Type: text/plain <..>" but I found e-mail with "Content-type: text/plain <..>". It's case sensitive ;-) – vukis Aug 20 '22 at 07:49
  • Oh yes I see @vukis, thanks! Do you know how we should update the `re_html` and `re` regex in this answer according to your examples? – Basj Aug 20 '22 at 10:52
  • Sorry, it's been a while since I worked on this and I cannot recall these details – vukis Aug 21 '22 at 18:13
  • @random-parts can we do the full operation from `Gmail` advanced API only (so that it will be easier to translate to Python as well, without using `GmailApp`)? – Basj Aug 22 '22 at 23:18
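Following up on the `threadId` comment above, here is a minimal sketch of building the `resource` object so that the re-inserted copy can stay in the original conversation. `buildInsertResource` is a hypothetical helper name, and the encoder is passed in as a parameter (an assumption made here so the function has no Apps Script dependency); whether Gmail actually groups the copy into that thread has not been verified here.

```javascript
// Hypothetical helper: builds the `resource` argument for
// Gmail.Users.Messages.insert(). `encodeWebSafe` is any function that
// returns the web-safe base64 encoding of a string.
function buildInsertResource(strippedRaw, encodeWebSafe, threadId) {
  var resource = { raw: encodeWebSafe(strippedRaw) };
  if (threadId) {
    // Only set when given, so the default behaviour stays unchanged.
    resource.threadId = threadId;
  }
  return resource;
}

// Inside Apps Script (not run here), assuming `message` is a GmailMessage
// and `stripped` is its raw content with the attachments cut off:
//
//   var resource = buildInsertResource(
//       stripped,
//       function (s) { return Utilities.base64EncodeWebSafe(s); },
//       message.getThread().getId());
//   Gmail.Users.Messages.insert(resource, 'me', null,
//                               {'internalDateSource': 'dateHeader'});
```

Leaving out the third argument gives the same `{'raw': ...}` object as in the answer.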