
I have a requirement to read a MIME object from AWS S3 and parse it. Below is the code I'm using to read the S3 object.

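    import java.io.InputStream;
    import software.amazon.awssdk.core.ResponseBytes;
    import software.amazon.awssdk.services.s3.S3Client;
    import software.amazon.awssdk.services.s3.model.GetObjectRequest;
    import software.amazon.awssdk.services.s3.model.GetObjectResponse;

    String awsS3Key = "myAWSS3key";
    String awsS3Secret = "myAWSS3secret";
    String awsS3Region = "us-west-2";
    String awsS3Bucket = "my-s3-bucket";

    S3Client s3Client = buildS3Client(awsS3Region, awsS3Secret, awsS3Key);

    GetObjectRequest request = GetObjectRequest.builder()
            .bucket(awsS3Bucket)
            .key("myobject")
            .build();
    ResponseBytes<GetObjectResponse> responseBytes = s3Client.getObjectAsBytes(request);
    InputStream mailFileInputStream = responseBytes.asInputStream();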

Below is the code for building the S3 client.

    private S3Client buildS3Client(String awsS3Region, String awsS3Secret, String awsS3Key) {
        S3ClientBuilder s3ClientBuilder =
                S3Client.builder().credentialsProvider(getAwsCredentialsProvider(awsS3Key, awsS3Secret));
        if (StringUtils.isNotBlank(awsS3Region)) {
            s3ClientBuilder.region(Region.of(awsS3Region));
        }
        return s3ClientBuilder.build();
    }

I'm able to fetch the MIME object from S3 successfully, but it seems to contain escape characters. When I print the content of mailFileInputStream with the code below, the output shows escape characters.

    String contents = IOUtils.toString(mailFileInputStream, StandardCharsets.UTF_8);
    logger.info("S3 email content: {}", contents);

The problem is that when I convert this mailFileInputStream to a MimeMessage with the code below, I get an empty MimeMessage object, possibly because the input stream passed to the MimeMessage constructor contains these escape characters.

    Properties props = new Properties();
    Session session = Session.getDefaultInstance(props, null);
    MimeMessage message = new MimeMessage(session, mailFileInputStream);
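One thing worth ruling out: if the logging code above runs on the same mailFileInputStream before the MimeMessage constructor, the stream is already at end-of-file when MimeMessage reads it, which would be consistent with an empty message regardless of line endings. The JDK-only sketch below illustrates that effect and one workaround, reading the object into a byte array once and giving each consumer its own ByteArrayInputStream. The class and method names are hypothetical, and the hard-coded bytes stand in for the S3 object (in the code above they could come from `responseBytes.asByteArray()`).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: shows that a consumed InputStream yields nothing on a second read.
public class StreamReuseDemo {

    // Reads whatever is left in the stream and decodes it as UTF-8 (Java 9+ readAllBytes).
    public static String readRemaining(InputStream in) {
        try {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for the raw S3 object bytes (assumption: a tiny MIME-like message).
        byte[] raw = "Subject: hello\r\n\r\nbody\r\n".getBytes(StandardCharsets.US_ASCII);

        // One stream shared by two consumers: the second consumer sees nothing.
        InputStream shared = new ByteArrayInputStream(raw);
        String logged = readRemaining(shared);  // plays the role of IOUtils.toString(...) for logging
        String parsed = readRemaining(shared);  // plays the role of new MimeMessage(session, stream)
        System.out.println("first read length:  " + logged.length());
        System.out.println("second read length: " + parsed.length());

        // Buffer once, then open a fresh stream per consumer.
        String loggedAgain = readRemaining(new ByteArrayInputStream(raw));
        String parsedAgain = readRemaining(new ByteArrayInputStream(raw));
        System.out.println("fresh streams equal: " + loggedAgain.equals(parsedAgain));
    }
}
```

Running `main` prints a first-read length of 24, a second-read length of 0, and `fresh streams equal: true`.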

I then tried parsing this MimeMessage with the code below to retrieve the email content, but every field comes back null.

    MimeMessageParser mimeParser = new MimeMessageParser(message);
    mimeParser.parse();

    String subject = mimeParser.getSubject();
    logger.info("subject: {}", subject);
    String from = mimeParser.getFrom();
    logger.info("from: {}", from);
    String plainText = mimeParser.getPlainContent();
    logger.info("plainText: {}", plainText);
    String htmlString = mimeParser.getHtmlContent();
    logger.info("htmlString: {}", htmlString);

Below is the expected MIME format:

    Return-Path: <pratap.gangwani@XXXXX.com>
    Received: from mail-XXX-XXX.google.com (mail-XXX-XXX.google.com [XXXXXXXXXXX])
      by inbound-smtp.us-west-2.amazonaws.com with SMTP id XXXXXXXXX

Below is what I'm getting instead:

    Return-Path: <pratap.gangwani@XXXXX.com>\r\nReceived: from mail-XXX-XXX.google.com (mail-XXX-XXX.google.com [XXXXXXXXX])\r\n by inbound-smtp.us-west-2.amazonaws.com with SMTP id XXXXXXXXX\r\n

I'm not sure whether this is caused by escape characters introduced when getting the object from S3 or by something else. Is there a way to get the object in exactly the format in which it is stored in S3?

  • \r\n represents a carriage return and line feed. Did you, by any chance, replace CR-LF with \r\n when you stored the data? You could try replacing \r\n with CR-LF and see what happens. – Hitesh A. Bosamiya Feb 03 '22 at 15:12
  • I haven't stored the data manually. I've configured AWS SES to store incoming email in S3 by setting up a domain name and MX record. Whenever I send an email to my configured domain, AWS SES captures it and stores it in S3 according to the rules and actions I configured. When I download the file directly from the AWS console, I see a properly formatted MIME file without \r\n. – Pratap Gangwani Feb 03 '22 at 18:38
  • Maybe you can try printing `contents.replaceAll("\\\\r\\\\n", "\r\n")` (yes, there are four '\' characters). This will tell you whether each \r\n is really CR-LF or the four characters '\', 'r', '\', and 'n'. – Hitesh A. Bosamiya Feb 04 '22 at 05:45
  • I tried dumping the object to a text file. When I checked the file's contents, I found no CR-LF/escape characters. – Pratap Gangwani Feb 08 '22 at 11:20
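Following the suggestion in the comments, inspecting the raw bytes directly (rather than a logged string) is one way to tell real CR-LF pairs (bytes 0x0D 0x0A) apart from the four-character text `\r\n`. The sketch below is JDK-only; the class and method names are hypothetical, and in the original code the byte array could come from `responseBytes.asByteArray()`. It also wraps the `replaceAll` call from the comments as a repair step. If the raw bytes show real CR-LF pairs and no literal escapes, that would suggest the `\r\n` seen in the log was introduced after the read, for example by how the log line is rendered, rather than by S3.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical diagnostic: distinguishes real CR LF bytes from the literal text "\r\n".
public class LineEndingCheck {

    // Counts real CR LF byte pairs (0x0D followed by 0x0A).
    public static int countRealCrlf(byte[] data) {
        int count = 0;
        for (int i = 0; i + 1 < data.length; i++) {
            if (data[i] == '\r' && data[i + 1] == '\n') {
                count++;
            }
        }
        return count;
    }

    // Counts the four-character sequence: backslash, 'r', backslash, 'n'.
    public static int countLiteralEscapes(byte[] data) {
        int count = 0;
        for (int i = 0; i + 3 < data.length; i++) {
            if (data[i] == '\\' && data[i + 1] == 'r' && data[i + 2] == '\\' && data[i + 3] == 'n') {
                count++;
            }
        }
        return count;
    }

    // The repair suggested in the comments: turn literal "\r\n" text back into real CR LF.
    public static String unescapeCrlf(String s) {
        return s.replaceAll("\\\\r\\\\n", "\r\n");
    }

    public static void main(String[] args) {
        byte[] real = "Return-Path: <a@example.com>\r\nReceived: x\r\n".getBytes(StandardCharsets.US_ASCII);
        byte[] literal = "Return-Path: <a@example.com>\\r\\nReceived: x\\r\\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println("real:    CRLF=" + countRealCrlf(real) + ", literal=" + countLiteralEscapes(real));
        System.out.println("literal: CRLF=" + countRealCrlf(literal) + ", literal=" + countLiteralEscapes(literal));
    }
}
```

For the two sample inputs, `main` prints `CRLF=2, literal=0` for the real line endings and `CRLF=0, literal=2` for the escaped text.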
