I'm trying to parse raw email messages that I fetch over HTTP one at a time and that arrive as MIME multipart. On the most recent mail, my code threw this exception:

java.nio.charset.MalformedInputException: Input length = 1

And here is (I think) the relevant chunk of that mail:

Content-Type: multipart/alternative;
 boundary="------------000401070001090809020709"

--------------000401070001090809020709
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 8bit

Is there a Scala library out there for easily handling this type of input? Otherwise is there an easy way to write some code that handles it?

I've been looking at mime4j, and at this Scala code in particular.

As of now, my code just uses scala.io.Source.fromURL to scrape the raw mail as follows:

scrape(scala.io.Source.fromURL(url))

which turns the BufferedSource into a String and splits it:

source.mkString.split("\n\n", 2) 
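One caveat with this split: RFC 5322 ends header lines with CRLF, so a mail that keeps its original line endings contains no literal "\n\n" between headers and body. Below is a minimal sketch of a split that tolerates both line endings. The names `HeaderBodySplit` and `split` are hypothetical, chosen only for illustration.

```scala
object HeaderBodySplit {
  // A blank line, written with either LF or CRLF line endings.
  private val BlankLine = "\r?\n\r?\n"

  /** Splits a raw message into (headers, body) at the first blank line. */
  def split(raw: String): (String, String) =
    raw.split(BlankLine, 2) match {
      case Array(headers, body) => (headers, body)
      case _                    => (raw, "") // no blank line: treat everything as headers
    }
}
```

The limit of 2 keeps any later blank lines inside the body, which matters for multipart mails where every part has its own header block.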

I've also tried using an implicit codec since scala.io.Source.fromURL can take a codec:

import java.nio.charset.CodingErrorAction
import scala.io.Codec

implicit val codec: Codec = Codec("UTF-8")
codec.onMalformedInput(CodingErrorAction.REPLACE)
codec.onUnmappableCharacter(CodingErrorAction.REPLACE)

but I think I'd need one of these for each charset?
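A single codec is unlikely to fit every mail, because each MIME part can declare its own charset. One possible approach, sketched here under assumptions, is to read the raw message as ISO-8859-1, which maps every byte to exactly one character and therefore never raises MalformedInputException, and then re-decode each part's body with the charset named in its Content-Type header. The names `PartDecoding`, `declaredCharset`, and `redecode` are hypothetical helpers, not part of any library, and the sketch only covers 7bit/8bit bodies, not base64 or quoted-printable.

```scala
import java.nio.charset.{Charset, StandardCharsets}
import scala.util.Try

object PartDecoding {
  // Matches a charset parameter such as `charset=windows-1252` or `charset="utf-8"`.
  private val CharsetParam = """(?i)charset\s*=\s*"?([^";\s]+)""".r

  /** Charset named in a Content-Type value, if present and supported by this JVM. */
  def declaredCharset(contentType: String): Option[Charset] =
    CharsetParam
      .findFirstMatchIn(contentType)
      .flatMap(m => Try(Charset.forName(m.group(1))).toOption)

  /** Re-decodes text that was read as ISO-8859-1 using the charset the part declares. */
  def redecode(latin1Text: String, charset: Charset): String =
    new String(latin1Text.getBytes(StandardCharsets.ISO_8859_1), charset)
}
```

With scala.io.Source, the first step would be `Source.fromURL(url)(Codec.ISO8859)`. For the part shown above, `declaredCharset` would then yield windows-1252, and `redecode` would turn the body back into the intended text.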

Any help is greatly appreciated.

plamb
  • MIME parsing is not easy because the grammar is not very well defined. My impression is that the format was built up block by block, so the information is scattered and there is no single standard format. I read the RFCs multiple times, and writing a compliant MIME parser is not an easy task because of this. As a semester project, I started to implement a MIME parser to show how you can interleave multiple parsers with a MIME parser when you have a multipart message. It's really an early implementation and does not follow the RFCs because I only focused.. – Alexis C. Jul 14 '15 at 22:44
  • on the interleaved part. Nevertheless, you could tweak or complete the (very incomplete) implementation I started: https://github.com/alexcrt/interleaved-parsers/blob/master/src/main/scala/mime/MIMEParser.scala. I used the parser combinators API. If you have a clear, well-defined grammar for your input, it shouldn't be very hard to complete this code to parse the email messages. However, you may want to look at other libraries for MIME parsing. If you're not limited to Scala, you could try Flanker (https://github.com/mailgun/flanker), which can parse MIME messages. Hope it helps a bit :) – Alexis C. Jul 14 '15 at 22:47

0 Answers