
We have been stuck on this issue for quite some time. In our project we parse an email that has been written to a file and load the data into a POJO. This works in most cases, but when the email address is long it wraps onto the next line, and the name is fetched instead of the From address. We are using commons-email-1.4.

The input file containing the email message has:

case1:

From: "def, abc [CCC-OT]" <abc.def@test.com> //here it fetches the mail id properly

With a longer email address, the file has:

case2:

From: "defxacdhf, abc [CCC-OT]" 
<abc.defxacdhf@test.com>// here the mail id jumps to the next line so the from address fetched contains the name

Here is the sample code:

    ByteArrayInputStream byteArrayStream = new ByteArrayInputStream(
            FileUtils.getStreamAsByteArray(buffInStream, lengthOfFile));
    // MimeMessage message = new MimeMessage(mailSession, byteArrayStream);
    MimeMessageParser mimeParser = new MimeMessageParser(
            MimeMessageUtils.createMimeMessage(mailSession, byteArrayStream));
    MimeMessageParser parsedMessage = mimeParser.parse();

When we try to get the From address:

    emailData.setFromAddress(parsedMessage.getFrom());

In case 1 it returns abc.def@test.com, and in case 2 it returns "defxacdhf, abc [CCC-OT]". Any help is appreciated.

EDIT: The script reads the incoming email and writes it to the file like this:

    while read line
    do
        echo "$line" >> /directory/$FILE_NAME
    done
Lakshmi
  • If I remember the RFC correctly, there needs to be some space before – Jan Dec 09 '15 at 08:15
  • @Jan yes there is space between "def, abc [CCC-OT]" and in normal case in the case of longer email id the id jumps to the next line in the file. – Lakshmi Dec 09 '15 at 08:50
  • There should be white-space (tabs or spaces) in the line beginning with – Jan Dec 09 '15 at 08:51
  • no there no white space on the start of the next line in case2. – Lakshmi Dec 09 '15 at 08:59
  • Then the library works according to rfc. Where do you get the lines from? – Jan Dec 09 '15 at 09:00
  • We write the incoming email to a file through script then read the file for parsing. – Lakshmi Dec 09 '15 at 09:11
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/97381/discussion-between-jan-and-lakshmi). – Jan Dec 09 '15 at 09:13
  • @Jan the script file simply reads the incoming email line by line and using echo command writes it to the file – Lakshmi Dec 09 '15 at 09:53
  • Try adding manually a tab / space in front of – Jan Dec 09 '15 at 09:57
  • @Jan yes we tried adding a space in front of emailid in case2 and it works. not sure where input is broken.. while read line do echo "$line" >> /directory/$FILE_NAME done – Lakshmi Dec 09 '15 at 10:08
  • how does the script get a hold on the messages? Are they used elsewhere as well? Maybe you could consider fetching them with JavaMail in the first part – Jan Dec 09 '15 at 10:15

2 Answers

As discussed:

This is not an error in any of the libraries used, but rather input that does not conform to the RFC.

Quoting from RFC 822:

3.1.1. LONG HEADER FIELDS

   Each header field can be viewed as a single, logical  line  of
   ASCII  characters,  comprising  a field-name and a field-body.
   For convenience, the field-body  portion  of  this  conceptual
   entity  can be split into a multiple-line representation; this
   is called "folding".  The general rule is that wherever  there
   may  be  linear-white-space  (NOT  simply  LWSP-chars), a CRLF
   immediately followed by AT LEAST one LWSP-char may instead  be
   inserted.  
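
To check whether a saved message file breaks this folding rule, a rough sketch along the following lines may help. The function name `find_bad_header_lines` is hypothetical, and the check is only a heuristic: it flags header lines that neither start a field (`Name:`) nor begin with white space, and a broken continuation that happens to contain a colon before any white space would slip through.

```shell
# Hypothetical helper: read a message on stdin and print header lines
# that are neither the start of a field ("Name:") nor a folded
# continuation (a line beginning with a space or tab).
find_bad_header_lines() {
    # The header section ends at the first empty line, so stop there.
    sed '/^$/q' |
        grep -v -E '^([^[:space:]:]+:|[[:space:]]|$)'
}

# Sample input modelled on case2 from the question: the address line
# has lost its leading white space, so it is reported.
printf '%s\n' 'From: "defxacdhf, abc [CCC-OT]"' \
    '<abc.defxacdhf@test.com>' 'Subject: test' '' 'body text' |
    find_bad_header_lines
```

Any line this prints is one that a parser following the RFC would not treat as part of the preceding header field.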
Jan
  • Can you check the script code given. I just reads line by line and echo to the file any idea how to fix this input issue? – Lakshmi Dec 09 '15 at 10:15
  • I would - but what language is that? You should consider writing a new Question maybe, so the experts on that particular scripting language would see this as well. – Jan Dec 09 '15 at 10:16
  • And could you specify how that line is read and from where? Is this connecting to a mailbox or a script in sendmail? – Jan Dec 09 '15 at 10:17
  • This script is invoked by sendmail. its a unix script. – Lakshmi Dec 09 '15 at 10:25
  • Oh - not my area of greatest expertise. Could you give background on the why you do that? If you'd just delivered the mail to a local mbox, you could access that with JavaMail and get all the messages as well. Or is this like a copy of every incoming mail? – Jan Dec 09 '15 at 10:44
  • No to tackle a scalability issue wherein some mails were lost if they failed due to server load so we had to get a copy of the mail file. Previously we were using javamail only thanks for all the help will try creating this question in unix tags to get response thanks. – Lakshmi Dec 09 '15 at 10:51

I don't understand why you're using a shell while loop to read the data instead of just using cat or something like that, but the problem is in your use of "read". By default, read splits the input line into fields, separated by the field separators specified by the shell IFS environment variable. Leading field separators are ignored, so when you read a line that starts with white space, the white space is ignored.

Change your loop to:

    while IFS= read -r line
    do
        echo "$line" >> /directory/$FILE_NAME
    done

That sets IFS to the empty string before each read, and specifies a "raw" read so that backslash characters aren't special.
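
The difference can be seen with a small self-contained sketch. The sample header is made up from case2 in the question, assuming the original message had a single space in front of the address; the variable names are only for illustration.

```shell
# Hypothetical sample input: a folded From header whose continuation
# line begins with a single space.
input=$(printf '%s\n' 'From: "defxacdhf, abc [CCC-OT]"' ' <abc.defxacdhf@test.com>')

# Plain `read`: with the default IFS, leading white space is stripped.
stripped_line2=$(printf '%s\n' "$input" |
    while read line; do printf '%s\n' "$line"; done | sed -n '2p')

# `IFS= read -r`: each line is copied unchanged.
preserved_line2=$(printf '%s\n' "$input" |
    while IFS= read -r line; do printf '%s\n' "$line"; done | sed -n '2p')

# Brackets make the leading space visible.
printf '[%s]\n' "$stripped_line2"    # prints [<abc.defxacdhf@test.com>]
printf '[%s]\n' "$preserved_line2"   # prints [ <abc.defxacdhf@test.com>]
```

The first result matches the broken file from the question, where the continuation line starts in column one; the second keeps the leading space that marks the line as folded.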

But unless you're doing something else in that read loop, it would be much simpler to copy standard input straight to the file:

    cat > /directory/$FILE_NAME
Bill Shannon
  • thanks ya i found the issue and fixed that from this link http://stackoverflow.com/questions/18055073/how-do-i-preserve-leading-whitespaces-with-echo-on-a-shell-script thanks for the input :) – Lakshmi Dec 10 '15 at 06:41