I am working with the following mail data, stored in a file (data source: Infochimps):
Message-ID: <33025919.1075857594206.JavaMail.evans@thyme>
Date: Wed, 13 Dec 2000 13:09:00 -0800 (PST)
From: john.arnold@enron.com
To: slafontaine@globalp.com
Subject: re:spreads
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-From: John Arnold
X-To: slafontaine@globalp.com @ ENRON
X-cc:
X-bcc:
X-Folder: \John_Arnold_Dec2000\Notes Folders\'sent mail
X-Origin: Arnold-J
X-FileName: Jarnold.nsf
saw a lot of the bulls sell summer against length in front to mitigate margins/absolute position limits/var. as these guys are taking off the front, they are also buying back summer. el paso large buyer of next winter today taking off spreads. certainly a reason why the spreads were so strong on the way up and such a piece now. really the only one left with any risk premium built in is h/j now. it was trading equivalent of 180 on access, down 40+ from this morning. certainly if we are entering a period of bearish
................]
I am loading the above data as follows:
A = load '/root/test/enron_mail/maildir/*/*/*' using PigStorage(':') as (f1:chararray,f2:chararray);
But because the message body contains newlines, I get a separate tuple for each line of the body.
How can I consolidate these trailing lines into one? I want the part below in a single tuple:
saw a lot of the bulls sell summer against length in front to mitigate margins/absolute position limits/var. as these guys are taking off the front, they are also buying back summer. el paso large buyer of next winter today taking off spreads. certainly a reason why the spreads were so strong on the way up and such a piece now. really the only one left with any risk premium built in is h/j now. it was trading equivalent of 180 on access, down 40+ from this morning. certainly if we are entering a period of bearish
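To make the desired result concrete, here is a minimal sketch in plain Python (not Pig) of the consolidation I have in mind. It rests on assumptions: `split_mail` and `HEADER_RE` are hypothetical names, a header is any leading line of the form `Name: value` with no spaces in the name, and everything from the first non-header line onward is the body. A body whose first line happens to look like `Word: ...` would be misread as a header.

```python
import re

# Assumption: a header line looks like "Name: value", with no spaces in Name.
HEADER_RE = re.compile(r"^[A-Za-z][A-Za-z0-9-]*:")

def split_mail(text):
    """Return (headers, body) with the multi-line body joined into one line."""
    headers = {}
    body_lines = []
    in_headers = True
    for line in text.splitlines():
        if in_headers and HEADER_RE.match(line):
            # Split on the first colon only, so "Subject: re:spreads" keeps its value.
            name, _, value = line.partition(":")
            headers[name] = value.strip()
        else:
            # The first non-header (or blank) line ends the header block.
            in_headers = False
            if line.strip():
                body_lines.append(line.strip())
    return headers, " ".join(body_lines)
```

Called on the contents of one mail file, this would give one dictionary of header fields plus the whole body as a single string, which is the shape I would like to end up with for each message.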