
I am working on the following mail data in a file (data source: infochimps):

Message-ID: <33025919.1075857594206.JavaMail.evans@thyme>
Date: Wed, 13 Dec 2000 13:09:00 -0800 (PST)
From: john.arnold@enron.com
To: slafontaine@globalp.com
Subject: re:spreads
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-From: John Arnold
X-To: slafontaine@globalp.com @ ENRON
X-cc:
X-bcc:
X-Folder: \John_Arnold_Dec2000\Notes Folders\'sent mail
X-Origin: Arnold-J
X-FileName: Jarnold.nsf

saw a lot of the bulls sell summer against length in front to mitigate margins/absolute position limits/var. as these guys are taking off the front, they are also buying back summer. el paso large buyer of next winter today taking off spreads. certainly a reason why the spreads were so strong on the way up and such a piece now. really the only one left with any risk premium built in is h/j now. it was trading equivalent of 180 on access, down 40+ from this morning. certainly if we are entering a period of bearish

................]

I am loading the above data as:

A = load '/root/test/enron_mail/maildir/*/*/*' using PigStorage(':') as (f1:chararray,f2:chararray);

But for the message body I get separate tuples, because the body contains newlines.

How can I consolidate these last lines into one? I want the part below in a single tuple:

saw a lot of the bulls sell summer against length in front to mitigate margins/absolute position limits/var. as these guys are taking off the front, they are also buying back summer. el paso large buyer of next winter today taking off spreads. certainly a reason why the spreads were so strong on the way up and such a piece now. really the only one left with any risk premium built in is h/j now. it was trading equivalent of 180 on access, down 40+ from this morning. certainly if we are entering a period of bearish

Pk boss
priyanka
  • Have you considered using the REPLACE function (http://pig.apache.org/docs/r0.8.1/api/org/apache/pig/builtin/REPLACE.html) to replace newline chars with something else? – Gaurav Phapale Sep 12 '14 at 14:15
  • If each input is a file, then you can use a shell script to replace the newlines before feeding it to the Pig script, or you have to write a UDF to load the data. – Vikas Madhusudana Sep 14 '14 at 10:35
  • Ok, suppose I replace the newlines with some character, e.g. a comma (,). Then how does the Pig script recognize a new record? – priyanka Sep 15 '14 at 05:29
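
One way to act on the preprocessing idea from the comments, and to keep record boundaries clear, is to flatten each mail file into exactly one output line before Pig sees it. The sketch below is only an illustration in Python, not a tested Pig solution. It assumes each file holds a single plain-text message; the names `flatten_mail` and `to_record`, the choice of three header fields, and the tab separator are all hypothetical.

```python
import email
import sys


def flatten_mail(raw_text):
    """Return (message_id, sender, subject, body) with the body on one line."""
    msg = email.message_from_string(raw_text)
    payload = msg.get_payload()
    # Assumption: plain-text mail only. Multipart payloads come back as a
    # list, which this sketch does not handle and maps to an empty body.
    body = payload if isinstance(payload, str) else ""
    # Collapse every run of whitespace (newlines and tabs included) to one space.
    one_line_body = " ".join(body.split())
    return (
        str(msg.get("Message-ID", "")),
        str(msg.get("From", "")),
        str(msg.get("Subject", "")),
        one_line_body,
    )


def to_record(raw_text, sep="\t"):
    """Join the fields into one line; one mail file becomes one record."""
    fields = flatten_mail(raw_text)
    # Header values can be folded across lines, so normalise them as well.
    return sep.join(" ".join(field.split()) for field in fields)


if __name__ == "__main__":
    # Each command-line argument is the path of one mail file.
    for path in sys.argv[1:]:
        with open(path, encoding="latin-1") as fh:
            print(to_record(fh.read()))
```

Because every file produces one line, the newline that ends that line is the record boundary, and the message body no longer contains any newlines of its own. If the script were saved as, say, `flatten_mail.py` and its output written to a file, that file could then be loaded with `PigStorage`, whose default field delimiter is a tab.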
