I have records like this:
Name: Alan Kay
Email: Alan.Kay@url.com
Date: 09-09-2013
Name: Marvin Minsky
Email: Marvin.Minsky@url.com
City: Boston, MA
Date: 09-10-2013
Name: Alan Turing
City: New York City, NY
Date: 09-10-2013
Each record spans multiple lines, but the number of lines varies because some fields may be missing. Records are usually separated by a newline. How would I convert them to the output below?
Alan Kay|Alan.Kay@url.com||09-09-2013
Marvin Minsky|Marvin.Minsky@url.com|Boston, MA|09-10-2013
Alan Turing||New York City, NY|09-10-2013
Apache Pig treats each line as a record, so it's not suited for this task. I'm aware of this blog post on processing multi-line records, but I'd prefer not to delve into Java if there's a simpler solution. Is there a way to solve this using Hadoop Streaming (or a framework like mrjob)?
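For concreteness, here is a rough sketch of the kind of Python logic I have in mind for a streaming mapper. It rests on assumptions that may not hold for all of my data: every record begins with a Name: line (or follows a blank line), the only fields are Name, Email, City and Date, and a record never straddles an input split boundary. The names flatten, main and FIELDS are just placeholders I made up.

```python
import sys

# Output column order; fields missing from a record become empty strings.
FIELDS = ("Name", "Email", "City", "Date")


def flatten(lines):
    """Yield one pipe-delimited row per multi-line record.

    Assumption: a record is closed by a blank line or by the next "Name:" line.
    """
    record = {}
    for raw in lines:
        line = raw.strip()
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        key = key.strip()
        # A blank line or a new "Name:" line ends the record in progress.
        if record and (not line or key == "Name"):
            yield "|".join(record.get(f, "") for f in FIELDS)
            record = {}
        # Keep only the known fields; anything else is ignored.
        if sep and key in FIELDS:
            record[key] = value.strip()
    # Emit the last record, which has no terminator after it.
    if record:
        yield "|".join(record.get(f, "") for f in FIELDS)


def main():
    # Hadoop Streaming passes the input on stdin and collects stdout.
    for row in flatten(sys.stdin):
        print(row)
```

Calling main() at the bottom of the file would let me test it locally by piping a sample file into the script. What I can't tell is whether this is safe under Hadoop Streaming, since a record could be cut in two when a large file is split across mappers.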