A very, very dirty way to do that could be:
- design a simple Perl script (or Python script, or sed command line) that takes source records from stdin, breaks each of them into N logical records, and pushes these to stdout (see the sketch after this list)
- tell Hive to use that script/command as a custom Map step, via the TRANSFORM syntax (also sketched below). The manual covers it, but it's very cryptic; you are better off Googling for working examples such as this or that or whatever
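
Here is a minimal sketch of such a script, assuming Python and a made-up layout where each source row is two tab-separated columns, the second of which packs the logical records separated by semicolons (the column names and the ';' delimiter are purely illustrative):

    #!/usr/bin/env python
    # Hypothetical mapper for Hive TRANSFORM: Hive feeds each input row as
    # one line of tab-separated column values on stdin; we emit one
    # tab-separated output row per logical record on stdout.
    import sys

    for line in sys.stdin:
        cols = line.rstrip('\n').split('\t')
        key, packed = cols[0], cols[1]        # assumed two-column input
        for item in packed.split(';'):        # assumed ';'-packed logical records
            if item:
                print('\t'.join([key, item])) # one output row per logical record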
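And this is roughly how you would wire it into Hive with TRANSFORM (the file, table and column names are again hypothetical):

    -- Ship the script to the cluster, then pipe each source row through it;
    -- Hive turns the script's tab-separated stdout back into rows.
    ADD FILE explode_records.py;

    SELECT TRANSFORM (key, packed_field)
           USING 'python explode_records.py'
           AS (key, logical_record)
    FROM   source_table;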
Caveat: this "streaming" pattern is rather slow, because of the serialization / deserialization to and from plain text that it requires. But once you have a working example, the development cost is minimal.
Additional caveat: of course, if source records must be processed in order (because a logical record can spill over onto the next row, for example) then you have a big problem, because Hadoop may split the source file arbitrarily and feed the splits to different Mappers. And you have no criterion for a DISTRIBUTE BY clause in your example. In that case, a very-very-very dirty trick would be to compress the source file with GZIP so that it is de facto un-splittable and thus gets processed by a single Mapper.