I'm using Scalding to process records with many (> 22) fields. At the end of the process, I'd like to write out the final Pipe's field names to a file. I know this is possible as Mapper and Reducer logs show this information. I'd like to get this information within the job itself to use it as the basis for a poor-man's schema. If this isn't possible to do, then is there a nice way to use the type-safe Pipes API for large records (i.e., without resorting to arbitrarily nested tuples or case classes)?

Ben Sidhom

1 Answer

.write(Tsv("filename.tsv", writeHeader = true))

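Setting writeHeader = true on the Tsv sink tells it to write the pipe's field names as a header line at the top of the output, so the file carries its own schema. Note that writeHeader is a parameter of Tsv, not of .write.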
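For context, here is a minimal sketch of how this might look inside a complete job using the fields-based API. The class name, the `input` and `output` argument names, and the three field names are hypothetical placeholders, not anything from the question; the sketch also assumes the Scalding library is on the classpath.

```scala
import com.twitter.scalding._

// Hypothetical job: reads a TSV with three named fields and writes it
// back out with a header line listing those field names.
class WriteWithHeaderJob(args: Args) extends Job(args) {
  // 'id, 'name and 'score are illustrative field names only.
  Tsv(args("input"), ('id, 'name, 'score))
    .read
    // writeHeader is passed to the Tsv sink, not to write itself.
    .write(Tsv(args("output"), writeHeader = true))
}
```

The first line of the output file should then hold the tab-separated field names, which a later step can read back as a simple schema.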

Ellen Spertus
Saif Niazi