I'm using Scalding to process records with many (> 22) fields. At the end of the process, I'd like to write the final Pipe's field names out to a file, to serve as the basis for a poor-man's schema. I know this information is available somewhere, since the Mapper and Reducer logs show it, but I'd like to get at it from within the job itself. If that isn't possible, is there a nice way to use the type-safe Pipes API for large records (i.e., without resorting to arbitrarily nested tuples or case classes)?
1 Answer
.write(Tsv("filename.tsv", writeHeader = true))

Setting writeHeader = true tells the Tsv sink to write the Pipe's field names as a header row, so the output file carries its own schema. Note that writeHeader is a parameter of the Tsv source itself, not of .write.
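
For context, here is a minimal sketch of a complete fields-based Scalding job using this; the input path, field names, and filter step are hypothetical and only illustrate where the writeHeader flag goes:

    import com.twitter.scalding._

    // Hypothetical job: reads a wide TSV, does a trivial processing step, and
    // writes the result with a header row so the field names land in the file.
    class WriteWithHeaderJob(args: Args) extends Job(args) {
      Tsv(args("input"), ('userId, 'name, 'score))   // assumed input fields
        .read
        .filter('score) { s: Int => s > 0 }          // stand-in for real logic
        // writeHeader = true makes the Tsv sink emit the field names as the
        // first line of the output, which can serve as a poor-man's schema.
        .write(Tsv(args("output"), writeHeader = true))
    }

You run it like any other Scalding job, passing --input and --output on the command line.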
