Programmatically determine Field names of Scalding/Cascading Pipe

Question

I'm using Scalding to process records with many (> 22) fields. At the end of the process, I'd like to write the final Pipe's field names out to a file. I know this information exists, since the Mapper and Reducer logs show it. I'd like to get it within the job itself and use it as the basis for a poor-man's schema. If that isn't possible, is there a clean way to use the type-safe Pipes API for large records (i.e., without resorting to arbitrarily nested tuples or case classes)?

Answer

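.write(Tsv("filename.tsv", writeHeader = true))

writeHeader is a parameter of the Tsv source, not of .write. Setting writeHeader = true tells the Tsv sink to write the pipe's field names as a header line before the data, so the output file carries its own simple schema.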
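Once the job writes its output with Tsv("filename.tsv", writeHeader = true), a downstream consumer can treat that header line as the poor-man's schema. The sketch below is plain Scala 2.13+ with no Scalding dependency. It assumes a tab-separated output file whose first line is the header; the object TsvHeaderSchema and its methods are hypothetical helpers, not part of Scalding. Reading each row into a Map keyed by field name also sidesteps the 22-element tuple limit on the consuming side.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}
import scala.jdk.CollectionConverters._

// Hypothetical helpers for consuming a TSV written with Tsv(..., writeHeader = true).
// Assumption: the first line of the file is the tab-separated list of field names.
object TsvHeaderSchema {

  // Returns the field names from the header line (the "poor-man's schema").
  def readSchema(file: Path): Vector[String] = {
    val lines = Files.readAllLines(file, StandardCharsets.UTF_8).asScala
    require(lines.nonEmpty, s"$file is empty; expected a header line")
    lines.head.split("\t", -1).toVector
  }

  // Reads every data row as a Map from field name to raw string value,
  // so records with more than 22 fields need no tuples or case classes.
  def readRecords(file: Path): Vector[Map[String, String]] = {
    val lines = Files.readAllLines(file, StandardCharsets.UTF_8).asScala.toVector
    require(lines.nonEmpty, s"$file is empty; expected a header line")
    val schema = lines.head.split("\t", -1).toVector
    lines.tail.filter(_.nonEmpty).map { line =>
      val values = line.split("\t", -1).toVector
      require(values.length == schema.length,
        s"row has ${values.length} values but header has ${schema.length} fields")
      schema.zip(values).toMap
    }
  }
}
```

Values stay as strings here; any type conversion is left to the caller, since the header only records field names, not types.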

answered Feb 24 '15 at 12:33 by Saif Niazi
edited Jul 19 '18 at 19:55 by Ellen Spertus
