Does anyone know how to compare consecutive records in scalding when creating a schema. I am looking at tutorial 6 and suppose that I want to print the age of the person if data in record #2 is greater than record #1 (for all records)
for example:
R1: John 30
R2: Kim 55
R3: Mark 20
if Rn.age > R(n-1).age the output ... which will result to R2: Kim 55
EDIT: Looking at the code I just realized it is a Scala enumeration, so my question is how to compare records in scala enumeration ?
class Tutorial6(args : Args) extends Job(args) {
/** When a data set has a large number of fields, and we want to specify those fields conveniently
in code, we can use, for example, a Tuple of Symbols (as most of the other tutorials show), or a List of Symbols.
Note that Tuples can only be used if the number of fields is at most 22, since Scala Tuples cannot have more
than 22 elements. Another alternative is to use Enumerations, which we show here **/
object Schema extends Enumeration {
val first, last, phone, age, country = Value // arbitrary number of fields
}
import Schema._
Csv("tutorial/data/phones.txt", separator = " ", fields = Schema)
.read
.project(first,age)
.write(Tsv("tutorial/data/output6.tsv"))
}