I need to take a pipe that has a column of labels with associated values, and pivot that pipe so that there is a column for each label with the correct values in each column. So, for example, if I have this:
Id Label Value
1 Red 5
1 Blue 6
2 …
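A minimal sketch of the pivot described above, written against plain Scala collections rather than a Scalding pipe so that it runs without a cluster. The `Row` case class, the `PivotSketch` object, and the `pivot` helper are hypothetical names introduced only for illustration, and a label missing for a given id is represented as `None`, which is an assumption about the desired behavior.

```scala
object PivotSketch {
  // Hypothetical record type mirroring the Id / Label / Value columns.
  final case class Row(id: Int, label: String, value: Int)

  // Returns the sorted label names (the new column headers) and, for each id,
  // one optional value per label in that same order.
  def pivot(rows: Seq[Row]): (Seq[String], Map[Int, Seq[Option[Int]]]) = {
    val labels = rows.map(_.label).distinct.sorted
    val byId = rows.groupBy(_.id).map { case (id, rs) =>
      val values = rs.map(r => r.label -> r.value).toMap
      // A label absent for this id becomes None (assumed behavior).
      id -> labels.map(values.get)
    }
    (labels, byId)
  }
}
```

Calling `PivotSketch.pivot` on the two rows shown for id 1 gives the headers `Blue, Red` and, for id 1, the values 6 and 5 in that order.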
I am using Scalding 0.12 with TypedPipe. I want to write the output to CSV with headers. How can I add headers? I see the option `writeHeader=true/false`, but how do I provide the headers?
I'm trying to read data from HBase, process it and then write to Hive. I'm new to both Scalding and Scala.
I have looked into SpyGlass for reading from HBase. It works well: I can read the data and then write it to a file.
val data = new…
I know that a monad is a general concept. What about the Execution monad? Is it a general concept or design pattern that can be used outside Scalding too?
I have seen that the new version of Scalding has Execution monads.
I'm trying to output a pipe into different directories such that the output of each directory will be bucketed based on some ids.
In plain MapReduce code I would use the MultipleOutputs class and do something like this in the reducer.…
I am writing serialized Thrift records to a file using Elephant Bird's splittable LZO compression. To achieve this I am using their ThriftBlockWriter class. My Scalding job then uses the FixedPathLzoThrift source to process the records. This all…
I wrote my build.sbt like this:
name := """scala-hbase"""
version := "1.0"
scalaVersion := "2.11.2"
//scalaVersion := "2.10.4"
/* HBase dependencies */
resolvers ++= Seq(
"Apache Repo" at…
I am reading files on HDFS via Scalding, aggregating on some fields, and writing to a tab-delimited file via TSV. How can I write out a file that contains the schema of my output file? For example,
UnpackedAvroSource(args("input"))
…
I'm using Scalding with Spyglass to read from/write to HBase.
I'm doing a left outer join of table1 and table2 and writing back to table1 after transforming a column.
Both table1 and table2 are declared as Spyglass HBaseSource.
This works fine. But, i…
I have a RichPipe with several fields, let's say:
'sex
'weight
'age
I need to group by 'sex and then get a list of tuples ('weight and 'age). I then want to do a scanLeft operation on the list for each group and get a pipe with 'sex and 'result. I…
I am writing a MapReduce job in Scalding and having difficulties compiling code that looks perfectly legitimate to me.
val persistenceBins = List[Int](1000 * 60 * 60, 2 * 1000 * 60 * 60, 4 * 1000 * 60 * 60)
val persistenceValues =…
I have a RichPipe with 3 fields: name: String, time: Long and value: Int. I need to get the value for a specific (name, time) pair. How can I do it? I can't figure it out from the Scalding documentation, as it is very cryptic, and I can't find any examples…
With Scalding I need to:
Group string fields by first 3 chars
Compare strings in all pairs in every group using the edit-distance metric (http://en.wikipedia.org/wiki/Edit_distance)
Write results in CSV file where record is string; string;…
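The first two steps above can be sketched with plain Scala collections, as a rough illustration rather than a Scalding job. The names `EditDistancePairs`, `editDistance`, and `pairsByPrefix` are hypothetical, the metric is assumed to be the standard Levenshtein distance, and duplicate strings are assumed to be compared only once. The sketch stops at in-memory tuples and does not write any file.

```scala
object EditDistancePairs {
  // Levenshtein distance, keeping only the previous row of the DP table.
  def editDistance(a: String, b: String): Int = {
    val initial = (0 to b.length).toVector
    a.foldLeft(initial) { (prev, ca) =>
      b.zipWithIndex.foldLeft(Vector(prev.head + 1)) { case (row, (cb, j)) =>
        val cost = if (ca == cb) 0 else 1
        // insertion, deletion, substitution (or match)
        row :+ math.min(math.min(row.last + 1, prev(j + 1) + 1), prev(j) + cost)
      }
    }.last
  }

  // Groups strings by their first 3 characters and computes the distance
  // for every unordered pair inside each group.
  def pairsByPrefix(words: Seq[String]): Seq[(String, String, Int)] =
    words.distinct
      .groupBy(_.take(3))
      .values
      .toSeq
      .flatMap { group =>
        group.sorted.combinations(2).map { case Seq(x, y) =>
          (x, y, editDistance(x, y))
        }
      }
}
```

Comparing all pairs is quadratic in the size of each group, so a large group sharing one prefix would dominate the running time.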
I have some JSON input that I need to parse and process (this is the first time I am using JSON). My input is as follows:
{"id":"id2","v":2, "d":{"Location":"JPN"}}
{"id":"id1","v":1, "d":{"Location":"USA"}}
{"id":"id2","v":1,…