Questions tagged [scalding]

Scalding is a Scala DSL for Cascading, running on Hadoop.

See https://github.com/twitter/scalding

181 questions
1
vote
2 answers

Scalding: Create list from column in Pipe

I need to take a pipe that has a column of labels with associated values, and pivot that pipe so that there is a column for each label with the correct values in each column. So for example if I have this: Id Label Value 1 Red 5 1 Blue 6 2 …
J Calbreath
  • 2,665
  • 4
  • 22
  • 31
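
The pivot described in the question above can be sketched with plain Scala collections. This is only an illustration of the grouping logic, not the Scalding API; the names `PivotSketch`, `Row`, and `pivot` are hypothetical, and the sample rows are limited to the two that are visible in the excerpt.

```scala
// Plain Scala collections, not the Scalding API: a sketch of the pivot logic only.
object PivotSketch {
  // One input row: (id, label, value), mirroring the Id / Label / Value columns.
  type Row = (Int, String, Int)

  // Group rows by id and collect each id's labels into a label -> value map.
  // Assumption: if an id repeats a label, the last value wins
  // (the question does not say how duplicates should be handled).
  def pivot(rows: Seq[Row]): Map[Int, Map[String, Int]] =
    rows.groupBy(_._1).map { case (id, group) =>
      id -> group.map { case (_, label, value) => label -> value }.toMap
    }

  def main(args: Array[String]): Unit = {
    // Only the rows visible in the excerpt are used here.
    val rows = Seq((1, "Red", 5), (1, "Blue", 6))
    println(pivot(rows))
  }
}
```

Each inner map then plays the role of one pivoted row, with a missing key standing in for an empty column.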
1
vote
2 answers

Adding headers to TypedPipe

I am using Scalding 0.12 with TypedPipe and want to write the output to CSV with headers. I see the option `writeHeader=true/false`, but how do I provide the headers themselves?
nitishagar
  • 9,038
  • 3
  • 28
  • 40
1
vote
1 answer

HBase to Hive example with Scalding

I'm trying to read data from HBase, process it and then write to Hive. I'm new to both Scalding and Scala. I have looked into SpyGlass for reading from HBase. It works well and I can read the data and then write it to a file. val data = new…
1
vote
1 answer

Execution monad

I know that a monad is a general concept. What about the Execution monad? Is it a general concept or design pattern that can be used outside Scalding too? I have seen that newer versions of Scalding have the Execution monad.
user2230605
  • 2,390
  • 6
  • 27
  • 45
1
vote
1 answer

How to bucket outputs in Scalding

I'm trying to output a pipe into different directories such that the output of each directory will be bucketed based on some ids. In plain MapReduce code I would use the MultipleOutputs class and do something like this in the reducer.…
jeremie
  • 971
  • 9
  • 19
1
vote
1 answer

Is there a Scalding source I can use for lzo-compressed binary data?

I am writing serialized Thrift records to a file using Elephant Bird's splittable LZO compression. To achieve this I am using their ThriftBlockWriter class. My Scalding job then uses the FixedPathLzoThrift source to process the records. This all…
fblundun
  • 987
  • 7
  • 19
1
vote
2 answers

Scalding for Scala 2.11

I wrote my build.sbt like this: name := """scala-hbase""" version := "1.0" scalaVersion := "2.11.2" //scalaVersion := "2.10.4" /* HBase dependencies */ resolvers ++= Seq( "Apache Repo" at…
ans4175
  • 432
  • 1
  • 9
  • 23
1
vote
1 answer

Scalding: Output schema from pipe operation

I am reading files on HDFS via Scalding, aggregating on some fields, and writing to a tab-delimited file via TSV. How can I write out a file that contains the schema of my output file? For example, UnpackedAvroSource(args("input")) …
J Calbreath
  • 2,665
  • 4
  • 22
  • 31
1
vote
1 answer

HBase Get/Scan in a Scalding job

I'm using Scalding with Spyglass to read from/write to HBase. I'm doing a left outer join of table1 and table2 and writing back to table1 after transforming a column. Both table1 and table2 are declared as Spyglass HBaseSource. This works fine. But, i…
Sathish
  • 20,660
  • 24
  • 63
  • 71
1
vote
1 answer

groupBy toList element order

I have a RichPipe with several fields, let's say: 'sex 'weight 'age I need to group by 'sex and then get a list of tuples ('weight and 'age). I then want to do a scanLeft operation on the list for each group and get a pipe with 'sex and 'result. I…
Savage Reader
  • 387
  • 1
  • 4
  • 16
1
vote
1 answer

Legitimate code does not compile in Scalding

I am writing a MapReduce job in Scalding and having difficulties compiling code that looks perfectly legitimate to me. val persistenceBins = List[Int](1000 * 60 * 60, 2 * 1000 * 60 * 60, 4 * 1000 * 60 * 60) val persistenceValues =…
Savage Reader
  • 387
  • 1
  • 4
  • 16
1
vote
1 answer

Get a value from RichPipe

I have a RichPipe with 3 fields: name: String, time: Long and value: Int. I need to get the value for a specific (name, time) pair. How can I do it? I can't figure it out from the Scalding documentation, as it is very cryptic, and I can't find any examples…
Savage Reader
  • 387
  • 1
  • 4
  • 16
1
vote
1 answer

Scalding: Compare strings pairwise?

With Scalding I need to: (1) Group string fields by first 3 chars; (2) Compare strings in all pairs in every group using edit-distance metric ( http://en.wikipedia.org/wiki/Edit_distance); (3) Write results in CSV file where record is string; string;…
DarqMoth
  • 603
  • 1
  • 13
  • 31
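
The first two steps of the question above (grouping by a three-character prefix, then comparing every pair in a group by edit distance) can be sketched in plain Scala. This is not Scalding code; the names `PairwiseEditDistance`, `editDistance`, and `comparePairs` are hypothetical, and Levenshtein distance is assumed as the edit-distance metric. The CSV output step is left out because the record layout is truncated in the excerpt.

```scala
// Plain Scala collections, not the Scalding API: a sketch of the comparison logic only.
object PairwiseEditDistance {
  // Dynamic-programming Levenshtein distance that keeps only the previous row.
  def editDistance(a: String, b: String): Int = {
    var prev = Array.tabulate(b.length + 1)(identity)
    for (i <- 1 to a.length) {
      val curr = new Array[Int](b.length + 1)
      curr(0) = i
      for (j <- 1 to b.length) {
        val cost = if (a(i - 1) == b(j - 1)) 0 else 1
        curr(j) = math.min(
          math.min(curr(j - 1) + 1, prev(j) + 1), // insertion, deletion
          prev(j - 1) + cost                      // substitution or match
        )
      }
      prev = curr
    }
    prev(b.length)
  }

  // Group strings by their first three characters, then emit each unordered
  // pair within a group together with its edit distance.
  def comparePairs(words: Seq[String]): Seq[(String, String, Int)] =
    words.groupBy(_.take(3)).values.toSeq.flatMap { group =>
      group.combinations(2).map { case Seq(x, y) => (x, y, editDistance(x, y)) }
    }
}
```

Comparing all pairs is quadratic in the group size, so a very common prefix would dominate the running time in a distributed job as well.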
1
vote
0 answers

VerifyError?: method: apply signature: ()Lcascading/pipe/Pipe;) Illegal use of nonvirtual function call

This happens when my code tries to invoke Checkpoint. I've cleaned & rebuilt.
IttayD
  • 28,271
  • 28
  • 124
  • 178
1
vote
1 answer

parsing JSON nested input in Scalding

I have some JSON input that I need to parse and process (this is the first time I am using JSON). My input is as follows: {"id":"id2","v":2, "d":{"Location":"JPN"}} {"id":"id1","v":1, "d":{"Location":"USA"}} {"id":"id2","v":1,…
user2327621
  • 957
  • 3
  • 11
  • 15