Questions tagged [scalding]

Scalding is a Scala DSL for Cascading, running on Hadoop.

See https://github.com/twitter/scalding

181 questions
1
vote
0 answers

Hortonworks Sandbox 2.1 | Split class cascading.tap.hadoop.io.MultiInputSplit not found

I’m executing a Scalding job on the Hortonworks distribution (HDP 2.1) and it throws the following error: I tried to locate the Cascading jar in Hortonworks but I couldn’t find it. What am I doing wrong here?
Renien
  • 551
  • 4
  • 19
1
vote
2 answers

Scalding flatMap tuple containing list

I have the following input tuple that I'd like to flatMap: (String, List[String]). E.g., input: ("a", ["1", "2"]) ("b", ["3", "4"]). Needed output: ("a", "1") ("a", "2") ("b", "3") ("b", "4"). Is there an elegant way to do this in Scalding/Scala?
Marsellus Wallace
  • 17,991
  • 25
  • 90
  • 154
1
vote
1 answer

Convert Seq to Pipe in Scalding

Context: I'm reading in a file where multiple fields are a list of IDs. I need to convert these fields into a Pipe to join them with other Pipes. What I have tried: val otherPipe = pipe .project('fieldIwant) .map { p: Pipe =>…
gstvolvr
  • 650
  • 1
  • 8
  • 17
1
vote
1 answer

Printing to Console in scalding script

I am trying to display some content on the console in a Scalding script. When I run the same logic in the Scalding shell I get the desired output, but when I run the script I get an error: scripttest.scala:4: error: value dump is not a member of…
1
vote
1 answer

mutable.Buffer does not work with Scalding JobTest for Type Safe API

I have almost finished my Scalding project, which uses the Type Safe API instead of the Fields API. The last issue that remains in the overall project setup is the integration tests of the entire Scalding job itself (I have finished unit tests…
PhillipAMann
  • 887
  • 1
  • 10
  • 19
1
vote
0 answers

reading and writing json in Spark and Scalding

I'm trying to write output from a Scalding flow in JSON form and read it in Spark. This works fine, except when the JSON contains strings with newlines. The output is one JSON object per line, and newlines in a value in the JSON are causing…
ashic
  • 6,367
  • 5
  • 33
  • 54
1
vote
1 answer

Scalding write method not found in 0.15.0 version

I just started implementing a simple Scalding starter program, following this documentation for reference. In the first example, the write method could not be resolved. import com.twitter.scalding._ class WordCountJob(args: Args) extends…
Bruce
  • 8,609
  • 8
  • 54
  • 83
1
vote
0 answers

Kryo/Chill-Scala Serializer - serializing a custom class containing other classes

I want to serialize a Scalding TypedPipe[MyClass] and deserialize it in Spark 1.5.1. I am able to serialize/deserialize a "simple" case class containing only "primitives" such as Booleans and Maps, using Kryo and Twitter's Chill for Scala: //In…
Giora Simchoni
  • 3,487
  • 3
  • 34
  • 72
1
vote
1 answer

how to convert Scalding TypedPipe to Iterator

In my Scalding Hadoop job, I've got some grouping logic on a pipe, and then I need to process each group: val georecs : TypedPipe[GeoRecord] = getRecords georecs.map( r => (getRegion(r),r) ) .groupBy(_._1) .mapValueStream( xs =>…
nont
  • 9,322
  • 7
  • 62
  • 82
1
vote
1 answer

Sorting output of groupBy in Scalding

I am trying to sort the output of a groupBy statement using Scalding. My dataset looks like this: Src Eqid Version Datetime Lat Lon Magnitude Depth NST Region ci 15214001 0 …
gstvolvr
  • 650
  • 1
  • 8
  • 17
1
vote
1 answer

Run Scalding Test Job in Hadoop with JobTest class

I'm not able to run a Scalding test with the JobTest class. Below is the command. How should I pass the command for that? Hadoop jar com.scala-0.0.1-SNAPSHOT.jar com.twitter.scalding.JobTest com.scala.etl --hdfs --input --output I'm facing the problem below: Exception in…
Ray
  • 21
  • 1
1
vote
2 answers

Scalding on EMR: Hadoop job fails with NoSuchMethodError: scala.Predef$.ArrowAssoc(Ljava/lang/Object;)Ljava/lang/Object;

Basically, I need to run a Scalding job on EMR. The same job runs perfectly fine on local Hadoop on my MacBook, but fails on Hadoop on EMR. I am trying hard to get help for this issue in the cascading-user and scala-user groups as well, and haven't…
1
vote
1 answer

How to find the exact hadoop jar command which was running my job?

I'm using CDH5.4. I'm running a Hadoop job which appears to be OK from the command line (when simply running with hadoop jar). However, if I run it from YARN it finishes silently with a single mapper and no reducers. I really suspect both 'runs' were…
Jas
  • 14,493
  • 27
  • 97
  • 148
1
vote
1 answer

Can I run spark unit tests within eclipse

Recently we moved from Scalding to Spark. I used Eclipse and the Scala IDE for Eclipse to write code and tests. The tests ran fine with Twitter's JobTest class. Any class using JobTest would be automatically available to run as a Scala unit…
Martin Klosi
  • 3,098
  • 4
  • 32
  • 39
1
vote
1 answer

Outputting a Scalding TypedPipe to a SequenceFile in multiple directories based on one of the fields

I'm using Scalding on Hadoop. I have a large dataset in the form of a TypedPipe that I wish to output in chunks based on one of the data fields. For example the data is , and I want the data for each category stored in a…
Giora Simchoni
  • 3,487
  • 3
  • 34
  • 72