I'm executing a Scalding job on the Hortonworks distribution (HDP 2.1) and it throws the following error:
I tried to locate the Cascading jar in the Hortonworks distribution but couldn't find it. What am I doing wrong here?
I have the following input tuple that I'd like to flatMap: (String, List[String])
E.g., input:
("a", ["1", "2"])
("b", ["3", "4"])
Needed output:
("a", "1")
("a", "2")
("b", "3")
("b", "4")
Is there an elegant way to do this in Scalding/Scala?
Context: I'm reading in a file where multiple fields are a list of IDs. I need to convert these fields into a Pipe to join them with other Pipes.
What I have tried:
val otherPipe = pipe
.project('fieldIwant)
.map { p: Pipe =>…
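A collections-level sketch of the flatten (with hypothetical data); Scalding's typed API mirrors Scala collections, so the same `flatMap` shape works on a `TypedPipe[(String, List[String])]`:

```scala
// Hypothetical input mirroring the question's example
val input = List(("a", List("1", "2")), ("b", List("3", "4")))

// Emit one (key, value) pair per element of the inner list
val flattened = input.flatMap { case (key, values) =>
  values.map(v => (key, v))
}
// flattened == List(("a","1"), ("a","2"), ("b","3"), ("b","4"))
```

The `map { p: Pipe => … }` attempt above types the lambda as a `Pipe`, but the function passed to `map`/`flatMap` operates on the tuple elements, not on the pipe itself.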
I am trying to display some content on the console in a Scalding script. When I run the same logic in the Scalding shell I get the desired output, but when I run the script I get an error:
scripttest.scala:4: error: value dump is not a member of…
I have almost finished my Scalding project, which uses the type-safe API instead of the Fields API. The last remaining issue in the overall project setup is integration testing of the entire Scalding job itself (I have finished unit tests…
I'm trying to write output from a Scalding flow as JSON and read it in Spark. This works fine, except when the JSON contains strings with newlines. The output is one JSON object per line, and newlines in a JSON value are causing…
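A minimal sketch of the usual fix (function name is hypothetical): a conforming JSON encoder escapes control characters, so a value containing embedded newlines still serializes to a single physical line and one-object-per-line parsing stays safe:

```scala
// Escape the characters that would break line-delimited JSON
def escapeJsonString(s: String): String =
  s.flatMap {
    case '\n' => "\\n"   // literal newline -> two-character escape
    case '\r' => "\\r"
    case '"'  => "\\\""
    case '\\' => "\\\\"
    case c    => c.toString
  }

val record = s"""{"text":"${escapeJsonString("line one\nline two")}"}"""
// record contains no literal newline character
```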
I've just started implementing a simple Scalding starter program, following this documentation for reference. In the first example, the `write` method could not be resolved.
import com.twitter.scalding._
class WordCountJob(args: Args) extends…
I want to serialize a Scalding TypedPipe[MyClass] and deserialize it in Spark 1.5.1.
I am able to serialize/deserialize a "simple" case class containing only "primitives" such as Booleans and Maps, using kryo and Twitter's Chill for Scala:
//In…
In my Scalding hadoop job, I've got some grouping logic on a pipe, and then I need to process each group:
val georecs : TypedPipe[GeoRecord] = getRecords
georecs.map( r => (getRegion(r),r) )
.groupBy(_._1)
.mapValueStream( xs =>…
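A collections analogue of the group-then-process pattern above (`GeoRecord`, `getRegion`, and the aggregation are stand-ins): group records by region, then fold over each group's value stream:

```scala
// Stand-in record type and key function
case class GeoRecord(region: String, magnitude: Double)
def getRegion(r: GeoRecord): String = r.region

val georecs = List(GeoRecord("us", 3.1), GeoRecord("eu", 2.0), GeoRecord("us", 4.2))

// groupBy plays the role of groupBy(_._1); xs plays the value stream
val maxByRegion: Map[String, Double] =
  georecs.groupBy(getRegion).map { case (region, xs) =>
    region -> xs.map(_.magnitude).max
  }
```

In Scalding itself, `mapValueStream` gives you an `Iterator` over each group's values after the shuffle, so whatever per-group processing you write must consume that iterator in one pass.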
I am trying to sort the output of a groupBy statement using Scalding.
My dataset looks like this:
Src Eqid Version Datetime Lat Lon Magnitude Depth NST Region
ci 15214001 0 …
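A collections sketch of sorting within groups (field names taken from the header row above; the data values are made up). In Scalding's Fields API the corresponding step would be a `sortBy` inside the `groupBy` block:

```scala
// Stand-in for two of the dataset's columns
case class Quake(region: String, magnitude: Double)

val quakes = List(
  Quake("Southern California", 3.2),
  Quake("Northern California", 2.1),
  Quake("Southern California", 1.5)
)

// Group by region, then sort each group's rows by magnitude
val sortedByRegion: Map[String, List[Quake]] =
  quakes.groupBy(_.region).map { case (region, qs) =>
    region -> qs.sortBy(_.magnitude)
  }
```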
I'm not able to run a Scalding test with the JobTest class. Below is the command; how should it be invoked?
hadoop jar com.scala-0.0.1-SNAPSHOT.jar com.twitter.scalding.JobTest com.scala.etl --hdfs --input --output
I'm facing the problem below:
Exception in…
Basically I need to run a Scalding job on EMR. The same job runs perfectly fine on local Hadoop on my MacBook, but fails on Hadoop on EMR.
I am trying hard to get help for this issue in the cascading-user and scala-user groups as well, and haven't…
I'm using CDH 5.4. I'm running a Hadoop job which from the command line appears to be OK (when simply running with hadoop jar). However, if I run it from YARN it finishes silently with a single mapper and no reducers. I really suspect both 'runs' were…
Recently we moved from Scalding to Spark. I used Eclipse and the Scala IDE for Eclipse to write code and tests. The tests ran fine with Twitter's JobTest class. Any class using JobTest would automatically be available to run as a Scala unit…
I'm using Scalding on Hadoop. I have a large dataset in the form of a TypedPipe that I wish to output in chunks based on one of the data fields.
For example the data is , and I want the data for each category stored in a…
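A collections sketch of the grouping step, with made-up categories and rows. In a real Scalding job the analogous move is to write each group to its own output path (e.g. via a partitioned or templated sink), but the split-by-field logic is the same:

```scala
// Hypothetical (category, record) pairs
val data = List(("books", "rec1"), ("music", "rec2"), ("books", "rec3"))

// One chunk of records per category value
val chunks: Map[String, List[String]] =
  data.groupBy(_._1).map { case (category, rows) =>
    category -> rows.map(_._2)
  }
```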