Highest Voted 'scalding' Questions

0 votes

1 answer

Scalding (older versions) counters based on cascading

Older versions of Scalding had no counters in their API. "Hadoop Counters In Scalding" suggests how to fall back to Cascading counters in Scalding: def addCounter(pipe : Pipe, group : String, counter : String) = { …

scala hadoop scalding

asked Jan 21 '15 at 10:47

Jas

14,493
27
97
148

0 votes

1 answer

Scala/Scalding: Pivoting data

I have a dataset which is the output of a pipe in Scalding that looks like this:

'Var1, 'Var2, 'Var3, 'Var4 =
a,x,1,2
a,y,3,4
b,x,1,2
b,y,3,4

I'm trying to turn it into something like: 'Var1, 'Var3x, 'Var4x, 'Var3y, 'Var4y…

scala scalding

asked Nov 22 '14 at 01:51

J Calbreath

2,665
4
22
31

0 votes

1 answer

Reading ctrl a delimiter in scalding

I'm trying to read a Ctrl-A-delimited file in Scalding. I'm getting an error that says it found the wrong number of fields (expecting 166, found 142), and then it displays the line it is trying to read. For some reason, it does not read the…

scala scalding

asked Nov 20 '14 at 17:45

J Calbreath

2,665
4
22
31

0 votes

1 answer

How do I log to file in Scalding?

In my Scalding MapReduce code, I want to log certain steps as they happen so that I can debug the MapReduce jobs if something goes wrong. How can I add logging to my Scalding job? E.g. import com.twitter.scalding._ class WordCountJob(args:…

hadoop mapreduce scalding

asked Nov 18 '14 at 05:59

jcm

5,499
11
49
78

0 votes

1 answer

Hadoop-Cascading: Partial directory source tap

My data has a structure like this:

+data
|-2014080700_00.txt
|-2014080700_01.txt
|-2014080701_00.txt
|- ...
|-2014080723_00.txt
|-2014080800_00.txt
|- ...
|-2014090800_00.txt

I know I can use all the files inside the data directory with a Tap like…

java hadoop cascading scalding

asked Sep 30 '14 at 10:08

dieend

2,231
1
24
29

0 votes

1 answer

Scalding overriding args in subclass

I have two Scalding jobs, where one inherits from the other. Something like this: class BaseJob(args : Args) extends Job(args) { val verbose = args.boolean("verbose") if(verbose){ // do stuff }else{ // do other stuff } } class…

scala inheritance arguments scalding

asked Sep 26 '14 at 07:10

arno_v

18,410
3
29
34

0 votes

1 answer

How is mapTo more efficient than map in Scalding

The Scalding reference on GitHub (https://github.com/twitter/scalding/wiki/Fields-based-API-Reference#map-functions) says the following: "MapTo is equivalent to mapping and then projecting to the new fields, but is more efficient. Thus, the …

scala scalding

asked Sep 05 '14 at 20:57

Chidu

330
2
10

0 votes

1 answer

Scalding, can't use more than one trait in Job

I have a Scalding job. I've created two traits, A and B; each trait has a companion object (A, B) with an implicit wrapper for the trait and Pipe. The job compiles successfully when I use only one trait. When I import both traits, compilation fails. It says that all…

scala traits scalding

asked Aug 22 '14 at 13:19

Capacytron

3,425
6
47
80

0 votes

0 answers

Loading extremely long lines with TextLine in Cascading

I'm using TextLine in Cascading to load files with very long lines: around 30 MB on average, some much longer. When I run the job locally to test it, it runs fine, but when I run it on the cluster it fails after…

hadoop mapreduce cascading scalding mapr

asked Aug 14 '14 at 18:15

Savage Reader

387
1
4
16

0 votes

0 answers

Scalding: How to reduce in-memory computations on lists?

With Scalding I am trying to find edit distances between pairs of similar strings. In total I have 10,000,000 strings in a CSV file. To reduce computation I use the following algorithm: split all strings into groups, using the first three characters as a…

algorithm scala hadoop scalding

asked Jul 31 '14 at 16:07

DarqMoth

603
1
13
31

0 votes

0 answers

Selecting max value when joining RichPipes

I have a list of RichPipes with the following fields: name: String, joinTime: Long, value: Int. I want to join them sequentially using reduce. When joining the RichPipes I only want to retain one field, value, and I want it to contain the max value…

scala hadoop scalding

asked Jul 17 '14 at 13:54

Savage Reader

387
1
4
16

0 votes

1 answer

How Scalding DSL translates into regular Scala code?

Please help me find out how the Scalding DSL translates into regular Scala code. https://github.com/twitter/scalding/wiki/Fields-based-API-Reference#sortBy For example: val fasterBirds = birds.map('speed -> 'doubledSpeed) { speed : Int => speed * 2…

scala dsl scalding

asked Jul 12 '14 at 11:10

DarqMoth

603
1
13
31

0 votes

1 answer

Scalding: How to change default tuple comparison function?

When doing Scalding MapReduce operations, I need to compare tuples using my own comparison function on tuple fields. Questions: How do I define my own tuple comparison function? What are the rules for extending Scalding with custom Scala code in general?…

scala scalding

asked Jul 09 '14 at 12:32

DarqMoth

603
1
13
31

0 votes

2 answers

Scalding Tutorial with HDFS: Data is missing from one or more paths in: List(tutorial/data/hello.txt)

After configuring ssh and rsync, when I try to run the Scalding tutorial (https://github.com/Cascading/scalding-tutorial/) with the command: $ scripts/scald.rb --hdfs tutorial/Tutorial0.scala I get the following…

scala hadoop scalding

asked Jul 07 '14 at 15:46

DarqMoth

603
1
13
31

0 votes

1 answer

Scalding Tutorial: HDFS rsync errors

Please help me understand the output of an unsuccessful Scalding run on Hadoop. I got the latest Scalding distribution from git: git clone https://github.com/twitter/scalding.git After running sbt assembly from the scalding directory, I tried to run the tutorial with…

scala hadoop hdfs scalding

asked Jul 03 '14 at 16:14

DarqMoth

603
1
13
31
