I know that Scalding's default serialization uses Kryo. So for this example, let's say I have a pipe of student objects.
case class Student(name: String, id: String)
val pipe: TypedPipe[Student] = //....
Then I write that pipe to a TextDelimited file…
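Kryo handles the intermediate tuples automatically, but a TextDelimited-style sink wants flat fields, so the case class has to be mapped to a tuple first. A minimal sketch of that flattening step, with the Scalding call shown in comments ("students.tsv" is a hypothetical path):

```scala
// In a Scalding job (typed API) the write would look roughly like:
//   pipe.map(s => (s.name, s.id)).write(TypedTsv[(String, String)]("students.tsv"))
// The flattening itself is just case class -> tuple:
case class Student(name: String, id: String)

def toRow(s: Student): (String, String) = (s.name, s.id)

val rows = Seq(Student("ada", "1"), Student("bob", "2")).map(toRow)
```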
We have many small files that need combining. In Scalding you can use TextLine to read files as text lines, but that gives one mapper per file; we want multiple files combined so that they are processed by a single mapper.
I understand we need…
I'm using Scalding to process records with many (> 22) fields. At the end of the process, I'd like to write out the final Pipe's field names to a file. I know this is possible, as the Mapper and Reducer logs show this information. I'd like to get this…
I am trying to write Scalding jobs which have to connect to HBase, but I have trouble using the HBase tap. I have tried using the tap provided by Twitter Maple, following this example project, but it seems that there is some incompatibility between…
I need to join 2 pipes with the same set of fields, i.e. ('id, 'groupName, 'name), the same way SQL UNION works. How is it possible to do this in Twitter Scalding?
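For reference, Scalding's typed API merges two pipes of the same element type with `++`, which behaves like SQL UNION ALL (add `.distinct` for UNION semantics). A sketch using in-memory Seqs to show the semantics, with the hypothetical pipes `left`/`right` in the comments:

```scala
// With TypedPipes:
//   val merged: TypedPipe[(String, String, String)] = left ++ right  // UNION ALL
//   val unioned = merged.distinct                                    // UNION
// The same semantics on plain collections:
case class Row(id: String, groupName: String, name: String)

val left  = Seq(Row("1", "g1", "x"), Row("2", "g1", "y"))
val right = Seq(Row("2", "g1", "y"), Row("3", "g2", "z"))

val unionAll = left ++ right        // duplicates kept
val union    = unionAll.distinct    // duplicates removed
```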
I've run into an issue: I am trying to read from multiple files using Scalding and produce a single output file. My code is this:
def getFilesSource(paths: Seq[String]) = {
  new MultipleTextLineFiles(paths: _*) {
    override…
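One common way to get a single output file, sketched in comments since it depends on the rest of the job (Fields API; `TextLine("output")` is a hypothetical sink): funnel all tuples through one reducer with `groupAll`, so the sink emits one part file. Note this serializes the final stage, so it only makes sense for modest output sizes.

```scala
// Sketch: force one reducer so the sink writes a single part file.
//   lines.groupAll { _.pass }
//        .write(TextLine("output"))
```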
I have a copy of Programming MapReduce with Scalding by Antonios Chalkiopoulos. In the book he discusses the External Operations design pattern for Scalding code. You can see an example on his website here. I have made a choice to use the Type…
I need to read an Avro file in Scalding but have no idea how to work with it. I have worked with straightforward Avro files, but this one is a little more complicated. The schema looks like this:
{"type":"record",
"name":"features",
…
I am trying to figure out how to create a build.sbt file for my own Scalding-based project.
The Scalding source tree has no build.sbt file; instead it has a project/Build.scala build definition.
What would be the right way to integrate my own sbt…
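You don't need to mirror Scalding's project/Build.scala; a plain build.sbt that depends on the published Scalding artifacts is enough. A minimal sketch (project name and versions are illustrative; pick versions matching your cluster):

```scala
name := "my-scalding-job"  // hypothetical project name

scalaVersion := "2.11.12"  // illustrative

libraryDependencies ++= Seq(
  "com.twitter" %% "scalding-core" % "0.17.4",                  // illustrative version
  "org.apache.hadoop" % "hadoop-client" % "2.6.0" % "provided"  // match your cluster
)
```

Marking hadoop-client as "provided" keeps it out of the assembled jar, since the cluster supplies it at runtime.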
I'm doing a groupBy to calculate a value, but it seems that when I group, I lose all the fields that are not among the aggregation keys:
filtered.filterNot('site) { s: String => ... }
  .filterNot('date) { s: String => ... }
aggr =…
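In the Fields API a groupBy keeps only the grouping keys plus the aggregated outputs; the other fields are dropped by design. Two common fixes: include the needed fields in the groupBy keys, or compute the aggregate and join it back on the key. A sketch of the join-back idea on in-memory data (class and field names are illustrative):

```scala
case class Rec(site: String, date: String, value: Int)

val recs = Seq(Rec("a", "d1", 1), Rec("a", "d2", 2), Rec("b", "d1", 3))

// Aggregate per key, then "join" the result back so every field survives.
val totals: Map[String, Int] =
  recs.groupBy(_.site).map { case (site, rs) => site -> rs.map(_.value).sum }

val joined: Seq[(Rec, Int)] = recs.map(r => (r, totals(r.site)))
```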
Could someone point me to a link that explains how to read and write simple case classes in Scalding? Is there some default serialization scheme?
For example, I have jobs that create pipes of com.twitter.algebird.Moments.
I wish to write the pipes…
I'm trying to build a fat jar with sbt for a simple Hadoop job so that I can run it on Amazon EMR. However, when I run sbt assembly I get the following error:
[error] (*:assembly) deduplicate: different file contents found in the…
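The usual fix is a merge strategy in the sbt-assembly settings that discards the conflicting META-INF entries (a hedged sketch; the exact case clause depends on which file the full error names):

```scala
// build.sbt (sbt-assembly): discard conflicting META-INF files, keep defaults otherwise.
assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case x =>
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}
```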
Could anybody recommend a good solution (framework) for accessing HBase on a Hadoop cluster from a Scala (or Java) application?
So far I'm leaning toward Scalding. The prototypes I built allowed me to combine the Scalding library with Maven and…
I am using CDH (Cloudera Hadoop) version 5.12.0 (which uses Hadoop 2.6.0 and Oozie 4.1.0) and Scalding 2.11
I am using a shaded jar with my dependencies built in.
I can run all my jobs properly without any error using a hadoop jar command as…
I have a Spark job whose final output is an Algebird bloom filter, and I need to reuse this bloom filter in another Spark job.
Is there a way to store this bloom filter in a kv store (eg: redis) using Twitter Storehaus and retrieve it in the…