Consider the following code in Scalding. Let's say I have these tuples in a TypedPipe[(Int, Int)]:
(1, 2)
(1, 3)
(2, 1)
(2, 2)
On this pipe I can call groupBy(t => t._1) to generate a Grouped[Int, (Int, Int)], which will still…
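Put concretely, a minimal self-contained sketch of that setup (job, aggregation, and output names are made up for illustration):

import com.twitter.scalding._

class GroupExample(args: Args) extends Job(args) {
  val pairs: TypedPipe[(Int, Int)] =
    TypedPipe.from(List((1, 2), (1, 3), (2, 1), (2, 2)))

  // groupBy on the first element yields Grouped[Int, (Int, Int)];
  // each value still carries the full original tuple.
  pairs.groupBy(t => t._1)
    .mapValues(_._2)   // keep only the second element per key
    .sum               // e.g. (1, 5) and (2, 3)
    .toTypedPipe
    .write(TypedTsv[(Int, Int)](args("output")))
}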
I have an Elastic MapReduce job which uses elasticsearch-hadoop via scalding-taps to transfer data from Amazon S3 to Amazon Elasticsearch Service. For a long time this job ran successfully. However, it has recently started failing with the following…
We have a Scalding job that I want to run on AWS Elastic MapReduce using release label 4.2.0.
This job ran successfully on AMI 2.4.2. When we upgraded to AMI 3.7.0, we ran into a java.lang.VerifyError caused by incompatible jars. Our project…
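A common cause of VerifyError on EMR is shipping Hadoop or logging jars that clash with the ones the cluster already provides. A hedged sbt sketch of that kind of fix (versions and excluded modules are purely illustrative, not taken from this job):

// build.sbt sketch; versions and module names are illustrative
libraryDependencies ++= Seq(
  // provided: rely on the cluster's own Hadoop jars at runtime
  "org.apache.hadoop" % "hadoop-client" % "2.6.0" % "provided",
  // drop a transitive jar that conflicts with what EMR ships
  ("com.twitter" %% "scalding-core" % "0.15.0")
    .exclude("org.slf4j", "slf4j-log4j12")
)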
Is there a way in Scalding to write to a SQL table that has more than 22 columns? The problem I am facing is as follows. I have a table with 28 columns, each row of which I am representing with a case class. Something like
case class…
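The underlying limit is that Scala 2.10 case classes and Scalding's generated tuple converters top out at arity 22. One commonly suggested direction, sketched here with invented field names, is to nest smaller case classes so no single class crosses that limit (a custom TupleSetter/TupleConverter for the flat row is the other usual route):

// Sketch: split the 28 columns across nested case classes,
// keeping every individual class at 22 fields or fewer
case class Address(street: String, city: String, state: String, zip: String)
case class Metrics(m1: Long, m2: Long, m3: Long)
case class Row(id: Long, name: String, address: Address, metrics: Metrics)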
In the following example, I was trying to create an implicit conversion between MySource and TypedPipe[T]. I own MySource; in fact, I have many such sources, so I wanted to use a Porable[T] trait to mark which type argument T I want for the output…
Scalding has a great utility for running an integration test of a job flow. That way, the inputs and outputs are in-memory buffers:
val input = List("0" -> "This a a day")
val expectedOutput = List(("This", 1),("a", 2),("day", 1))
…
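A typical JobTest harness wired to those buffers looks roughly like the following; the word-count job itself is assumed here, since the question does not show it:

import com.twitter.scalding._

// Hypothetical job under test (not shown in the question)
class WordCountJob(args: Args) extends Job(args) {
  TextLine(args("input"))
    .flatMap('line -> 'word) { line: String => line.split("\\s+") }
    .groupBy('word) { _.size('count) }
    .write(Tsv(args("output")))
}

// JobTest feeds the in-memory input and asserts on the in-memory output
JobTest(new WordCountJob(_))
  .arg("input", "inputFile")
  .arg("output", "outputFile")
  .source(TextLine("inputFile"), List("0" -> "This a a day"))
  .sink[(String, Int)](Tsv("outputFile")) { buffer =>
    assert(buffer.toSet == Set(("This", 1), ("a", 2), ("day", 1)))
  }
  .run
  .finish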
I have this job:
import com.twitter.scalding.{Args, Csv, Job}
class ManagersAndTeams(args: Args) extends Job(args) {
  val managersPipe = Csv(args("managers"), skipHeader = true)
    .project('managerID, 'teamID)
  val teamsPipe =…
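The snippet is cut off above. Purely as an illustration of where such a job usually goes (not the question's actual code), the second pipe is read the same way and the two are joined on the shared field:

// Illustrative continuation only: read the teams Csv and join on 'teamID
val teamsPipe = Csv(args("teams"), skipHeader = true)
  .project('teamID, 'teamName)

managersPipe
  .joinWithSmaller('teamID -> 'teamID, teamsPipe)
  .write(Csv(args("output")))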
We are using Scalding to do ETL and generate the output as a Hive table with partitions. Consequently, we want the directory names for partitions to be something like "state=CA" for example. We are using TemplatedTsv as follows:
pipe
  // some…
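For reference, the usual TemplatedTsv pattern (assuming the pipe carries a 'state field) substitutes the path fields into the template, which is what produces directories like state=CA:

// Sketch: 'state fills the %s in the template, giving .../state=CA/part-*
pipe.write(TemplatedTsv(args("output"), "state=%s", 'state))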
I have a simple scalding program to transform some data which I execute using com.twitter.scalding.Tool in local mode.
val start = System.nanoTime
val inputPaths = args("input").split(",").toList
val pipe = Tsv(inputPaths(0))
// standard pipe…
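As written, only the first of the listed paths is read. If the remaining paths are meant to be processed too, one sketch (inside a Job, where the implicit Pipe conversions are in scope) is to union a Tsv pipe per path:

// Sketch: fold every comma-separated input into one pipe,
// rather than using just inputPaths(0)
val allInputs = inputPaths
  .map(p => Tsv(p).read)
  .reduce(_ ++ _)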
I see this:
Scalding: How to retain the other field, after a groupBy('field){ _.size }?
It's a real pain and a mess compared to Apache Pig... What am I doing wrong? Can I do the same as Pig's GENERATE(FLATTEN())?
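The two patterns usually suggested for this (field names assumed) are to group on both fields, or to keep the extra field explicitly during the reduce:

// 1. Put the extra field in the grouping key so it survives the groupBy
pipe.groupBy(('field, 'other)) { _.size('count) }

// 2. Or carry one value of it alongside the aggregate
pipe.groupBy('field) { _.size('count).head('other) }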
I'm confused. Here is my scalding code:
…
How can I create a Scalding Source that will handle conversions between Avro and Parquet?
The solution should:
1. Read from Parquet format and convert to an Avro in-memory representation
2. Write Avro objects into a Parquet file (see the sketch below)
Note: I noticed…
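This is not a Scalding Source by itself, but the parquet-avro library underneath demonstrates both directions; a custom Source would wrap this in a Cascading scheme. A sketch (the package is org.apache.parquet.avro in recent versions, parquet.avro in older ones):

import org.apache.avro.generic.GenericRecord
import org.apache.hadoop.fs.Path
import org.apache.parquet.avro.{AvroParquetReader, AvroParquetWriter}

// 1. Parquet rows come back as Avro GenericRecords
val reader = AvroParquetReader.builder[GenericRecord](new Path("in.parquet")).build()
val record: GenericRecord = reader.read()

// 2. Avro records are written back out as Parquet
val writer = AvroParquetWriter.builder[GenericRecord](new Path("out.parquet"))
  .withSchema(record.getSchema)
  .build()
writer.write(record)
writer.close()
reader.close()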
There are obvious speed benefits from not having to read records that would fail a filter. I see that Spark supports it, but I haven't found any documentation on how to do it with Scalding.
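Parquet's side of this is the FilterPredicate API; if I read the scalding-parquet code right, its sources expose a withFilter hook (HasFilterPredicate) for pushing one down. A sketch of the predicate itself, with an invented column name:

import org.apache.parquet.filter2.predicate.{FilterApi, FilterPredicate}

// Records failing this are skipped at read time once the source applies it
val onlyAdults: FilterPredicate =
  FilterApi.gt(FilterApi.intColumn("age"), Integer.valueOf(18))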
I have data in this format:
"header1","header2","header3",...
"value11","value12","value13",...
"value21","value22","value23",...
....
What is the best way to parse it in Scalding? I have over 50 columns altogether, but I am only interested in some of…
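A sketch of the usual approach (job and column names invented): declare the full schema once, skip the header row, then project only the columns of interest:

import com.twitter.scalding._

class ParseCsvJob(args: Args) extends Job(args) {
  // name every column once; skipHeader drops the header row
  Csv(args("input"),
      fields = ('header1, 'header2, 'header3), // ...extend to all 50+ columns
      skipHeader = true)
    .read
    .project('header1, 'header3) // keep only what you need
    .write(Tsv(args("output")))
}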
Is there a way to run a Scalding job that needs classpath entries without using -libjars and writing out each jar explicitly, comma-separated?
I would like to put all my jars in a lib directory and then just write -libjars=./lib/* rather than listing every jar.
Is there a classic…
I have a bunch of sequence files that I want to read using Scalding, and I am having some trouble. This is my code:
class ReadSequenceFileApp(args: Args) extends ConfiguredJob(args) {
  SequenceFile(args("in"), ('_, 'wbytes))
    .read
…
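For comparison, a minimal sketch that reads a sequence file of Hadoop Writables with the key/value types pinned up front via WritableSequenceFile (the type parameters here are assumptions, not taken from the question):

import com.twitter.scalding._
import org.apache.hadoop.io.{LongWritable, Text}

class ReadSeqJob(args: Args) extends Job(args) {
  // WritableSequenceFile fixes the key/value Writable types explicitly
  WritableSequenceFile[LongWritable, Text](args("in"), ('key, 'value))
    .read
    .write(Tsv(args("out")))
}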