Questions tagged [scalding]

Scalding is a Scala DSL for Cascading, running on Hadoop.

See https://github.com/twitter/scalding

181 questions
1
vote
0 answers

Importing Scalding-produced CSV into MySQL

I produced a CSV file using Scalding's default Csv writer (specifying only the p parameter for the path to write to, and none of the other parameters for how to write the CSV data) that I am looking to import into MySQL. I am running into a problem…
Marc L'Heureux
  • 392
  • 3
  • 13
1
vote
1 answer

jobs run with no mappers or reducers

I have written a job using Scalding that runs great in local mode. But when I try to execute it in HDFS mode (on the same file), it doesn't do anything. More precisely, the first step has no tasks (neither mappers nor reducers) and the steps afterwards…
IttayD
  • 28,271
  • 28
  • 124
  • 178
1
vote
1 answer

Gradle built jar does not find my main class

At work we use Gradle on a Scalding project and I'm trying to come up with the simplest job to get the hang of the stack. My class looks like this: package org.playground import com.twitter.scalding._ class readCsv(args: Args) extends Job(args) { …
tutuca
  • 3,444
  • 6
  • 32
  • 54
1
vote
2 answers

Mock a TSV source with Scalding JobTest

I'm having a hard time writing a unit test for my Scalding job. My job expects a file with three fields: TextLine(args("input")) .map('url -> ('fetchedUrl,'date,'info)){ ... Naively I would've expected that the fields got mapped as a…
tutuca
  • 3,444
  • 6
  • 32
  • 54
1
vote
1 answer

Transforming matrix format, scalding

OK, so in Scalding we can easily work with matrices using the Matrix API, like this: val matrix = Tsv(path, ('row, 'col, 'val)) .read .toMatrix[Long,Long,Double]('row, 'col, 'val) But how can I transform the matrix to that format…
DaunnC
  • 1,301
  • 15
  • 30
1
vote
1 answer

Decomposition of expressions (operations on Matrix), hadoop

I'm trying to make a sort of mini framework for big matrix calculations on Hadoop. What I mean is something like Prod(Sum(x, y), z) // (X + Y) * Z, where x, y, z are matrices or numbers, that calculates the expression and writes the result to a file. So I'm using…
DaunnC
  • 1,301
  • 15
  • 30
1
vote
1 answer

top 10 path reduction map reduce

I am working on a project that needs a path navigation graph. Problem description: To give the project context, the sample UI is expected to look similar to: http://bl.ocks.org/mbostock/4063570. The difference is that it will be for site…
Kunal
  • 2,929
  • 6
  • 21
  • 23
1
vote
1 answer

Reading and writing to hadoop sequence file using scala

I just started using Scalding and am trying to find examples of reading a text file and writing to a Hadoop sequence file. Any help is appreciated.
1
vote
0 answers

Building Inverted Index exceed the Java Heap Size

This might be a very special case, but after banging my head against it for a while I thought I would ask the Stack Overflow community for help. I am building an inverted index for a large data set (one day's worth of data from a large system). The building of inverted…
add-semi-colons
  • 18,094
  • 55
  • 145
  • 232
1
vote
2 answers

Multiple input files in Scalding

I want to process a large number of text files stored in S3. Unfortunately, I cannot simply use a list together with the MultipleTextLineFiles source because the method code becomes too large and a java.lang.RuntimeException is thrown. My last…
kyrre
  • 626
  • 2
  • 9
  • 24
1
vote
2 answers

How to average several columns at once in Scalding?

As the final step of some computations with Scalding I want to compute several averages of the columns in a pipe. But the following code doesn't work: myPipe.groupAll { _.average('col1,'col2, 'col3) } Is there any way to compute such functions sum,…
tonicebrian
  • 4,715
  • 5
  • 41
  • 65
0
votes
1 answer

Scalding Unit Test - How to Write A Local File?

I work at a place where Scalding writes are augmented with a specific API to track dataset metadata. When converting from normal writes to these special writes, there are some intricacies with respect to Key/Value, TSV/CSV, Thrift ... datasets. I…
codeaperature
  • 1,089
  • 2
  • 10
  • 25
0
votes
1 answer

having trouble installing scalding

I'm installing Scalding and sbt on my system, but running the command sbt assembly gives the following error: SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder". SLF4J: Defaulting to no-operation (NOP) logger implementation SLF4J: See…
0
votes
1 answer

How to concatenate list of strings with map reduce in scala

I have a list of lists of strings and I want to concatenate all the unique strings into a single space-delimited string, something that flatMap allows one to do. However, I am confused about the correct usage of the reduce function when…
0
votes
1 answer

Is there a way to specify the number of mappers in Scalding?

I am new to the Scalding world. My Scalding job will have multiple stages, and I need to tune each stage individually. I have found that we might be able to change the number of reducers by using withReducers. Also, I am able to set the split size for…
WarfDog
  • 11
  • 4