I produced a CSV file using Scalding's default Csv writer (specifying only the p parameter for the path to write to, and none of the other parameters that control how the CSV data is written) that I want to import into MySQL. I am running into a problem…
I have written a job using Scalding that runs great in local mode. But when I try to execute it in HDFS mode (on the same file), it doesn't do anything. More precisely, the first step has no tasks (neither mappers nor reducers) and the steps afterwards…
At work we use Gradle on a Scalding project, and I'm trying to come up with the simplest possible job to get the hang of the stack.
My class looks like this:
package org.playground
import com.twitter.scalding._
class readCsv(args: Args) extends Job(args) {
…
I'm having a hard time writing a unit test for my Scalding job.
My job expects a file with three fields:
TextLine(args("input"))
.map('url -> ('fetchedUrl,'date,'info)){
...
Naively I would've expected that the fields got mapped as a…
OK, so in Scalding we can easily work with matrices using the Matrix API, like this:
val matrix = Tsv(path, ('row, 'col, 'val))
.read
.toMatrix[Long,Long,Double]('row, 'col, 'val)
But how can I transform a matrix to that format…
I'm trying to build a sort of mini framework for large matrix calculations on Hadoop. What I mean is something like Prod(Sum(x, y), z) // (X + Y) * Z, where x, y, and z are matrices or numbers, which it would evaluate and then write the result to a file.
So I'm using…
I am working on a project that needs a path navigational graph.
Problem Description:
To give the project context, the sample UI is expected to look similar to: http://bl.ocks.org/mbostock/4063570
The difference is that it will be for site…
This might be a very special case, but after banging my head against it for a while I thought I would ask the Stack Overflow community for help.
I am building an inverted index for a large data set (one day's worth of data from a large system). The building of inverted…
I want to process a large number of text files stored in S3. Unfortunately, I cannot simply use a list together with the MultipleTextLineFiles source because the method code becomes too large and a java.lang.RuntimeException is thrown.
My last…
As the final step of some computations with Scalding, I want to compute the averages of several columns in a pipe, but the following code doesn't work:
myPipe.groupAll { _average('col1,'col2, 'col3) }
Is there any way to compute such functions sum,…
I work at a place where Scalding writes are augmented with a specific API to track dataset metadata. When converting from normal writes to these special writes, there are some intricacies with respect to Key/Value, TSV/CSV, Thrift ... datasets. I…
I'm installing Scalding and sbt on my system, but running the command sbt assembly gives the following error:
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See…
I have a list of lists of strings and I want to concatenate all the unique strings into a single (space-delimited) string, something that flatMap allows one to do. However I am confused about the correct usage of the reduce function when…
I am new to the Scalding world. My Scalding job will have multiple stages, and I need to tune each stage individually.
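For reference, the operation described in this question can be sketched on plain Scala collections, without any Scalding pipe. This is only a minimal illustration of the flatten-then-deduplicate idea, not the asker's code: the object name UniqueConcat, the helper concatUnique, and the sample input are all hypothetical.

```scala
object UniqueConcat {
  // Hypothetical helper: flatten the nested lists, keep the first
  // occurrence of each string, then join the survivors with spaces.
  def concatUnique(lists: List[List[String]]): String =
    lists.flatten.distinct.mkString(" ")

  def main(args: Array[String]): Unit = {
    // Made-up sample data, for illustration only.
    val input = List(List("a", "b"), List("b", "c"), List("a", "d"))
    println(concatUnique(input)) // prints "a b c d"
  }
}
```

Using distinct on the flattened list rather than converting to a Set keeps the strings in first-seen order, which makes the output predictable.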
I have found that we might be able to change the number of reducers by using withReducers. Also, I am able to set the split size for…