I'm executing a Scalding job on the Hortonworks distribution (HDP 2.1) and it throws the following error:
I tried to locate the Cascading jar in the Hortonworks distribution but couldn't find it. What am I doing wrong here?
I have the following input tuple that I'd like to flatMap: (String, List[String])
E.g., input:
("a", ["1", "2"])
("b", ["3", "4"])
Needed output:
("a", "1")
("a", "2")
("b", "3")
("b", "4")
Is there an elegant way to do this in Scalding/Scala?
Context: I'm reading in a file where multiple fields are a list of IDs. I need to convert these fields into a Pipe to join them with other Pipes.
What I have tried:
val otherPipe = pipe
.project('fieldIwant)
.map { p: Pipe =>…
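A collections-level sketch of the flatten (with hypothetical data); Scalding's typed API mirrors Scala collections, so the same `flatMap` shape works on a `TypedPipe[(String, List[String])]`:

```scala
// Hypothetical input mirroring the question's example
val input = List(("a", List("1", "2")), ("b", List("3", "4")))

// Emit one (key, value) pair per element of the inner list
val flattened = input.flatMap { case (key, values) =>
  values.map(v => (key, v))
}
// flattened == List(("a","1"), ("a","2"), ("b","3"), ("b","4"))
```

The `map { p: Pipe => … }` attempt above types the lambda as a `Pipe`, but the function passed to `map`/`flatMap` operates on the tuple elements, not on the pipe itself.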
I am trying to display some content on the console in a Scalding script. When I run the same logic in the Scalding shell I get the desired output, but when I run the script I get an error:
scripttest.scala:4: error: value dump is not a member of…
I have almost finished my Scalding project, which uses the type-safe API instead of the Fields API. The last remaining issue in the overall project setup is integration testing of the entire Scalding job itself (I have finished unit tests…
I'm trying to write output from a Scalding flow as JSON and read it in Spark. This works fine, except when the JSON contains strings with newlines. The output is one JSON object per line, and newlines in a JSON value are causing…
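A minimal sketch of the usual fix (function name is hypothetical): a conforming JSON encoder escapes control characters, so a value containing embedded newlines still serializes to a single physical line and one-object-per-line parsing stays safe:

```scala
// Escape the characters that would break line-delimited JSON
def escapeJsonString(s: String): String =
  s.flatMap {
    case '\n' => "\\n"   // literal newline -> two-character escape
    case '\r' => "\\r"
    case '"'  => "\\\""
    case '\\' => "\\\\"
    case c    => c.toString
  }

val record = s"""{"text":"${escapeJsonString("line one\nline two")}"}"""
// record contains no literal newline character
```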
I've just started implementing a simple Scalding starter program, following this documentation for reference. In the first example, the `write` method could not be resolved.
import com.twitter.scalding._
class WordCountJob(args: Args) extends…
I want to serialize a Scalding TypedPipe[MyClass] and deserialize it in Spark 1.5.1.
I am able to serialize/deserialize a "simple" case class containing only "primitives" such as Booleans and Maps, using kryo and Twitter's Chill for Scala:
//In…
In my Scalding hadoop job, I've got some grouping logic on a pipe, and then I need to process each group:
val georecs : TypedPipe[GeoRecord] = getRecords
georecs.map( r => (getRegion(r),r) )
.groupBy(_._1)
.mapValueStream( xs =>…
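A collections analogue of the group-then-process pattern above (`GeoRecord`, `getRegion`, and the aggregation are stand-ins): group records by region, then fold over each group's value stream:

```scala
// Stand-in record type and key function
case class GeoRecord(region: String, magnitude: Double)
def getRegion(r: GeoRecord): String = r.region

val georecs = List(GeoRecord("us", 3.1), GeoRecord("eu", 2.0), GeoRecord("us", 4.2))

// groupBy plays the role of groupBy(_._1); xs plays the value stream
val maxByRegion: Map[String, Double] =
  georecs.groupBy(getRegion).map { case (region, xs) =>
    region -> xs.map(_.magnitude).max
  }
```

In Scalding itself, `mapValueStream` gives you an `Iterator` over each group's values after the shuffle, so whatever per-group processing you write must consume that iterator in one pass.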
I am trying to sort the output of a groupBy statement using Scalding.
My dataset looks like this:
Src Eqid Version Datetime Lat Lon Magnitude Depth NST Region
ci 15214001 0 …
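A collections sketch of sorting within groups (field names taken from the header row above; the data values are made up). In Scalding's Fields API the corresponding step would be a `sortBy` inside the `groupBy` block:

```scala
// Stand-in for two of the dataset's columns
case class Quake(region: String, magnitude: Double)

val quakes = List(
  Quake("Southern California", 3.2),
  Quake("Northern California", 2.1),
  Quake("Southern California", 1.5)
)

// Group by region, then sort each group's rows by magnitude
val sortedByRegion: Map[String, List[Quake]] =
  quakes.groupBy(_.region).map { case (region, qs) =>
    region -> qs.sortBy(_.magnitude)
  }
```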
I'm not able to run a Scalding test with the JobTest class. Below is the command; how should it be invoked?
hadoop jar com.scala-0.0.1-SNAPSHOT.jar com.twitter.scalding.JobTest com.scala.etl --hdfs --input --output
I'm facing the problem below:
Exception in…
Basically I need to run a Scalding job on EMR. The same job runs perfectly fine on local Hadoop on my MacBook, but fails on Hadoop on EMR.
I am trying hard to get help for this issue in the cascading-user and scala-user groups as well, and haven't…
I'm using CDH 5.4. I'm running a Hadoop job which from the command line appears to be OK (when simply running with hadoop jar). However, if I run it from YARN it finishes silently with a single mapper and no reducers. I really suspect both 'runs' were…
Recently we moved from Scalding to Spark. I used Eclipse and the Scala IDE for Eclipse to write code and tests. The tests ran fine with Twitter's JobTest class. Any class using JobTest would automatically be available to run as a Scala unit…
I'm using Scalding on Hadoop. I have a large dataset in the form of a TypedPipe that I wish to output in chunks based on one of the data fields.
For example the data is , and I want the data for each category stored in a…
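A collections sketch of the grouping step, with made-up categories and rows. In a real Scalding job the analogous move is to write each group to its own output path (e.g. via a partitioned or templated sink), but the split-by-field logic is the same:

```scala
// Hypothetical (category, record) pairs
val data = List(("books", "rec1"), ("music", "rec2"), ("books", "rec3"))

// One chunk of records per category value
val chunks: Map[String, List[String]] =
  data.groupBy(_._1).map { case (category, rows) =>
    category -> rows.map(_._2)
  }
```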