Questions tagged [scalding]

Scalding is a Scala DSL for Cascading, running on Hadoop.

See https://github.com/twitter/scalding

181 questions
0
votes
1 answer

Scalding tutorial: com.twitter.scalding.InvalidSourceException: Data is missing from one or more paths

With Hadoop 2.2 installed on a single node, I am trying to run the Scalding tutorial, part 1, with the command: $ yarn jar target/scalding-tutorial-0.8.11.jar Tutorial0 --hdfs https://github.com/Cascading/scalding-tutorial/ Before running the tutorial I have copied…
DarqMoth
  • 603
  • 1
  • 13
  • 31
0
votes
1 answer

Scalding tutorial: java.lang.ClassNotFoundException

Please help me run the Scalding tutorial. I have Hadoop 2.2 running on a single node and am trying to run the Scalding tutorial: https://github.com/Cascading/scalding-tutorial/ After successfully building the 'fat jar' with these commands: $ git clone…
DarqMoth
  • 603
  • 1
  • 13
  • 31
0
votes
3 answers

convert list of elements to tuple5, prevent index out of bounds

I'm trying to create a tuple from a Scala list: .map('path -> ('uri1, 'uri2, 'uri3, 'uri4, 'uri5)) {elems:List[String] => (elems(0), elems(1), elems(2), elems(3), elems(4)) //out of bounds! } But the elems may have between 1 and 5 elements, so…
Miguel Ping
  • 18,082
  • 23
  • 88
  • 136
0
votes
2 answers

IntelliJ 13 with SBT plugin does not recognize Scalding dependency

I am trying to add Scalding 2.10 as a managed dependency via build.sbt like so: name := "ss" version := "1.0" libraryDependencies += "com.twitter" % "scalding_2.10" % "0.10.0" IntelliJ downloads the jar and adds it as an external library (see…
cdlm
  • 565
  • 9
  • 20
0
votes
1 answer

Mapping Scala class to Scalding or MongoDB

I am new to both Scala and NoSQL databases. I would like to know if there are ORM tools that will map my Scala objects to a NoSQL database, as with RDBMS solutions.
Omid
  • 1,959
  • 25
  • 42
0
votes
1 answer

Custom scalding tap (or Spark equivalent)

I am trying to dump some data that I have on a Hadoop cluster, usually in HBase, with a custom file format. What I would like to do is more or less the following: start from a distributed list of records, such as a Scalding pipe or similar group…
Andrea
  • 20,253
  • 23
  • 114
  • 183
0
votes
1 answer

[Scala/Scalding]: map ID to name

I am fairly new to Scalding and I am trying to write a Scalding program that takes as input 2 datasets: 1) book_id_title: ('id,'title): contains the mapping between book ID and book title; both are strings. 2) book_sim: ('id1, 'id2, 'sim): contains…
user2327621
  • 957
  • 3
  • 11
  • 15
0
votes
1 answer

Using an HTTP Request as Pipe

I'm getting the hang of Scalding and I need to fetch a number of URLs from the internet. It seems that Scala doesn't provide a single class for making HTTP requests in its standard library. As many of the bare Java solutions I've seen seem too…
tutuca
  • 3,444
  • 6
  • 32
  • 54
0
votes
1 answer

Scalding + LZO +Protobuf

Are there any pointers for getting Scalding to work with LZO Protobuf data on HDFS? I am trying to use Scalding to read files that are stored in binary Protobuf and compressed with LZO. Can we use Elephantbird to read those files? Any pointers will be…
thinker25
  • 1
  • 2
0
votes
1 answer

transforming from native matrix format, scalding

This question is related to the question "Transforming matrix format, scalding", but now I want to perform the reverse operation. I can do it this way: Tsv(in, ('row, 'col, 'v)) .read .groupBy('row) { _.sortBy('col).mkString('v, "\t") } …
DaunnC
  • 1,301
  • 15
  • 30
0
votes
2 answers

What is the replacement for summing list in Scala-Scalding

I have the following code where I maintain a large List. What I do here is go over the data stream and create an inverted index. I use the Twitter Scalding API, and dataTypePipe is of type TypedPipe: lazy val cats = dataTypePipe.cross(cmsCats) .map(vf =>…
add-semi-colons
  • 18,094
  • 55
  • 145
  • 232
0
votes
2 answers

Cascading + libjars = ClassNotFoundException. Sometimes

I am running a Cascading (actually Scalding) Hadoop job that uses DistributedCache for dependent jars. The first time it works fine (meaning that the classpath is set up correctly), but then it starts failing with…
Sasha O
  • 3,710
  • 2
  • 35
  • 45
0
votes
1 answer

Why mapped pairs get obliterated?

I'm trying to understand the example here which computes Jaccard similarity between pairs of vectors in a matrix. val aBinary = adjacencyMatrix.binarizeAs[Double] // intersectMat holds the size of the intersection of row(a)_i n row (b)_j val…
Peteris
  • 3,548
  • 4
  • 28
  • 44
0
votes
1 answer

wordcount.scala error

I installed Scalding on my Mac with OS X Lion. When I run the wordcount.scala program to test the installation, I get the following error message: scalac -classpath…
patzoul
  • 49
  • 2
  • 6
-1
votes
2 answers

Schema validation

Hi, I am looking for an example of schema validation for data. Is it possible to do this using Cascading or Scalding? For example, Name:String, Age:Int. We say our data should conform to the above schema; then we can validate if data really is of that…
user2230605
  • 2,390
  • 6
  • 27
  • 45