Questions tagged [cascading]

Cascading is a Query API, Query Planner, and Process Scheduler used for defining and executing complex, scale-free, and fault-tolerant data processing workflows on a Hadoop cluster.

Cascading is a thin Java library that sits on top of Hadoop's MapReduce layer and is executed from the command line like any other Hadoop application. It is not a new text-based query syntax (like Pig) or another complex system that must be installed on a cluster and maintained (like Hive), though Cascading is both complementary to and a valid alternative to either application.

Cascading lets the developer quickly assemble complex distributed data-processing applications without having to "think" in MapReduce, and efficiently schedule them based on their dependencies. Simple data-processing applications are supported as well, since complex applications tend to start simple.
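To make "schedule them based on their dependencies" concrete, here is a minimal plain-Java sketch of dependency ordering. It is not the Cascading API and does not claim to mirror how Cascading's planner or scheduler is implemented. The class name `FlowScheduleSketch`, the method `executionOrder`, and the flow names are all hypothetical, chosen only for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: plain Java, NOT the Cascading API. All names are hypothetical.
final class FlowScheduleSketch {

    // deps maps each flow name to the flows that must finish before it can start.
    // Returns one valid execution order, or throws if the dependencies form a cycle.
    static List<String> executionOrder(Map<String, List<String>> deps) {
        Map<String, Integer> pending = new LinkedHashMap<>();         // unmet prerequisites per flow
        Map<String, List<String>> dependents = new LinkedHashMap<>(); // reverse edges

        for (Map.Entry<String, List<String>> e : deps.entrySet()) {
            pending.merge(e.getKey(), e.getValue().size(), Integer::sum);
            dependents.computeIfAbsent(e.getKey(), k -> new ArrayList<>());
            for (String prerequisite : e.getValue()) {
                pending.putIfAbsent(prerequisite, 0);
                dependents.computeIfAbsent(prerequisite, k -> new ArrayList<>()).add(e.getKey());
            }
        }

        // Flows with no unmet prerequisites can run immediately.
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : pending.entrySet()) {
            if (e.getValue() == 0) {
                ready.add(e.getKey());
            }
        }

        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String flow = ready.poll();
            order.add(flow);
            // Finishing a flow releases every flow that was waiting on it.
            for (String next : dependents.get(flow)) {
                if (pending.merge(next, -1, Integer::sum) == 0) {
                    ready.add(next);
                }
            }
        }

        if (order.size() != pending.size()) {
            throw new IllegalArgumentException("cyclic dependency between flows");
        }
        return order;
    }
}
```

For example, if a hypothetical `report` flow depends on `join`, and `join` depends on `parse-logs` and `parse-users`, the returned order places both parse flows before `join` and `join` before `report`.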

Cascading is Open Source and dual licensed under the GPL and OEM/Commercial Licenses. OEM/Commercial Licenses and Developer Support can be obtained through Concurrent, Inc.

Cascading has a strong community of users and contributors; see our Cascading modules page for related projects and extensions.

Cascading, its extensions, and related libraries are also hosted in the Conjars Maven repository maintained by Concurrent, Inc. The repository is open to the public.

[Figure: Cascading application-stack overview]

Links:

Cascading Homepage

364 questions

1 answer

Loading data from Hadoop Cascading Source into MySQL Sink

I'm trying to write data from a Cascading source into MySQL, so I wonder if there's an easy sink available to take the tab-delimited data coming from the source and run a couple of SQL statements to update a table. I'm new…

mysql hadoop cascading

asked Aug 01 '13 at 20:25

Hello Operator

2 answers

Cascading + libjars = ClassNotFoundException. Sometimes

I am running a Cascading (actually Scalding) Hadoop job that uses DistributedCache for dependent jars. The first time it works fine (meaning that the classpath is set up correctly), but then it starts failing with…

hadoop cascading scalding

asked Jul 25 '13 at 14:58

Sasha O

1 answer

How can I read and write binary files in Cascading?

I want to load some files in binary format (for example JPEGs, but it could be any binary format), manipulate them somehow, and write them back. I want to do that on Hadoop, and I would like to write it on top of the Cascading framework. Are there binary sinks /…

hadoop elastic-map-reduce emr cascading

asked Jul 17 '13 at 12:52

polo

1 answer

How can I pass cascading parameters from ASP.NET to SSRS

I am trying to build a web application (ASP.NET) that will be used to display an SSRS report. My report has 4 cascading parameters: A, B, C, and D. C and D "depend" logically on the value of A (this means that the DataSets of C and D are filtered based…

asp.net reporting-services parameters cascading

asked Jul 16 '13 at 13:52

Valentin Mladenov

1 answer

Hadoop Cascading framework to Update specific column data

I have a mongodb collection which looks like this:

Id   Name  createTime  updateTime  Age  Country  verificationStatus
Id1  Abc   10-7-2013   10-7-2013   21   Xxxx     INITIAL_MAIL
Id2  Efg   9-7-2013    10-7-2013   22   Xxxx     FIRST_REMINDER
Id3  Hij…

hadoop cascading

asked Jul 11 '13 at 09:32

vinoth

1 answer

Is a Cascading function executed in a single thread, as a Hadoop mapper function is?

I'm reading the Cascading documentation, chapter 5.2 Functions, and I wonder what will happen with the following code. Should it work OK in a multithreaded environment? The more general question is whether a Function can be multithreaded. As far as I know, the…

hadoop mapreduce cascading

asked Jun 10 '13 at 07:44

Julias

2 answers

Combining outputs in Cascading

I am analyzing log files with various domain names using Cascading. Here is an example of the output report after it has been filtered:

www.google.nl 3
www.google.it 3
www.google.com.co 3
www.google.com.hk 3
www.google.co.jp 3

I would like to group…

filter cascading

asked Jun 03 '13 at 16:13

cevallos.valtira

1 answer

What tools exist for benchmarking Cascading for Hadoop routines?

I have been given a multi-step Cascading program that takes about ten times as long to run as an equivalent M/R job. How do I go about figuring out which of the steps is the slowest so I can target it for optimization?

optimization hadoop benchmarking cascading

asked Jun 03 '13 at 15:50

Robert Rapplean

1 answer

Ignoring outputs in Cascading

I am analyzing log files with various domain names. I want to exclude/ignore from the output report any domain that has the word "macys". Here is an example output:

l.macys.com 87516
www.google.com 3016
search.yahoo.com 584
www.bing.com…

filtering logfile cascading-deletes cascading

asked Jun 03 '13 at 15:39

cevallos.valtira

1 answer

Cascading - regex parser - wrong number of fields

I'm starting to play with Cascading on Amazon EMR and have managed to get it running, but I'm falling at a fairly simple hurdle and was hoping someone could shed some light on it. My code: import java.util.Properties; import cascading.flow.Flow; import…

regex hadoop cascading

asked May 24 '13 at 15:41

Duncan

2 answers

Hadoop Cascading: how to get the top N tuples

I'm new to Cascading and trying to find a way to get the top N tuples based on a sort order. For example, I'd like to know the top 100 first names people are using. Here's how I can do something similar in Teradata SQL: select top 100 first_name, num_records …

hadoop mapreduce sql-order-by cascading

asked Apr 30 '13 at 02:28

Kartrace

2 answers

Getting cascading.tap.hadoop.io.MultiInputSplit class not found exception while running hadoop program using cascading framework

Here is my code, which connects to a Hadoop machine, performs a set of validations, and writes to another directory. public class Main{ public static void main(String...strings){ System.setProperty("HADOOP_USER_NAME", "root"); …

java hadoop cascading

asked Apr 13 '13 at 13:04

Mohammad Adnan

1 answer

How to rename Pipe fields in cascading?

On two separate occasions, I've had to rename all the fields in a Pipe to join (using Merge or CoGroup). What I have done recently is: //These two pipes contain similar values but different Field Names Pipe papa = new Retain(papa, fieldsFrom); Pipe…

hadoop mapreduce cascading

asked Apr 11 '13 at 15:36

Engineiro

2 answers

Cascading(buffer) implementation

I need to create a buffer in Cascading on Hadoop. Suppose I have the fields: member_id,amountpaid,diadnosis_id,diagnosis_description,superGrouper_id,superGrouper_descriptiion,grouperId,grouperDescription. I need to group the fields from member_id and…

cascading

asked Mar 22 '13 at 07:55

Rach

0 answers

Prevent cascading refreshes

I have a header.js that includes in its ready section the following: var auto_refresh = setInterval(function () { var theToken = $('#token').text(); $('#error-div').text(''); …

jquery triggers setinterval cascading

asked Mar 05 '13 at 16:35

John Wooten
