Questions tagged [cascading]

Cascading is a Query API, Query Planner, and Process Scheduler used for defining and executing complex, scale-free, and fault tolerant data processing workflows on a Hadoop cluster.

Cascading is a thin Java library that sits on top of Hadoop's MapReduce layer and is executed from the command line like any other Hadoop application. It is not a new text-based query syntax (like Pig) or another complex system that must be installed on a cluster and maintained (like Hive), though Cascading is both complementary to and a valid alternative to either application.

Cascading lets the developer quickly assemble complex distributed data-processing applications without having to "think" in MapReduce, and efficiently schedule them based on their dependencies. Simple data-processing applications are supported as well, since complex applications tend to start simple.

Cascading is Open Source and dual licensed under the GPL and OEM/Commercial Licenses. OEM/Commercial Licenses and Developer Support can be obtained through Concurrent, Inc.

Cascading has a strong community of users and contributors; see our Cascading modules page for related projects and extensions.

Cascading, extensions, and related libraries are also hosted in the Conjars maven repository maintained by Concurrent, Inc. The repository is open to the public.

[Figure: Cascading application-stack overview]

364 questions
3
votes
1 answer

Hadoop: How to collect output of Reduce into a Java HashMap

I'm using Hadoop to compute co-occurrence similarity between words. I have a file that consists of co-occurring word pairs that looks like: a b a c b c b d I'm using a Graph based approach that treats words as nodes and co-occurring words have an…
codemaniac
  • 879
  • 1
  • 11
  • 31
3
votes
1 answer

Cascading Framework vs ETL tools like Talend

We have been using the Cascading framework to build ETL. Cascading gives us optimized joins, parallel-running jobs, checkpoints, a choice of developer languages (Java, Ruby, Scala, Clojure), and unit testing. Now we have two options…
3
votes
1 answer

cascading : how to define every map-reduce job in configuration?

My code is below. It is Cascading code and it has 8 jobs, but I don't know how to configure each job. The code below configures all 8 jobs together, but what I want is for the last job to use a single reducer. How do I identify these 8 jobs, and how to…
cdhit
  • 1,384
  • 1
  • 15
  • 38
3
votes
2 answers

Cascading S3 Sink Tap not being deleted with SinkMode.REPLACE

We are running Cascading with a Sink Tap configured to store in Amazon S3 and were facing some FileAlreadyExistsException errors (see [1]). This happened only occasionally (roughly 1 run in 100) and was not reproducible. Digging into the Cascading…
3
votes
1 answer

JPA: How do I add new Items to a List with a OneToMany annotation

I have 2 tables. One is called Employee, and the other is called Phones, and an employee can have multiple Phones. Employee Class: @Entity @Table(name = "employee") public class Employee { @Id @Column(name = "id", unique = true, nullable =…
user64141
  • 5,141
  • 4
  • 37
  • 34
3
votes
2 answers

Partial aggregation vs Combiners which one faster?

There are notes on how Cascading/Scalding optimizes map-side evaluation: they use so-called Partial Aggregation. Is it actually a better approach than Combiners? Are there any performance comparisons on some common Hadoop tasks (word count for…
yura
  • 14,489
  • 21
  • 77
  • 126
2
votes
1 answer

Understanding Relationship Cascading with Merge

I'm deep diving into Spring JPA and have trouble understanding what's going on here. This is the code: Entity @Entity public class PersonP { @Id @GeneratedValue private int id; public String name; @ManyToMany public…
Remo
  • 1,112
  • 2
  • 12
  • 25
2
votes
2 answers

Is there a shorthand for inserting .NET Blazor components with multiple cascading values?

I am working on a .NET Blazor project and need to pass multiple cascading values to a generic subform. The following code, which applies multiple [CascadingValue] attributes, works fine for passing a few values but becomes a little cumbersome when…
geoCode
  • 89
  • 7
2
votes
2 answers

cascading dropdown in dynamic row angular?

I have a cascading dropdown list which works on its own, but when I generate a new row and change one of the dropdowns, it affects every dropdown in the list. Here is my HTML code.