Questions tagged [cascading]

Cascading is a Query API, Query Planner, and Process Scheduler used for defining and executing complex, scale-free, and fault tolerant data processing workflows on a Hadoop cluster.

Cascading is a thin Java library that sits on top of Hadoop's MapReduce layer and is executed from the command line like any other Hadoop application. It is not a new text-based query syntax (like Pig) or another complex system that must be installed on a cluster and maintained (like Hive). Cascading is, however, both complementary to and a valid alternative to either application.

Cascading lets the developer quickly assemble complex distributed data-processing applications without having to "think" in MapReduce, and efficiently schedule them based on their dependencies. Simple data-processing applications are supported as well, since complex applications tend to start simple.

Cascading is Open Source and dual licensed under the GPL and OEM/Commercial Licenses. OEM/Commercial Licenses and Developer Support can be obtained through Concurrent, Inc.

Cascading has a strong community of users and contributors; see the Cascading modules page for related projects and extensions.

Cascading, its extensions, and related libraries are also hosted in the Conjars Maven repository maintained by Concurrent, Inc. The repository is open to the public.

[Figure: Cascading application-stack overview]

364 questions
4
votes
1 answer

Cascading HBase Tap

I am trying to write Scalding jobs which have to connect to HBase, but I have trouble using the HBase tap. I have tried using the tap provided by Twitter Maple, following this example project, but it seems that there is some incompatibility between…
Andrea
  • 20,253
  • 23
  • 114
  • 183
4
votes
3 answers

Run a simple Cascading application in local mode

I'm new to Cascading/Hadoop and am trying to run a simple example in local mode (i.e. in memory). The example just copies a file: import java.util.Properties; import cascading.flow.Flow; import cascading.flow.FlowConnector; import…
Clayton
  • 6,089
  • 10
  • 44
  • 47
4
votes
2 answers

Hadoop Cascading - create flow with one source, two sinks

I am using Cascading 2 to create Hadoop jobs and am trying to create a flow that starts with a single source. After a couple of functions are applied to the data I need to split the flow so that this data is used to create two separate reports (in…
hello-klol
  • 735
  • 10
  • 20
3
votes
1 answer

JPA: Cascading OneToOne : which side should be the cascading attribute?

So far my understanding was that cascading only makes sense from the parent to the child. Now I'm wondering: does this also apply to OneToOne relationships? I'm asking because I found in our code many (unidirectional) OneToOne relationships with…
Julien Berthoud
  • 721
  • 8
  • 24
3
votes
1 answer

Cascading - merge 2 aggregations

I have the following problem which I am trying to solve with Cascading: I have a CSV file of records with the structure: o,a,f,i,c I need to aggregate the records by o,a,f and to sum the i's and c's per group. For…
yosi
  • 639
  • 1
  • 12
  • 21
3
votes
1 answer

Is Hive QL have same expressive power as writing your own MapReduce Jobs directly on Hadoop?

To put it in other words, is there a problem that can be solved by directly defining your MapReduce jobs, but for which you cannot form a Hive QL query? If yes, then it means that Hive QL is limited in its expressive power and cannot express all…
user855
  • 19,048
  • 38
  • 98
  • 162
3
votes
0 answers

Modal scroll not working properly only on safari browser

I have used modal in my react.js website. But it's scroll event not working properly in safari browser. I have checked it all other browsers, It's working properly. In Safari when we scroll down it's merged all text fields. I have given my modal…
sameer
  • 31
  • 1
3
votes
2 answers

Where can I find a HBase cascading module for hbase-0.89.20100924+28?

I am working on a project using map reduce and HBase. We are using Cloudera’s CDH3 distribution which has hbase-0.89.20100924+28 bundled into it. I would like to use cascading as we have some processing that requires multiple map reduce jobs, but I…
Rob
  • 245
  • 1
  • 5
  • 14
3
votes
1 answer

Foreign Key constraint one to many tables while on delete ,on update cascade rule

I'm working on a web application where i have database problem. I have three tables : are as follows Table 1: CREATE TABLE mydb.emp( eID INT NOT NULL, eName VARCHAR(45) NULL, PRIMARY KEY(eID) ); Table 2: CREATE TABLE…
Saikrishna
  • 87
  • 2
  • 11
3
votes
1 answer

What is the equivalent of SQL NOT IN in Cascading Pipes?

I have two files with one common field, based on that field value i need to get the second file values. How do i add the where Condition here? Is there any other PIPE available for NOT IN…
Shankar
  • 8,529
  • 26
  • 90
  • 159
3
votes
2 answers

Scalding TypedPipe API External Operations pattern

I have a copy of Programming MapReduce with Scalding by Antonios Chalkiopoulos. In the book he discusses the External Operations design pattern for Scalding code. You can see an example on his website here. I have made a choice to use the Type…
PhillipAMann
  • 887
  • 1
  • 10
  • 19
3
votes
4 answers

CascadeType vs FetchType

I would like to know what is the difference between CascadeType and FetchType in Hibernate? They seem very similar but I guess they are not interchangeable, right? When to use them? Can they both be used at the same time?
jarosik
  • 4,136
  • 10
  • 36
  • 53
3
votes
2 answers

Cascading Text file to Parquet

I am trying to convert a file into Parquet using Cascading. But I am getting the below error. Error Exception in thread "main" cascading.flow.planner.PlannerException: tap named: 'Copy', cannot be used as a sink: Hfs["ParquetTupleScheme[['A',…
user2732748
  • 97
  • 4
  • 12
3
votes
1 answer

Scalding: retaining all fields after groupBy

I'm doing a groupBy for calculating a value, but it seems that when I group by, I lose all the fields that are not in the aggregation keys: filtered.filterNot('site) {s:String => ...} .filterNot('date) {s:String => ...} aggr =…
Miguel Ping
  • 18,082
  • 23
  • 88
  • 136
3
votes
1 answer

LightSwitch dynamic cascading dropdown lists

In lightswitch, I need to make dynamic cascading dropdown lists based on a recursive relationship: Table "Categories" includes: Id Name ParentId here is the desired scenario: a screen showing a drop down list for the categories with no…