Questions tagged [cascading]

Cascading is a Query API, Query Planner, and Process Scheduler used for defining and executing complex, scale-free, and fault-tolerant data processing workflows on a Hadoop cluster.

Cascading is a thin Java library that sits on top of Hadoop's MapReduce layer and is executed from the command line like any other Hadoop application. It is not a new text-based query syntax (like Pig) or another complex system that must be installed on a cluster and maintained (like Hive). Cascading is, however, both complementary to and a valid alternative to either application.

Cascading lets the developer quickly assemble complex distributed data-processing applications without having to "think" in MapReduce, and efficiently schedule them based on their dependencies. Simple data processing applications are supported as well, since complex applications tend to start simple.

Cascading is Open Source and dual licensed under the GPL and OEM/Commercial Licenses. OEM/Commercial Licenses and Developer Support can be obtained through Concurrent, Inc.

Cascading has a strong community of users and contributors. See the Cascading modules page for related projects and extensions.

Cascading, its extensions, and related libraries are also hosted in the Conjars Maven repository maintained by Concurrent, Inc. The repository is open to the public.

[Image: Cascading application-stack overview]

364 questions
0
votes
1 answer

mvc 4 missing anything on these cascading dropdowns

Hello, good evening everybody. I am trying to get data when the selected index of one of my dropdowns changes. Here is my code. I couldn't work out what I am missing or doing wrong. I would be very glad if you could help me, thank you. This is my script…
0
votes
1 answer

Cascading filter out bad records in a file

I am using custom Functions for DQ checks in Cascading, where I set an indicator that I then use at the end to filter the records into the required pipes. I have written two functions for it. In the code below, Field 'A' is a String for which…
user2732748
  • 97
  • 4
  • 12
0
votes
0 answers

Crystal Reports Cascading Static Parameters

I have a report with 3 parameters (ReportType, DateRange, OrderNum) and I'd like to cascade them if possible. My record filter is working properly. ReportType = 1 = All, 2 = By Date Range, 3 = By Order# If Report Type = 1, then don't prompt for…
Bill
  • 1
0
votes
1 answer

Cascading Driven Self-Hosted Version Server Error

I am using the Driven Self-Hosted version in a Cloudera-5 (CDH-5) VM. I was able to install the Driven server successfully and to open the server at the URL localhost.localdomain:8080. I have provided the below values in the file…
user2732748
  • 97
  • 4
  • 12
0
votes
1 answer

how to avoid filling up hadoop logs on nodes?

When our Cascading jobs encounter an error in the data, they throw various exceptions… These end up in the logs, and if the logs fill up, the cluster stops working. Is there a config file we can edit to avoid such scenarios? We are using…
0
votes
1 answer

Cascading for Impatient TFIDF example freezing

I'm trying to work with Cascading to create and execute complex data processing workflows on a local Hadoop cluster. I wish to create a TFIDF vector so I can apply Machine Learning algorithms such as NaiveBayes on it using the Apache Spark…
eliasah
  • 39,588
  • 11
  • 124
  • 154
0
votes
1 answer

Automatic Hive or Cascading for ETL in AWS-EMR

I have a large dataset residing in AWS S3. This data is typically transactional data (like calling records). I run a sequence of Hive queries to continuously apply aggregation and filtering conditions to produce a couple of final compact files (csvs…
prog_guy
  • 796
  • 3
  • 7
  • 24
0
votes
1 answer

Cascading Parameter SSRS Report hangs on Production Server

I have a report in which I used two multivalue parameters, Affiliate and TFN. Both fetch their available values using a query. Affiliate is independent, while the TFN list is populated once Affiliate is selected. The report works fine when I run it in development…
aadi
  • 86
  • 6
0
votes
0 answers

assertEquals not comparing HashSet java

I am writing a unit test using JUnit 4 to compare two Lists. I have two lists of tuples of type java.util.List. The tuple order can be different inside the List. I would like to compare these two Lists. What I have done: assertEquals(new HashSet(List1),…
nothing_authentic
  • 2,927
  • 3
  • 17
  • 22
0
votes
1 answer

Code first, Data is not inserting into database using cascading dropdown list

In my Create [HttpPost] method, all data is inserted except the cascading dropdown items. I have Department, Subject and Section models. One department can have many subjects, and one subject can have many sections. After adding jquery submit…
InsParbo
  • 390
  • 2
  • 13
0
votes
1 answer

Custom scalding tap (or Spark equivalent)

I am trying to dump some data that I have on a Hadoop cluster, usually in HBase, with a custom file format. What I would like to do is more or less the following: start from a distributed list of records, such as a Scalding pipe or similar group…
Andrea
  • 20,253
  • 23
  • 114
  • 183
0
votes
2 answers

OpenCV haartraining: Mergevec error: Input file does not exist or not readable

Following this tutorial, I've created my positive samples but need to merge them now, using mergevec. I downloaded the mergevec.exe binary file provided and got the two required dlls cxcore100.dll and highgui100.dll. However, when I run it like…
user961627
  • 12,379
  • 42
  • 136
  • 210
0
votes
1 answer

Class Tap requires Type Parameters

Scala noob here. I'm integrating a web crawler that uses Cascading internally (bixo), so I've been investing some time in porting an example they provide (see here) line by line. So far I'm making little progress, and one thing I'm stuck with is at…
tutuca
  • 3,444
  • 6
  • 32
  • 54
0
votes
1 answer

Use fields in one tuplestream as part of regex in RegexParser on second tuplestream

I'm trying to read in a CSV from HDFS, parse it with Cascading, and then use the resulting tuple stream to form the basis of regex expressions in another tuple stream using RegexParser. As far as I can tell, the only way to do this would be to…
CalebJ
  • 159
  • 1
  • 10
0
votes
1 answer

Exception in thread "main" java.lang.NullPointerException while using org.apache.hadoop DistributedFileSystem

String inputPath = args[0]; FileSystem dfs = new DistributedFileSystem(); FileStatus[] files = null; try { files = dfs.listStatus(new Path(inputPath)); } catch (IOException err) { //Do stuff } The code builds fine with Maven. However, when I…
CalebJ
  • 159
  • 1
  • 10