Crunch's SparkPipeline can take a JavaSparkContext as a parameter, but my Spark application starts from a SparkSession instance (the Spark Java program uses Datasets and requires Spark SQL). How do I add another layer of abstraction (a Crunch pipeline) over the Spark application in that case?
1 Answer
You may have a misunderstanding of the concepts here. The Spark pipeline in Crunch essentially makes Crunch run your code on the Spark engine instead of the MapReduce engine. Apache Crunch's abstractions (PCollections) sit at a higher level than either MapReduce jobs or Spark pipelines.
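That said, if the application already creates a SparkSession for Dataset/Spark SQL work, one way to layer Crunch on top is to unwrap the session's underlying SparkContext and pass it to Crunch's `SparkPipeline(JavaSparkContext, String)` constructor, so both layers share a single Spark application. A minimal sketch (the class name, app name, and `local[*]` master are illustrative assumptions, not from the question):

```java
import org.apache.crunch.Pipeline;
import org.apache.crunch.impl.spark.SparkPipeline;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.SparkSession;

public class CrunchOverSparkSession {
    public static void main(String[] args) {
        // The Spark SQL side of the application starts from a SparkSession as usual.
        SparkSession spark = SparkSession.builder()
                .appName("crunch-over-spark")   // hypothetical app name
                .master("local[*]")             // assumption: local mode for illustration
                .getOrCreate();

        // Unwrap the session's SparkContext and re-wrap it as a JavaSparkContext.
        JavaSparkContext jsc = JavaSparkContext.fromSparkContext(spark.sparkContext());

        // Hand the same context to Crunch so the Crunch pipeline and the
        // Dataset / Spark SQL code run inside one Spark application.
        Pipeline pipeline = new SparkPipeline(jsc, "crunch-over-spark");

        // ... define PCollections on `pipeline`, and Datasets on `spark` ...

        pipeline.done();
        spark.stop();
    }
}
```

Both `JavaSparkContext.fromSparkContext` and the `SparkPipeline(JavaSparkContext, String)` constructor exist in Spark and Crunch respectively; the key point is that only one SparkContext should be created, and both abstractions reference it.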