Crunch's SparkPipeline can take a JavaSparkContext as a parameter, but my Spark application starts from a SparkSession instance (the Spark Java program uses Datasets and requires Spark SQL). How do I add another layer of abstraction (a Crunch pipeline) over the Spark application in that case?
1 Answer
You may have a misunderstanding of the concepts here. The Spark pipeline in Crunch essentially makes Crunch run your code on the Spark engine instead of the MapReduce engine. Apache Crunch's abstractions (PCollections) sit at a higher level than either MapReduce jobs or Spark pipelines.
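That said, if the application already creates a SparkSession for Dataset/Spark SQL work, one way to layer Crunch on top is to unwrap the session's underlying SparkContext and pass it to Crunch's `SparkPipeline(JavaSparkContext, String)` constructor, so both layers share a single Spark application. A minimal sketch (the class name, app name, and `local[*]` master are illustrative assumptions, not from the question):

```java
import org.apache.crunch.Pipeline;
import org.apache.crunch.impl.spark.SparkPipeline;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.SparkSession;

public class CrunchOverSparkSession {
    public static void main(String[] args) {
        // The Spark SQL side of the application starts from a SparkSession as usual.
        SparkSession spark = SparkSession.builder()
                .appName("crunch-over-spark")   // hypothetical app name
                .master("local[*]")             // assumption: local mode for illustration
                .getOrCreate();

        // Unwrap the session's SparkContext and re-wrap it as a JavaSparkContext.
        JavaSparkContext jsc = JavaSparkContext.fromSparkContext(spark.sparkContext());

        // Hand the same context to Crunch so the Crunch pipeline and the
        // Dataset / Spark SQL code run inside one Spark application.
        Pipeline pipeline = new SparkPipeline(jsc, "crunch-over-spark");

        // ... define PCollections on `pipeline`, and Datasets on `spark` ...

        pipeline.done();
        spark.stop();
    }
}
```

Both `JavaSparkContext.fromSparkContext` and the `SparkPipeline(JavaSparkContext, String)` constructor exist in Spark and Crunch respectively; the key point is that only one SparkContext should be created, and both abstractions reference it.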