1

A Crunch pipeline can take a JavaSparkContext as a parameter. But what if the Spark application starts from a SparkSession instance (because the Java program uses Datasets and requires Spark SQL)? How do I add another layer of abstraction (a Crunch pipeline) over the Spark application in that case?

devastrix

1 Answer

0

You may be misunderstanding the concepts. The Spark pipeline in Crunch exists to make Crunch run your code on the Spark engine instead of the MapReduce engine. The Apache Crunch abstractions (PCollections) sit at a higher level than both MapReduce jobs and Spark pipelines.
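If you still need both in one application, a SparkSession already wraps a SparkContext, so one possible approach is to derive a JavaSparkContext from the session and hand that to Crunch. The following is a minimal sketch, not a tested recipe: it assumes a Crunch release whose `SparkPipeline` has a `(JavaSparkContext, String)` constructor and that is built against your Spark 2.x version. The class name `CrunchOnSparkSession`, the helper `pipelineFor`, and the application name are hypothetical.

```java
import org.apache.crunch.PCollection;
import org.apache.crunch.Pipeline;
import org.apache.crunch.impl.spark.SparkPipeline;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.SparkSession;

public class CrunchOnSparkSession {

    // Hypothetical helper: builds a Crunch pipeline on top of the
    // SparkContext that the given SparkSession already owns.
    static Pipeline pipelineFor(SparkSession spark, String appName) {
        // Wrap the session's existing context instead of creating a second one.
        JavaSparkContext jsc = JavaSparkContext.fromSparkContext(spark.sparkContext());
        return new SparkPipeline(jsc, appName);
    }

    public static void main(String[] args) {
        // The master URL is an assumption for local experiments only.
        SparkSession spark = SparkSession.builder()
                .appName("crunch-on-spark-session")
                .master("local[*]")
                .getOrCreate();

        Pipeline pipeline = pipelineFor(spark, "crunch-on-spark-session");

        // Crunch side: read args[0] as text and write it back out to args[1].
        PCollection<String> lines = pipeline.readTextFile(args[0]);
        pipeline.writeTextFile(lines, args[1]);
        pipeline.done();

        spark.stop();
    }
}
```

Dataset and Spark SQL code can keep using the `spark` session directly; only the Crunch part goes through the pipeline, and both share one underlying context.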

Eric Aya
hlagos