I know this question was asked years ago, but I am still wondering about the true purpose of using SparkSQL / HiveContext.
The Spark approach offers a more generic way to distribute computation than the built-in MapReduce.
I have read a lot of articles claiming that the MR approach is already dead and that Spark is the best option (I understand that I can implement an MR-style approach with Spark).
So when people recommend querying data through HiveContext, I am a little confused.
Doesn't running a query from SparkSQL/HiveContext imply running an MR job? Doesn't that bring us back to the original problem? And if I don't need to wrap the query result in more complex code, isn't TEZ enough?
Am I wrong? (I am sure I am :-))