In Spark, Resilient Distributed Datasets (RDDs) are a low-level API and DataFrames are a high-level API, so my question is: when should I use the low-level APIs?
1 Answer
1
Spark has two fundamental sets of APIs: the low-level “unstructured” APIs, and the higher-level structured APIs.
An RDD can process both structured and unstructured data, whereas a DataFrame organizes the data into rows and named columns and therefore works on structured data. You can convert a DataFrame to an RDD if required.
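As a rough sketch of that difference in PySpark (assuming a local Spark installation; `parse_log_line`, `run_example`, and the sample log lines are made up for illustration), you could start from schema-less text in an RDD, give it columns as a DataFrame, and then drop back to an RDD:

```python
def parse_log_line(line):
    """Split a raw 'LEVEL message' line into a (level, message) tuple."""
    level, _, message = line.partition(" ")
    return (level, message)


def run_example():
    # Imported here so the helper above can be used without Spark installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("local[*]")
        .appName("rdd-vs-dataframe")
        .getOrCreate()
    )

    # Low-level API: an RDD of raw strings with no schema attached.
    raw_lines = ["ERROR disk full", "INFO job started", "ERROR timeout"]
    lines_rdd = spark.sparkContext.parallelize(raw_lines)
    parsed_rdd = lines_rdd.map(parse_log_line)

    # High-level API: the same data with named columns.
    logs_df = spark.createDataFrame(parsed_rdd, ["level", "message"])
    logs_df.groupBy("level").count().show()

    # Convert the DataFrame back to an RDD of Row objects if required.
    rows_rdd = logs_df.rdd
    print(rows_rdd.map(lambda row: row.level).collect())

    spark.stop()
```

Calling `run_example()` runs the whole thing on a local master. The parsing step is plain Python applied record by record, which is the kind of work the RDD API handles naturally; once the data has columns, the grouping is shorter to write with the DataFrame API.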
In general, people use DataFrames, and therefore the high-level APIs, because they offer more built-in operations and let Spark optimize the query for you. But this depends purely on your requirements.
For more clarification, I suggest reading a book such as 'Learning Spark' or 'Spark: The Definitive Guide'.

swapnil shashank