-3

In Spark, Resilient Distributed Datasets (RDDs) are a low-level API and DataFrames are a high-level API. My question is: when should I use the low-level APIs?

1 Answer

1

Spark has two fundamental sets of APIs: the low-level “unstructured” APIs, and the higher-level structured APIs.

An RDD can process both structured and unstructured data, whereas a DataFrame organizes data into rows and named columns and therefore works on structured data. You can convert a DataFrame to an RDD if required.

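In general, people use DataFrames, and therefore the high-level APIs, because they offer more built-in operations and Spark can optimize them for you. The low-level RDD API is mainly worth reaching for when you need fine-grained control, for example over how data is partitioned, or when your data does not fit a row-and-column layout. In the end, this depends purely on your requirements.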

I suggest reading a book such as 'Learning Spark' or 'Spark: The Definitive Guide' for more clarification.

swapnil shashank
  • 877
  • 8
  • 11