I have a dataframe called 'df1' with some number of rows, say 1000. I want to take a specific subset of its rows and save it as another dataframe. For example, I want to extract rows 400 to 700 from 'df1' and save them as 'df2'.
I know that one possible way is to pull the content of 'df1' into a Python list on the driver:
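# 'spark' is assumed to be the SparkSession (sqlContext on Spark 1.x); SparkContext has no createDataFrame
rows = df1.collect()  # brings every row to the driver as a list of Row objects
subsample = rows[400:700]  # slice by position
df2 = spark.createDataFrame(subsample, attributes)  # 'attributes' is the schema / column names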
But my question is: is there another way to get the same result without loading the data into a list? I ask because with a huge dataset, collecting everything to the driver and then building another dataframe from it is likely to be inefficient, and the data may not even fit in driver memory.
Thanks.
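One possible direction, given as a minimal sketch rather than a definitive answer: attach a position to each row with the RDD method zipWithIndex, filter on that position, and rebuild a dataframe from what is left, so nothing is collected to the driver. The helper name slice_rows and the parameter spark (assumed to be an active SparkSession) are hypothetical and are not part of PySpark. Note that a dataframe has no guaranteed row order unless it was explicitly sorted, so "rows 400 to 700" is only well defined relative to the order zipWithIndex happens to see.

```python
def slice_rows(spark, df, start, stop):
    """Return the rows at positions start..stop-1 of df as a new DataFrame.

    Assumptions: `spark` is an active SparkSession and `df` is a PySpark
    DataFrame. `slice_rows` is a hypothetical helper, not a PySpark API.
    """
    # Pair every Row with its position: RDD of (Row, index) tuples.
    indexed = df.rdd.zipWithIndex()
    # Keep only the positions in the half-open range [start, stop).
    wanted = indexed.filter(lambda pair: start <= pair[1] < stop)
    # Drop the index again so only the original Row objects remain.
    rows = wanted.map(lambda pair: pair[0])
    # Reuse the original schema so df2 has the same columns and types.
    return spark.createDataFrame(rows, df.schema)

# Usage, assuming df1 and spark already exist:
# df2 = slice_rows(spark, df1, 400, 700)
```

Like list[400:700], this keeps positions 400 through 699. The filtering runs on the executors, so only the selected rows end up in df2 and the full dataset never passes through the driver.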