
As part of my investigation I came across this question:

flatMap doesn't preserve order when creating lists from pyspark dataframe columns

This would suggest that it is not safe. However, these links state that flatMap() preserves order:

Does flatMap keep the order intact?

Does this mean that, in df.select(column_name).rdd.flatMap(lambda x: x).collect(), the step that does not preserve order is collect() rather than flatMap()?
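To make the question concrete, here is a minimal pure-Python model of the two steps. It is a sketch under assumptions, not Spark's implementation: the "RDD" is a hypothetical list of partitions, each a list of tuple rows (a PySpark Row is a tuple subclass, so lambda x: x yields its fields). In this model flatMap runs independently inside each partition and keeps iteration order, and collect concatenates the partition results in partition-index order, so neither step reorders anything.

```python
from itertools import chain


def flat_map(partitions, f):
    """Hypothetical model of RDD.flatMap: apply f to every row of each
    partition separately and flatten, keeping the order within the partition."""
    return [list(chain.from_iterable(f(row) for row in part)) for part in partitions]


def collect(partitions):
    """Hypothetical model of RDD.collect: concatenate the partitions'
    elements in partition-index order."""
    return [x for part in partitions for x in part]


# Three partitions of single-column rows, as df.select(column_name).rdd might hold them.
rows = [[("a",), ("b",)], [("c",)], [("d",), ("e",)]]
print(collect(flat_map(rows, lambda x: x)))  # ['a', 'b', 'c', 'd', 'e']
```

The model only shows that the output follows the order of the partitions' contents. It says nothing about whether those contents are themselves deterministic: the Spark programming guide notes that the ordering of elements within partitions after a shuffle is not guaranteed, so if the upstream DataFrame is recomputed (for example, once per collected column), the order can differ between runs even though flatMap and collect keep whatever order they receive.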

MichiganMagician
