What does toPandas() actually do when Arrow optimization is enabled?
Is the resulting pandas DataFrame safe for wide transformations (the kind that would require a data shuffle in Spark), e.g. merge operations?
What about groupby and aggregate? What kind of performance limitations should I expect?
I am trying to standardize on pandas DataFrames where possible, because they are easier to unit test and can be swapped with in-memory objects without starting a monstrous Spark instance.
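
To make the question concrete, here is a minimal sketch of the pattern I have in mind. The helper names (spark_to_pandas, total_by_region), the column names, and the sample data are all made up for illustration, and the config key assumes Spark 3.x. The Spark helper is only defined, not called, so the rest runs with pandas alone.

```python
import pandas as pd


def spark_to_pandas(spark, sdf):
    """Collect a Spark DataFrame to the driver as a pandas DataFrame.

    Hypothetical helper, not called in this sketch. Assumes Spark 3.x,
    where the Arrow setting is spark.sql.execution.arrow.pyspark.enabled.
    """
    spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true")
    return sdf.toPandas()


def total_by_region(orders: pd.DataFrame, customers: pd.DataFrame) -> pd.DataFrame:
    """Join orders to customers, then sum order amounts per region."""
    # merge and groupby are the "wide" operations the question is about.
    joined = orders.merge(customers, on="customer_id", how="inner")
    return joined.groupby("region", as_index=False)["amount"].sum()


# In-memory stand-ins for what toPandas() would hand back (made-up data).
orders = pd.DataFrame(
    {"customer_id": [1, 1, 2, 3], "amount": [10.0, 5.0, 7.5, 2.5]}
)
customers = pd.DataFrame(
    {"customer_id": [1, 2, 3], "region": ["east", "west", "east"]}
)

result = total_by_region(orders, customers)
print(result)
```

The point of this layout is that total_by_region only ever sees pandas objects, so a unit test can feed it small hand-built DataFrames and never needs a Spark session.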