The PySpark documentation says that pandas-on-Spark is distributed. If I create a dataframe using pyspark.pandas.read_csv('file.csv'), how can I find out the number of partitions of that pandas-on-Spark dataframe? Is there an equivalent of df.rdd.getNumPartitions() for a pandas-on-Spark dataframe?
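For reference, a minimal sketch of one way to inspect this, assuming a working Spark 3.2+ installation and treating 'file.csv' as a placeholder path: a pandas-on-Spark dataframe can be converted back to a Spark DataFrame with to_spark(), whose RDD does report its partition count.

```python
import pyspark.pandas as ps

# Read the CSV with the pandas-on-Spark API
psdf = ps.read_csv('file.csv')

# pandas-on-Spark itself exposes no getNumPartitions(), but to_spark()
# returns the underlying Spark DataFrame, and its RDD reports the count
sdf = psdf.to_spark()
print(sdf.rdd.getNumPartitions())
```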

  • The pandas API in PySpark is an expensive operation, to the best of my knowledge. To read a CSV we can use spark.read.csv. Once you use df = pyspark.pandas.read_csv('file.csv') ... Please let me know if I missed anything. – Avind Aug 19 '23 at 15:06
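
A sketch of the route the comment hints at, assuming Spark 3.2+ (where pandas_api() is available); the reader options and 'file.csv' path are illustrative:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Read the CSV with the native Spark reader instead of
# pyspark.pandas.read_csv, as the comment suggests
sdf = spark.read.csv('file.csv', header=True, inferSchema=True)

# The partition count is directly available on the Spark DataFrame's RDD
print(sdf.rdd.getNumPartitions())

# If the pandas-on-Spark API is still needed afterwards, convert explicitly
psdf = sdf.pandas_api()
```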

0 Answers