The PySpark documentation says that pandas-on-Spark is distributed. If I create a DataFrame using pyspark.pandas.read_csv('file.csv'), how can I find the number of partitions of that pandas-on-Spark DataFrame? Is there an equivalent of df.rdd.getNumPartitions() for a pandas-on-Spark DataFrame?
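
One way to check, as a minimal sketch (assuming PySpark 3.2+, where the pandas API ships with PySpark, and that 'file.csv' exists): a pandas-on-Spark DataFrame is backed by a Spark DataFrame, and to_spark() exposes it, after which the usual RDD API applies.

```python
import pyspark.pandas as ps

# Read the CSV with the pandas-on-Spark API (file name is the one from the question).
psdf = ps.read_csv('file.csv')

# Convert to the underlying Spark DataFrame, then query its RDD's partition count.
print(psdf.to_spark().rdd.getNumPartitions())
```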
The pandas API in PySpark is an expensive operation, to the best of my knowledge. To read a CSV we can use spark.read.csv rather than df = pyspark.pandas.read_csv('file.csv'). Please let me know if I missed anything. – Avind Aug 19 '23 at 15:06
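
A minimal sketch of the alternative the comment suggests (assuming the same local 'file.csv'; the header/inferSchema options are illustrative): with the native Spark reader you get a plain Spark DataFrame, so the partition count is available directly.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Read the CSV with Spark's native reader instead of the pandas API.
sdf = spark.read.csv('file.csv', header=True, inferSchema=True)

# On a plain Spark DataFrame, getNumPartitions() works without any conversion.
print(sdf.rdd.getNumPartitions())
```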