I select all rows from a Hive table and create a DataFrame (df) from it using PySpark. The table is partitioned as:
partitionBy('date', 't', 's', 'p')
Now I want to get the number of partitions using
df.rdd.getNumPartitions()
but it returns a much larger number (15642 partitions) than expected (18 partitions).
Output of the show partitions command in Hive:
date=2019-10-02/t=u/s=u/p=s
date=2019-10-03/t=u/s=u/p=s
date=2019-10-04/t=u/s=u/p=s
date=2019-10-05/t=u/s=u/p=s
date=2019-10-06/t=u/s=u/p=s
date=2019-10-07/t=u/s=u/p=s
date=2019-10-08/t=u/s=u/p=s
date=2019-10-09/t=u/s=u/p=s
date=2019-10-10/t=u/s=u/p=s
date=2019-10-11/t=u/s=u/p=s
date=2019-10-12/t=u/s=u/p=s
date=2019-10-13/t=u/s=u/p=s
date=2019-10-14/t=u/s=u/p=s
date=2019-10-15/t=u/s=u/p=s
date=2019-10-16/t=u/s=u/p=s
date=2019-10-17/t=u/s=u/p=s
date=2019-10-18/t=u/s=u/p=s
date=2019-10-19/t=u/s=u/p=s
Any idea why the number of partitions is so large? And how can I get the expected number of partitions (18)?
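
For reference, here is a minimal sketch of the count I am after. As far as I understand, df.rdd.getNumPartitions() reports how many input splits Spark uses to read the data, which is not the same thing as the number of Hive partitions. The helper name count_table_partitions is hypothetical (not a PySpark API), and it assumes the partition columns are available as regular columns of df:

```python
def count_table_partitions(df, partition_cols=("date", "t", "s", "p")):
    """Count the distinct partition-column combinations present in df.

    This counts Hive-style table partitions (one per distinct combination
    of partition values). It is unrelated to df.rdd.getNumPartitions(),
    which reflects how the input is split into tasks for reading.
    """
    # select/distinct/count are standard DataFrame methods
    return df.select(*partition_cols).distinct().count()


# Usage, assuming df was loaded from the partitioned table:
#   count_table_partitions(df)
#
# Alternative that only reads metastore metadata (the table name is a
# placeholder, not my real table):
#   spark.sql("SHOW PARTITIONS my_db.my_table").count()
```

With the partition list above, I would expect this to give 18, since each of the 18 dates has a single t/s/p combination.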