I am trying to query a huge amount of data from Postgres using Spark SQL. I can see 100 partitions on the query stage; however, only one query is running and only one executor is executing.
Code :
df = sqlcontext.read.format('jdbc').options(
    url=params['url'],
    driver=params['driver'],
    dbtable=tableName,
    user=params['user'],
    password=params['password'],
    numPartitions=numberOfPartitions,
    partitionColumn=partitionC,
    lowerBound=lowerB,
    upperBound=upperB
).load()
partitionC is of type date; I also tried the same thing with a numeric column. In addition, I made sure the data is well balanced across the partition range.
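For reference, a minimal sketch of the same read against a numeric column (the table name, column name, and connection details here are placeholders, not my real ones). One gotcha I am aware of: all four partitioning options must be supplied together, and their values are passed to the JDBC source as strings:

```python
# Hypothetical partitioned JDBC read on a numeric column.
# Connection details and identifiers are placeholders.
jdbc_options = {
    "url": "jdbc:postgresql://localhost:5432/mydb",
    "driver": "org.postgresql.Driver",
    "dbtable": "public.events",
    "user": "spark",
    "password": "secret",
    # All four options below must be set together; if any is missing,
    # Spark falls back to a single-partition (single-query) read.
    "numPartitions": "100",
    "partitionColumn": "event_id",
    "lowerBound": "1",
    "upperBound": "1000000",
}

# df = spark.read.format("jdbc").options(**jdbc_options).load()
```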
How can I make Spark execute multiple parallel queries against Postgres?
I have already referred to SparkSQL PostgresQL Dataframe partitions.