You can use below code to print the (logical and physical) plans.
import pyspark.sql
#create a df using your sql
df = sqlContext.sql("SELECT field1 AS f1, field2 as f2 from table1")
#use explain to see explain output. Without argument, you will get only physical plan
df.explain(True)
== Parsed Logical Plan ==
...
== Analyzed Logical Plan ==
...
== Optimized Logical Plan ==
...
== Physical Plan ==
...
EDIT : I ran explain for mytable and posted excerpt below. Which shows hive is choosing only few partitions(folders) and not going through all partitions. You should be able to see similar output.
here table is partitioned on part_col.
query used to generate this explain extended select * from mytab where part_col in (10,50)
.
sorry, I do not have spark installed so cant test it.
29
Path -> Alias:
30
hdfs://namenode:8020/user/hive/warehouse/tmp/part_col=10.0 [tmp]
31
hdfs://namenode:8020/user/hive/warehouse/tmp/part_col=50.0 [tmp]
32
Path -> Partition:
33
hdfs://namenode:8020/user/hive/warehouse/tmp/part_col=10.0
34
Partition
35
base file name: part_col=10.0
36
input format: org.apache.hadoop.mapred.TextInputFormat
37
...
hdfs://namenode:8020/user/hive/warehouse/tmp/part_col=50.0
85
Partition
86
base file name: part_col=50.0
87
input format: org.apache.hadoop.mapred.TextInputFormat
88
output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
89
partition values:
90
college_marks 50.0