I have an external Hive table as follows :-
CREATE external TABLE sales (
ItemNbr STRING,
itemShippedQty INT,
itemDeptNbr SMALLINT,
gateOutUserId STRING,
code VARCHAR(3),
trackingId STRING,
baseDivCode STRING
)
PARTITIONED BY (countryCode STRING, sourceNbr INT, date STRING)
STORED AS PARQUET
LOCATION '/user/sales/';
where table is partitioned by 3 columns ( countryCode, sourceNbr , date). I know that if i query based on these 3 partition columns, my query would be faster.
I have some queries on other query pattern :-
If i add non-partitioned column along with partitioned column like countryCode, sourceNbr , date , ItemNbr as part of where condition when executing sql query , will it scan the full table or it will scan only inside the folder based on countryCode, sourceNbr , date and look for itemNbr attribute value specified in where condition?
Giving all columns is necessary to filter the record or sub-filter also works like if i give only first 2 columns (countryCode, sourceNbr ) as part of where condition. In this case it would scan the full table or it would search only inside folders based on 2 columns condition (countryCode, sourceNbr ) ?