I am running a simple join query:
select count(*) from t1 join t2 on t1.sno=t2.sno
Tables t1 and t2 each have 20 million records, and column sno is of string data type.
The table data was imported into HDFS from Amazon S3 in RCFile format. The query took 109 sec on 15 Amazon large instances, whereas it takes 42 sec on SQL Server with 16 GB RAM and 16 CPU cores.
Am I missing anything? I can't understand why I am getting slower performance on Amazon.