I am running a simple join query:
select count(*) from t1 join t2 on t1.sno=t2.sno
However, it fails with an out-of-memory exception.
Both t1 and t2 are stored in RCFile format, are clustered and sorted on ssn, and have 20 million records each. I have 3 data nodes with 4 GB of RAM and 2 CPU cores.
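For reference, a table matching that description might be declared roughly as follows. This is only a sketch: the column names come from the question, but the column types and the bucket count (32) are assumptions, since the post does not give the actual DDL. As written in the question, the bucketing and sort column (ssn) is not the same as the join column (sno).

```sql
-- Hypothetical DDL for t1 (t2 would be analogous).
-- Column types and the bucket count of 32 are assumptions for illustration.
CREATE TABLE t1 (
  sno BIGINT,
  ssn STRING
)
CLUSTERED BY (ssn) SORTED BY (ssn ASC) INTO 32 BUCKETS
STORED AS RCFILE;
```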
I have also tried running a map join, but it fails too.
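One common way to request a map join in Hive is the MAPJOIN hint shown below. The question does not say which form was used, so this is an assumed example, not necessarily the exact query that failed. In a map join, Hive builds an in-memory hash table from the hinted table (t2 here) on each mapper, so that table has to fit in the task's heap.

```sql
-- Ask Hive to load t2 into memory and join in the map phase (no reduce-side join).
-- Alternatively, SET hive.auto.convert.join=true; lets Hive choose a map join itself.
SELECT /*+ MAPJOIN(t2) */ count(*)
FROM t1 JOIN t2 ON t1.sno = t2.sno;
```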
The other configuration settings are:
io.sort.mb: 500 MB
io.sort.factor: 10
fs.memory.size.mb: 500 MB
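Assuming these values are applied per session rather than in the cluster configuration files (the post does not say which), they would look like this as Hive SET commands. Note that the io.sort.mb sort buffer is allocated inside each map task's JVM heap, so the task heap has to be larger than this value.

```sql
-- The settings listed above as session-level overrides.
-- io.sort.mb is specified in megabytes, so 500 means a 500 MB sort buffer per map task.
SET io.sort.mb=500;
SET io.sort.factor=10;
SET fs.memory.size.mb=500;
```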
Please advise.