
I'm new to Hadoop. I'm using a cluster and I have a disk quota of 15GB. If I try to execute the wordcount sample on a big dataset (about 25GB), I always get the exception "The DiskSpace quota of xxxx is exceeded: ".

I checked my disk usage after the exception and it is far below the quota. Is this due to temporary files or intermediate jobs? Is it possible to delete temporary/intermediate files?
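
For reference, here is a minimal sketch of how the consumed space can be compared against the quota from Java, assuming the standard FileSystem/ContentSummary API (the path below is only a placeholder for the directory the quota is set on). Note that the space quota is counted in raw bytes, i.e. every replica of every block counts:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.ContentSummary;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class QuotaCheck {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            // Placeholder path: the directory the space quota is set on.
            ContentSummary summary = fs.getContentSummary(new Path("/user/myuser"));

            // The space quota is enforced on raw bytes, so replication counts.
            System.out.println("Space quota:    " + summary.getSpaceQuota());
            System.out.println("Space consumed: " + summary.getSpaceConsumed());
            System.out.println("Logical size:   " + summary.getLength());
        }
    }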

(I can modify the configuration via Java code; I have no direct access to the .xml configuration files.)
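
To make it concrete, this is the kind of override I mean, done in the job driver. It is only a sketch of the standard wordcount driver; the property names and paths are examples I am guessing at, not necessarily the ones that control where the intermediate data actually goes on this cluster:

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

        public static class TokenizerMapper
                extends Mapper<LongWritable, Text, Text, IntWritable> {
            private final static IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();

            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                StringTokenizer itr = new StringTokenizer(value.toString());
                while (itr.hasMoreTokens()) {
                    word.set(itr.nextToken());
                    context.write(word, ONE);
                }
            }
        }

        public static class IntSumReducer
                extends Reducer<Text, IntWritable, Text, IntWritable> {
            private final IntWritable result = new IntWritable();

            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable val : values) {
                    sum += val.get();
                }
                result.set(sum);
                context.write(key, result);
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();

            // Setting properties from code instead of the *-site.xml files.
            // NOTE: these property names and paths are only examples of what
            // I have tried; I don't know which one really controls where the
            // intermediate (spill/shuffle) files end up on my cluster.
            conf.set("hadoop.tmp.dir", "/example/path/outside/quota/tmp");
            conf.set("mapred.local.dir", "/example/path/outside/quota/mapred/local");

            Job job = new Job(conf, "word count");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }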

Thanks! ;)

  • I would increase the disk quota. 15 GB isn't big data; you can fit that in the main memory of a PC these days. Most likely the quota stops the job before it ever uses anywhere near 25 GB, which is why your measured usage stays well below it. – Peter Lawrey Oct 29 '13 at 20:10
  • What is your usage before you start the job? How much data are you outputting? If the key and value are small (a single word and a single int) it should not result in large temporary files. These temporary files are deleted once the job completes. – John B Oct 29 '13 at 20:32

0 Answers