I'm running a job on the whole commoncrawl corpus and ran into this hiccup in the remaining map tasks. Does anyone know what would cause that? I suspect it has something to do with this "resizing complete" message, but I'm not sure why the job would stop mapping and then back up like that. Thanks.
The [skipping](http://i.imgur.com/TwQFW.png) got worse as time went on. – kelorek Jan 17 '13 at 01:51
Can you show more details about how you ran your job? My first guess would be that, since the CORE nodes are used for the storage of the cluster, your job is so huge that your cluster keeps running out of space and thus increases the number of CORE nodes. – Charles Menguy Jan 17 '13 at 02:46
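One way to rule the disk-space theory in or out is to check overall HDFS usage from the master node while the job runs. A minimal sketch, assuming a Hadoop version where `FileSystem#getStatus()` is available and that the cluster configuration is on the classpath (the `HdfsUsage` class name is made up):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FsStatus;

// Hypothetical helper, not part of the original job: prints overall HDFS
// capacity and usage so you can see whether the core nodes are filling up.
public class HdfsUsage {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration(); // picks up the cluster's core-site.xml/hdfs-site.xml if on the classpath
        FileSystem fs = FileSystem.get(conf);
        FsStatus status = fs.getStatus();
        long gb = 1024L * 1024L * 1024L;
        System.out.printf("capacity: %d GB, used: %d GB, remaining: %d GB%n",
                status.getCapacity() / gb, status.getUsed() / gb, status.getRemaining() / gb);
    }
}
```

If the remaining capacity keeps shrinking while the job runs, the resizing explanation becomes a lot more plausible.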
Here are some details: 1 c1.xlarge for the master node, 99 c1.xlarge spot instances for the core nodes. No reducer, no combiner. Here's the [mapper](http://pastebin.com/RFJHRLfT) and the [main file](http://pastebin.com/fBLMCmJH). The cluster might have been running out of RAM as time went on... the rate of map tasks seems to slow down over time. There were a lot of Java heap space errors. – kelorek Jan 17 '13 at 05:03
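Given the heap space errors, one thing worth trying is raising the per-task JVM heap in the job configuration. A rough sketch of a map-only driver using the old-style Hadoop 1.x `mapred.child.java.opts` key; the class name, paths, and the 1 GB value are only examples, and the identity `Mapper` stands in for the real Jsoup-based mapper:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Minimal map-only driver showing where the heap setting would go.
public class BiggerHeapDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Old-style Hadoop 1.x key; gives each task JVM a 1 GB heap (example value;
        // a c1.xlarge only has about 7 GB to divide among its map slots).
        conf.set("mapred.child.java.opts", "-Xmx1024m");

        Job job = new Job(conf, "helloworld-bigger-heap");
        job.setJarByClass(BiggerHeapDriver.class);
        job.setMapperClass(Mapper.class);      // stand-in: identity mapper
        job.setNumReduceTasks(0);              // map-only, like the original job
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

The trade-off is that a bigger heap per task leaves room for fewer concurrent map slots on each node, so this may slow things down rather than fix them if memory isn't actually the bottleneck.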
I guess what I wanted to see is mainly how you run this job in EMR with the AWS SDK to see how you define your jobflow. Or are you running this through the web console? – Charles Menguy Jan 17 '13 at 05:17
Oh, so you're using spot instances! That may be related to the resizing. – Charles Menguy Jan 17 '13 at 06:09
Does this answer your question? I'm using the [emr ruby client](http://aws.amazon.com/developertools/2264): `elastic-mapreduce --create --alive --name awwyeah --instance-group master --instance-count 1 --instance-type c1.xlarge --bid-price 0.66 --instance-group core --instance-count 99 --instance-type c1.xlarge --bid-price 0.66 --bootstrap-action s3://elasticmapreduce/bootstrap-actions/install-ganglia && elastic-mapreduce --jobflow --jar s3n://bucket/HelloWorld.jar --arg org.commoncrawl.tutorial.HelloWorld --arg --arg --arg 2010/ --arg s3n://bucket/helloworld-out` – kelorek Jan 17 '13 at 06:09
I thought that might be it too, but I had them the whole time and the price didn't change while I was using them. The CPU-bound instances have lower memory, so that's what I'm suspecting right now. I'll probably just break the job up into smaller jobs and combine the output files. Since I don't have a reducer, that's an easy fix if I can't figure this out. – kelorek Jan 17 '13 at 06:54
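If splitting it up turns out to be the way to go, a sketch of what that could look like as a driver loop; the chunk prefixes, bucket paths, and class name are made up, and the identity `Mapper` again stands in for the real one. With no reducer, each run's output can live in its own directory and simply be read together afterwards:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical driver that runs the same map-only job once per input chunk,
// writing each chunk's output to its own directory so no merge step is needed.
public class ChunkedDriver {
    public static void main(String[] args) throws Exception {
        // Made-up prefixes; substitute however the crawl input is actually laid out.
        String[] chunks = { "part-00", "part-01", "part-02" };
        for (String chunk : chunks) {
            Configuration conf = new Configuration();
            Job job = new Job(conf, "helloworld-" + chunk);
            job.setJarByClass(ChunkedDriver.class);
            job.setMapperClass(Mapper.class);  // stand-in for the real mapper
            job.setNumReduceTasks(0);
            job.setOutputKeyClass(LongWritable.class);
            job.setOutputValueClass(Text.class);
            FileInputFormat.addInputPath(job, new Path("s3n://bucket/input/" + chunk));
            FileOutputFormat.setOutputPath(job, new Path("s3n://bucket/helloworld-out/" + chunk));
            if (!job.waitForCompletion(true)) {
                System.exit(1); // stop early if a chunk fails
            }
        }
    }
}
```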
Do you have access to the tasktracker logs? If your job is still running I assume not, so if you log in to the jobtracker on the Amazon master node and look at the logs in the jobtracker console (with lynx for example, or an SSH tunnel), do you see any errors? I'm thinking you could also have memory issues. – Charles Menguy Jan 17 '13 at 06:58
Yeah, I've been getting a ton of Java heap errors and occasional GC memory errors. I'm now considering removing Jsoup from my code and just using regular expressions to pull what I need. – kelorek Jan 17 '13 at 20:48
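For what it's worth, a sketch of what the regex route might look like; it assumes the mapper only needs something simple like href values, which may or may not match what the Jsoup code is actually pulling. The pattern is deliberately crude and will miss cases a real parser handles, but it avoids building a DOM for every fetched page:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the Jsoup-based extraction: pull href values with a
// regex so no DOM has to be built per document.
public class LinkExtractor {
    // Simplistic pattern: matches href="..." or href='...'. Good enough for a
    // rough pass over crawl data, nowhere near a real HTML parser.
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);

    public static List<String> extract(String html) {
        List<String> links = new ArrayList<String>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    public static void main(String[] args) {
        System.out.println(extract("<a href=\"http://example.com\">x</a>"));
        // prints [http://example.com]
    }
}
```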