I am currently researching ways to improve the MapReduce scheduler, but unfortunately my university does not provide a cluster for research purposes. I was thinking about renting one and have heard about Amazon EC2, but I have no experience with its services and do not know how to use them.
I need 5 machines, each with the following specifications:
- A dual-processor (2.2 GHz AMD Opteron(tm) Processor 4122 with 4 physical cores)
- 8GB of RAM
- 500GB disk
I want to set up the Linux operating system and the Hadoop framework manually, just as I would if I had the machines physically at hand. I would like to know whether Amazon EC2 offers something like this, and I would like to estimate the cost of this infrastructure for, say, a month.
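To make the cost question concrete, this is roughly the arithmetic I have in mind. It is only a sketch: the hourly rate below is a placeholder I made up, not an actual Amazon price, and I am assuming the machines run continuously for a 30-day month and that storage and data transfer are billed separately.

```python
# Rough monthly cost estimate for a small rented cluster.
# All figures are assumptions for illustration, not real EC2 prices.

HOURS_PER_MONTH = 24 * 30  # assumes a 30-day month, machines never stopped


def monthly_cost(machines, hourly_rate_usd, hours=HOURS_PER_MONTH):
    """Return the total compute cost in USD for `machines` instances."""
    return machines * hourly_rate_usd * hours


# Hypothetical per-instance hourly rate; replace with the provider's real price.
PLACEHOLDER_RATE_USD = 0.50

print(monthly_cost(5, PLACEHOLDER_RATE_USD))
```

If the machines can be stopped between experiments, passing a smaller `hours` value would give a more realistic figure.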
If I choose Amazon's Elastic MapReduce framework instead, would I be able to control the version of Hadoop? Would I also be able to change the scheduler configuration so that I can plug in my own algorithm?
Finally, I would like to know whether there is any kind of simulator for MapReduce that I could use to run different experiments.
Please excuse the multiple questions. I am new to this field, and any guidance would be really appreciated.