
I am currently researching ways to improve the MapReduce scheduler, but unfortunately my university does not provide a cluster for research purposes. I was thinking about renting a cluster and I heard about Amazon EC2, but I have no experience with its services and I do not know how to use them.

I am in need of 5 machines with the following specifications (for each machine):

  • A dual-processor (2.2 GHz AMD Opteron(tm) Processor 4122 with 4 physical cores)
  • 8GB of RAM
  • 500GB disk

I want to set up the Linux operating system and the Hadoop framework manually, just as I would if I had the machines physically at hand. I would like to know if Amazon EC2 offers something like this, and I would like to estimate the cost of this infrastructure for, let's say, a month.

If I choose Amazon's Elastic MapReduce framework, would I be able to control the version of Hadoop? Would I also be able to change the configuration of its scheduler so that I can plug in my own algorithm?

Finally, I would like to know if there is any kind of simulator for MapReduce that I could use to run different experiments.

Please excuse my multiple questions. I am new to this field and any guidance would be really appreciated.

Mikel Urkia
Flowra

1 Answer


I was thinking about renting a cluster and I heard about Amazon EC2, but I have no experience with its services and I do not know how to use them.

AWS has extensive documentation; the Getting Started guide is a good place to begin. The AWS self-paced labs are also worth checking out.

I am in need of 5 machines with the following specifications (for each machine): A dual-processor, 8GB of RAM, and 500GB of disk.

AWS provides a wide range of EC2 instance types. Go through the list of instance types and choose the one that best fits your use case.

I want to set up the Linux operating system and the Hadoop framework manually, just as I would if I had the machines physically at hand. I would like to know if Amazon EC2 offers something like this, and I would like to estimate the cost of this infrastructure for, let's say, a month.

AWS does not provide a VM without an OS installed on it. Every VM provided by AWS comes with an OS pre-installed, and you can manually install Hadoop on top of it. AWS offers a wide range of operating systems to choose from.
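As a rough illustration of what requesting the five machines could look like from code, here is a minimal sketch using the `boto3` library's `run_instances` call. The AMI ID, instance type, key pair name, and root device name below are placeholders and assumptions, not recommendations; the correct root device name depends on the AMI you pick. The helper names `build_launch_request` and `launch` are hypothetical.

```python
def build_launch_request(ami_id, instance_type, count, key_name, disk_gb):
    """Build the keyword arguments for boto3's EC2 run_instances call."""
    return {
        "ImageId": ami_id,              # Linux image that comes with the OS pre-installed
        "InstanceType": instance_type,  # chosen from the EC2 instance type list
        "MinCount": count,              # launch exactly `count` machines
        "MaxCount": count,
        "KeyName": key_name,            # SSH key pair used to log in and install Hadoop
        "BlockDeviceMappings": [
            {
                # Assumption: the root device name varies between AMIs.
                "DeviceName": "/dev/sda1",
                "Ebs": {"VolumeSize": disk_gb},  # size in GiB
            }
        ],
    }


def launch(request):
    """Send the request to EC2. Needs boto3 installed and AWS credentials configured."""
    import boto3  # imported here so the request can be built without boto3 present

    return boto3.client("ec2").run_instances(**request)


# Placeholder values only; replace them with a real AMI ID, instance type, and key pair.
request = build_launch_request("ami-PLACEHOLDER", "TYPE-PLACEHOLDER", 5, "my-key-pair", 500)
```

Calling `launch(request)` would start billable instances, so the sketch only builds the request. Once the machines are running, Hadoop can be installed over SSH exactly as on physical hardware.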

AWS also provides a Simple Monthly Calculator that estimates how much your cluster might cost, based on the instances you have selected and the number of EBS volumes you have attached to each instance.
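For a quick back-of-the-envelope figure before using the calculator, the arithmetic can be sketched as follows. The hourly rate and the per-GB storage rate are made-up placeholders, not actual AWS prices, and the function name `monthly_cluster_cost` is hypothetical; substitute the current prices for the instance type and region you choose.

```python
HOURS_PER_MONTH = 730  # average number of hours in a month (8760 / 12)


def monthly_cluster_cost(instances, hourly_rate_usd, disk_gb_per_instance,
                         storage_rate_usd_per_gb_month):
    """Estimate the monthly cost of a cluster that runs around the clock."""
    compute = instances * hourly_rate_usd * HOURS_PER_MONTH
    storage = instances * disk_gb_per_instance * storage_rate_usd_per_gb_month
    return compute + storage


# Placeholder rates for illustration only (NOT real AWS pricing):
# 5 machines at $0.50 per hour, each with 500 GB at $0.10 per GB-month.
estimate = monthly_cluster_cost(5, 0.50, 500, 0.10)
print(f"Estimated monthly cost: ${estimate:.2f}")
```

This ignores data transfer and any other charges, and it assumes the instances are never stopped. Stopping the cluster between experiments reduces the compute part of the bill.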

If I choose Amazon's Elastic MapReduce framework, would I be able to control the version of Hadoop? Would I also be able to change the configuration of its scheduler so that I can plug in my own algorithm?

If you use AWS EMR to deploy the Hadoop cluster, you can select the version of Hadoop to be installed. The Hadoop versions supported by Amazon are 2.4.0, 2.2.0, 1.0.3, and 0.20.205.

Finally, I would like to know if there is any kind of simulator for MapReduce that I could use to run different experiments.

I did not understand the part about the MapReduce simulator, though.

Mikel Urkia
Ashrith
  • Thanks so much for your answer, it clarifies many things for me. About the last question: I have heard of simulator programs such as this one: https://code.google.com/p/mrsim/ – Flowra Dec 11 '14 at 09:01
  • There is not much documentation on the `mrsim` project page to understand what it does. – Ashrith Dec 11 '14 at 10:07
  • I have taken the liberty of editing both the question and your quotes to make the wording clearer and more concise. – Mikel Urkia Jan 21 '15 at 08:37