EMR does a lot of things for you that you won't get from a stock Hadoop install on EC2. Some particularly important ones include:
- Copying Hadoop logs from your machines to S3. This is very useful for debugging errors after the cluster has been shut down.
- Running job flows of multiple MapReduce, Pig, or Hive jobs.
- Setting sensible configuration defaults based on the hardware size you choose.
- Access to spot instances for cheaper compute.
- Ability to resize clusters dynamically.
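To make the list above concrete, here is a minimal sketch of how those features might appear in a single cluster request, assuming you launch through boto3's EMR client. The bucket name, script paths, instance types, release label, bid price, and the helper names `build_job_flow_request` and `launch` are all hypothetical placeholders, not values from this article.

```python
def build_job_flow_request(log_bucket, core_count=4, spot_bid="0.10"):
    """Build keyword arguments for boto3's EMR run_job_flow call.

    Every concrete value here (names, paths, sizes) is an illustrative
    assumption; substitute your own.
    """
    return {
        "Name": "example-job-flow",
        # EMR copies Hadoop logs here, so they outlive the cluster.
        "LogUri": f"s3://{log_bucket}/emr-logs/",
        "ReleaseLabel": "emr-6.15.0",  # assumed release label
        "Applications": [{"Name": "Hive"}, {"Name": "Pig"}],
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
        "Instances": {
            "KeepJobFlowAliveWhenNoSteps": False,  # shut down after the last step
            "InstanceGroups": [
                {"Name": "master", "InstanceRole": "MASTER", "Market": "ON_DEMAND",
                 "InstanceType": "m5.xlarge", "InstanceCount": 1},
                {"Name": "core", "InstanceRole": "CORE", "Market": "ON_DEMAND",
                 "InstanceType": "m5.xlarge", "InstanceCount": core_count},
                # Task nodes hold no HDFS data, so losing a spot node is cheap.
                {"Name": "task-spot", "InstanceRole": "TASK", "Market": "SPOT",
                 "BidPrice": spot_bid, "InstanceType": "m5.xlarge",
                 "InstanceCount": 2},
            ],
        },
        # A job flow: steps run in order on the same cluster.
        "Steps": [
            {"Name": "hive-step", "ActionOnFailure": "CANCEL_AND_WAIT",
             "HadoopJarStep": {
                 "Jar": "command-runner.jar",
                 "Args": ["hive-script", "--run-hive-script", "--args",
                          "-f", f"s3://{log_bucket}/scripts/report.q"]}},
            {"Name": "pig-step", "ActionOnFailure": "CANCEL_AND_WAIT",
             "HadoopJarStep": {
                 "Jar": "command-runner.jar",
                 "Args": ["pig-script", "--run-pig-script", "--args",
                          "-f", f"s3://{log_bucket}/scripts/cleanup.pig"]}},
        ],
    }


def launch(request, region="us-east-1"):
    """Submit the request. Needs boto3 and AWS credentials; not run here."""
    import boto3  # imported lazily so the builder works without boto3 installed
    emr = boto3.client("emr", region_name=region)
    return emr.run_job_flow(**request)["JobFlowId"]
```

Keeping the request builder separate from the submit call lets you inspect or unit-test the configuration without touching AWS. Resizing a running cluster goes through the same client's `modify_instance_groups` call, which takes an instance group ID and a new instance count.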
You'll also find that the EMR S3 filesystem is faster and more reliable than the standard one packaged with Apache Hadoop. It supports multipart upload, and it streams writes directly to S3 rather than buffering them to disk first. For a bit more on this, see Tip #5.
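To illustrate what "streaming with multipart upload" means in general, here is a rough sketch that reads a stream one chunk at a time and sends each chunk as a part, so nothing is staged on local disk. This is not EMR's implementation, only the general pattern; it assumes a boto3-style S3 client is passed in, and the function name `stream_to_s3` is made up for this example.

```python
PART_SIZE = 5 * 1024 * 1024  # S3 requires at least 5 MiB per part, except the last


def stream_to_s3(s3, stream, bucket, key, part_size=PART_SIZE):
    """Upload a binary stream to S3 part by part; return the number of parts.

    `s3` is assumed to behave like boto3.client("s3"). `stream` should be a
    buffered binary stream so read() returns full chunks until the end.
    An empty stream is not handled: S3 rejects an upload with zero parts.
    """
    upload_id = s3.create_multipart_upload(Bucket=bucket, Key=key)["UploadId"]
    parts = []
    try:
        part_number = 1
        while True:
            chunk = stream.read(part_size)
            if not chunk:
                break
            resp = s3.upload_part(Bucket=bucket, Key=key, UploadId=upload_id,
                                  PartNumber=part_number, Body=chunk)
            # S3 needs each part's number and ETag to assemble the object.
            parts.append({"PartNumber": part_number, "ETag": resp["ETag"]})
            part_number += 1
        s3.complete_multipart_upload(Bucket=bucket, Key=key, UploadId=upload_id,
                                     MultipartUpload={"Parts": parts})
    except Exception:
        # Abort so the already-uploaded parts don't linger and accrue charges.
        s3.abort_multipart_upload(Bucket=bucket, Key=key, UploadId=upload_id)
        raise
    return len(parts)
```

Only one chunk is held in memory at a time, which is the property that matters here: the writer never needs local disk space proportional to the output size.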
Additionally, if you do decide to use EC2 directly, I'd recommend using instance storage instead of EBS for your nodes. There's really no reason to pay the extra cost of EBS for Hadoop; you'll notice that EMR clusters all run on instance-storage nodes as well.