I want to make sure I understand EMR correctly. I'm wondering - does what I'm talking about make any sense with EMR / Hadoop?
I currently have a recommendation engine in my app that examines data stored in both MySQL and MongoDB (each on a separate EC2 instance) and uses it to suggest content to users. This has worked fine, but we're now at a point where the script takes longer to execute than the interval at which it is scheduled to run. This is obviously a problem.
I'm considering moving this script to EMR. I understand that I will be able to connect to MongoDB and MySQL from my mapping script (i.e. the input doesn't need to be a file on S3). What I'm wondering is - if I start examining the data on MySQL / S3 - does Hadoop have some method of making sure that the script doesn't examine the same records on each instance? Do I even understand the concept of Hadoop at all? Sorry if this question is really noob.
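To make the last part concrete, here is a rough sketch of what I mean by "not examining the same records on each instance". It is only my mental model, not something I know Hadoop does: the function name `split_id_range` and the idea of splitting on a numeric primary key are my own assumptions for illustration.

```python
def split_id_range(min_id, max_id, num_splits):
    """Divide the inclusive id range [min_id, max_id] into disjoint,
    contiguous chunks, one per worker (hypothetical helper)."""
    total = max_id - min_id + 1
    base, extra = divmod(total, num_splits)
    splits = []
    start = min_id
    for i in range(num_splits):
        # The first `extra` chunks take one additional id so sizes stay balanced.
        size = base + (1 if i < extra else 0)
        if size == 0:
            continue  # more workers than ids: skip empty chunks
        end = start + size - 1
        splits.append((start, end))
        start = end + 1
    return splits


# Each worker would then only query its own chunk, e.g.
# SELECT ... WHERE id BETWEEN start AND end
for start, end in split_id_range(1, 10, 3):
    print(start, end)
```

If each instance only reads the rows in its own chunk, no record is examined twice. What I don't know is whether Hadoop does this kind of partitioning for me when the input is a database rather than files.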