My crawler visits websites and extracts metadata from them. I then plan to run a script that sanitizes the URLs and stores them in Amazon RDS.
My problem is which datastore I should use to stage the data for sanitization (deleting unwanted URLs). I don't want the crawler writing directly to Amazon RDS, since that would slow it down.
Should I use Amazon SimpleDB? Then I could read from SimpleDB, sanitize each URL, and move it to Amazon RDS, roughly as in the sketch below.
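For reference, here is a minimal sketch of the sanitization step I have in mind, assuming a SimpleDB domain named `crawler_staging` and an RDS (MySQL) table `urls` — both names, the endpoint, and the credentials are placeholders. It uses boto (v2) because boto3 does not support SimpleDB, and pymysql for the RDS side:

```python
# Sketch only: drain raw URLs from a SimpleDB staging domain,
# sanitize them, and load the keepers into an RDS (MySQL) table.
from urllib.parse import urlsplit, urlunsplit

import boto      # pip install boto   (v2; boto3 lacks SimpleDB)
import pymysql   # pip install pymysql

def sanitize(raw_url):
    """Return a cleaned URL, or None if it should be discarded."""
    parts = urlsplit(raw_url.strip())
    if parts.scheme not in ("http", "https") or not parts.netloc:
        return None  # drop non-web or malformed URLs
    # Strip the fragment and lowercase the host so duplicates collapse.
    return urlunsplit((parts.scheme, parts.netloc.lower(),
                       parts.path or "/", parts.query, ""))

sdb = boto.connect_sdb()                        # credentials from env/IAM role
domain = sdb.get_domain("crawler_staging")      # placeholder domain name
rds = pymysql.connect(host="my-rds-endpoint",   # placeholder connection details
                      user="crawler", password="...", database="crawl")

with rds.cursor() as cur:
    for item in domain.select("select * from `crawler_staging`"):
        clean = sanitize(item["url"])
        if clean is not None:
            cur.execute("INSERT IGNORE INTO urls (url) VALUES (%s)", (clean,))
        domain.delete_item(item)  # remove from staging whether kept or dropped
rds.commit()
```

The idea is that the crawler only ever writes cheap puts to the staging store, and this batch job is the only thing that touches RDS.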