I'm developing a Facebook application that uses SimpleDB to store its data, but I've realized Amazon does not provide a way to back up that data (at least none that I know of).
And SimpleDB is slow: you can run about 4 selects per second, each returning a page of at most 100 records. That's not a good way to back up tons of records.
I found some services on the web that offer to do the backup for you, but I'm not comfortable giving them my AWS credentials.
So I thought about using threads. The problem is that if you do a select for all the keys in the domain, you have to wait for the first page's next_token value before you can request the second page, and so on, which makes the whole scan inherently serial.
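To make the bottleneck concrete, this is roughly what the single-threaded dump looks like. It's just a sketch: I'm assuming the AWS SDK for PHP's AmazonSDB client, and `mydomain` and the output file name are made up, so adjust it to whatever SimpleDB library you actually use. The point is that each select can only be issued once the previous one has returned its next_token:

```php
<?php
// Sequential dump of a whole domain. Each page can only be requested
// after the previous page's NextToken arrives, so nothing here can
// run in parallel.
require_once 'sdk.class.php';

$sdb        = new AmazonSDB();
$next_token = null;

do {
    $opt = $next_token ? array('NextToken' => $next_token) : array();

    $response = $sdb->select('SELECT * FROM mydomain', $opt);
    if (!$response->isOK()) {
        die('Select failed');
    }

    foreach ($response->body->SelectResult->Item as $item) {
        // Append each item to the backup file as one JSON line.
        file_put_contents('backup.json', json_encode($item) . "\n", FILE_APPEND);
    }

    $next_token = (string) $response->body->SelectResult->NextToken;
} while ($next_token);
```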
One solution I was considering is to add a new attribute based on the last 2 digits of the Facebook id. Then I'd start one thread with a select for "00", another for "01", and so on, potentially running 100 threads and finishing the backup much faster (at least in theory). A related idea would be to split that domain into 100 domains (so I could back up each one individually), but that would break some of the selects I need to do. Another option, probably more PHP-friendly, would be a cron job that backs up, say, 10,000 records, saves the next_token, and then lets the next run resume from that token, and so on.
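For the two-digit shard idea, since PHP isn't thread-safe I picture running 100 separate processes rather than threads, each handed its shard as a command-line argument. Again a sketch under the same assumptions as above, and the `shard` attribute name and file names are hypothetical:

```php
<?php
// Per-shard backup worker. Run one process per shard, e.g.:
//   php backup_shard.php 00   ...   php backup_shard.php 99
// Each worker only scans items whose 'shard' attribute (the last two
// digits of the Facebook id) matches its argument, so the 100 scans
// can proceed independently.
require_once 'sdk.class.php';

$shard      = $argv[1];   // "00" .. "99"
$sdb        = new AmazonSDB();
$out        = fopen("backup_{$shard}.json", 'w');
$next_token = null;

do {
    $opt = $next_token ? array('NextToken' => $next_token) : array();

    $response = $sdb->select(
        "SELECT * FROM mydomain WHERE shard = '{$shard}'",
        $opt
    );
    if (!$response->isOK()) {
        die("Select failed for shard {$shard}");
    }

    foreach ($response->body->SelectResult->Item as $item) {
        fwrite($out, json_encode($item) . "\n");
    }

    $next_token = (string) $response->body->SelectResult->NextToken;
} while ($next_token);

fclose($out);
```

Launching them would then be something like `for i in $(seq -w 0 99); do php backup_shard.php $i & done`.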
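And this is roughly how I picture the cron variant: each run processes a bounded number of pages, persists the next_token to disk, and the next run picks up from it. The file paths are made up, and the client is assumed as in the sketches above:

```php
<?php
// Checkpointed cron run: dump a fixed number of pages, save the
// NextToken, and resume from it on the next invocation.
require_once 'sdk.class.php';

$token_file = '/tmp/sdb_backup.token';
$sdb        = new AmazonSDB();
$next_token = file_exists($token_file) ? trim(file_get_contents($token_file)) : null;
$pages_left = 100;   // ~100 pages x 100 records = roughly 10,000 records per run

while ($pages_left-- > 0) {
    $opt = $next_token ? array('NextToken' => $next_token) : array();

    $response = $sdb->select('SELECT * FROM mydomain', $opt);
    if (!$response->isOK()) {
        exit(1);   // keep the old token so the next run retries this page
    }

    foreach ($response->body->SelectResult->Item as $item) {
        file_put_contents('/tmp/sdb_backup.json', json_encode($item) . "\n", FILE_APPEND);
    }

    $next_token = (string) $response->body->SelectResult->NextToken;
    if (!$next_token) {          // reached the end of the domain
        @unlink($token_file);    // next run starts a fresh backup
        exit(0);
    }
}

file_put_contents($token_file, $next_token);
```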
Does anyone have a better solution for this? If it's a PHP solution, that'd be great, but anything else is welcome too.
PS: before you mention it, as far as I know, PHP is still not thread-safe, which is why the sketches above use separate processes. And I'm aware that unless I stop the writes during the backup, there will be some consistency problems, but I'm not too worried about that in this particular case.