Originally this was a small project: I wrote a Selenium program in Python to perform a small task on about 150 accounts. It ran on one computer and took about 5 hours. Now I am looking to scale to 1000 accounts. The task needs to run once per day, and at the current rate it would take roughly 30-35 hours on one machine, so a single computer is no longer enough. I want to use more than one machine, and I also want the option to scale easily to three, four, or more.
I have moved the data for all the accounts into an Amazon cloud database, and I can easily connect to it from my Python program. However, as I mentioned, I want this project to be easily scalable. I do not want to hardcode values, i.e. have one computer do accounts 1-500 and the other do 501-1000. (What if I added 500 more accounts and 2 more machines? I would want each machine to do 1500/4 = 375 accounts.) I'm thinking of a master/slave approach: each machine runs a program that can be called with a list of accounts, and a master program on my own machine sends out a command once every 24 hours telling each machine which accounts to process.
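To make the "no hardcoded ranges" idea concrete, here is a rough sketch of how I imagine the master dividing the work. The function name `split_accounts` and the use of plain integer account IDs are just assumptions for illustration; in practice the list would come from the database.

```python
def split_accounts(account_ids, num_workers):
    """Split account_ids into num_workers contiguous chunks of nearly equal size.

    Hypothetical helper: the first (len % num_workers) chunks get one extra
    account, so chunk sizes never differ by more than one.
    """
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    base, extra = divmod(len(account_ids), num_workers)
    chunks = []
    start = 0
    for i in range(num_workers):
        size = base + (1 if i < extra else 0)
        chunks.append(account_ids[start:start + size])
        start += size
    return chunks


if __name__ == "__main__":
    # Placeholder IDs; the real list would be read from the accounts table.
    accounts = list(range(1, 1501))
    for worker, chunk in enumerate(split_accounts(accounts, 4)):
        print(f"machine {worker}: {len(chunk)} accounts")
```

With this, adding accounts or machines only changes the two inputs, not the code.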
Then I want each slave to return its data to me. Once every slave has finished, the master program would combine the returned data and update the table accordingly. Alternatively, each slave could update the table independently, but I am not sure whether this is possible because of table locks (if anyone could comment on this, that would be helpful as well).
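For the first option (the master combines everything and writes once), this is roughly what I picture. It is only a sketch under assumptions: `sqlite3` stands in for the real cloud database, and the table name `accounts`, the column `result`, and the dictionary format each machine returns are all made up for the example.

```python
import sqlite3


def merge_results(per_worker_results):
    """Combine the {account_id: value} dicts returned by each machine."""
    merged = {}
    for results in per_worker_results:
        merged.update(results)
    return merged


def apply_results(conn, merged):
    """Write all merged results back in a single transaction."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "UPDATE accounts SET result = ? WHERE id = ?",
            [(value, account_id) for account_id, value in merged.items()],
        )


def demo():
    # In-memory database as a stand-in for the real one (assumption).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, result TEXT)")
    conn.executemany("INSERT INTO accounts (id) VALUES (?)",
                     [(i,) for i in range(1, 5)])
    worker_a = {1: "ok", 2: "ok"}       # hypothetical output of machine A
    worker_b = {3: "failed", 4: "ok"}   # hypothetical output of machine B
    apply_results(conn, merge_results([worker_a, worker_b]))
    return conn.execute("SELECT id, result FROM accounts ORDER BY id").fetchall()
```

Since each machine handles a disjoint set of accounts, the merge never has to resolve conflicts. Whether the machines could instead run the same `UPDATE` themselves without blocking each other depends on how the actual database engine handles locking, which is the part I am unsure about.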
Thanks for reading!
Edit: If you think this is too broad, I'm not looking for an exact answer, just for someone who has done something like this before. Even naming a technology or method that I can research would help me a lot.