I have inherited a Perl script that runs on an EC2 instance and basically crawls a bunch of URLs for data (i.e. scraping). The script is invoked via a shell script that forks multiple instances of it. There could be hundreds of these Perl processes running at any given point, depending on the scraping progress.
Each Perl script does this:
## create a temporary hash to hold everything ##
my %products = ();
and as you can imagine, that hash grows as more products are scraped within the process.
My question is this: what happens when perl tries to add the next product to the %products hash and there isn't memory available? Does it simply wait, or does it die? My gut tells me it dies, but is there a way to get malloc-style behavior where, if the allocation can't be satisfied, the process waits instead of dying?
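If it does just die, maybe I could at least cap each child up front so one runaway hash doesn't drag the whole instance into swap. Something like this in the wrapper, perhaps (scrape.pl and the 512 MB figure are placeholders, not my actual script or a tuned value):

```shell
#!/bin/sh
# Sketch: give each child a capped address space so a runaway %products
# hash makes that one perl die with "Out of memory!" instead of starving
# the other workers. scrape.pl and the cap value are placeholders.
for url in "$@"; do
    ( ulimit -v 524288   # cap virtual memory at ~512 MB (value is in KB)
      perl scrape.pl "$url" ) &
done
wait   # wait for all capped children to finish
```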
Is it better to just limit the number of child processes?
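If limiting children is the way to go, I assume the wrapper could feed the URL list through xargs instead of forking everything at once. A minimal sketch of what I mean, with echo standing in for the real `perl scrape.pl <url>` call (note -P is a GNU/BSD xargs extension, not POSIX):

```shell
# Run at most 3 workers concurrently; xargs starts a replacement as each
# one finishes, so total memory use stays roughly bounded.
printf '%s\n' url1 url2 url3 url4 url5 | xargs -n 1 -P 3 echo processed
```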
Any ideas would be greatly appreciated.
P.S. This is perl, v5.10.1 (*) built for i486-linux-gnu-thread-multi