I have a Perl script that gets killed by an automated job whenever a higher-priority process comes along, because the script runs ~300 parallel jobs to download data and consumes a lot of memory. I want to figure out how much memory it uses at run time so that I can request more memory before scheduling the script, or, if some tool can show me which part of my code uses the most memory, I can optimize that part.
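One low-overhead way to see how much memory the script actually uses at run time is to read the kernel's per-process accounting. Below is a minimal sketch, assuming a Linux-style `/proc` filesystem (`VmRSS`/`VmHWM` are the standard field names there; on other Unixes `ps -o rss= -p $$` is the usual fallback):

```perl
#!/usr/bin/env perl
# Minimal sketch: report this process's current and peak resident memory
# by reading /proc/<pid>/status (Linux-style /proc assumed).
use strict;
use warnings;

sub report_memory {
    my ($label) = @_;
    open my $fh, '<', "/proc/$$/status"
        or do { warn "cannot read /proc/$$/status: $!"; return };
    while (my $line = <$fh>) {
        # VmRSS = current resident set size, VmHWM = peak ("high water mark")
        print "[$label] $line" if $line =~ /^(VmRSS|VmHWM):/;
    }
    close $fh;
}

report_memory('before');
my @big = (1 .. 1_000_000);    # stand-in for a memory-hungry step
report_memory('after');
```

For attributing memory to particular data structures rather than to the process as a whole, a CPAN module such as `Devel::Size` can report the size of individual variables.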
- Given the lack of information about your environment and program, the only thing I can suggest is `use less qw(memory);` – pavel Feb 13 '13 at 14:08
- The environment is Unix, where a Perl script spawns 300 parallel children of another script that download data (product/hotel info) and keep appending it to a text file. Does continuously appending data to a text file consume more time and memory? What do you suggest when the total is more than 1 million rows: append row by row, write all 1 million in one go by collecting everything in a variable first, or append in batches? – Ravi Maggon Feb 13 '13 at 14:28
- You may find these links helpful: [SO Question](http://stackoverflow.com/questions/9733146/tips-for-keeping-perl-memory-usage-low) and a [Perl Monks question](http://www.perlmonks.org/?node_id=666483) – Craig Treptow Feb 13 '13 at 14:55
- Three hundred threads seems excessive if they are all relying on your internet connection. Either they are mostly idle at any one time, or they are being held up by the available bandwidth. Either way it seems to me that you should reduce the number of threads by an order of magnitude. – Borodin Feb 13 '13 at 15:25
- Data written to a file doesn't consume memory. You should write it out and release the memory as quickly as possible. Saving a million lines of text in memory will make things much worse. – Borodin Feb 13 '13 at 15:27
- @Borodin, if I don't run 300 parallel download jobs, I run out of time to finish the download: a sequential run takes 4 days, while 300 processes together take 8 hours. All 300 processes run on grid machines, so bandwidth isn't the issue. – Ravi Maggon Feb 13 '13 at 16:43
1 Answer
Regarding the OP's comment on the question: if you want to minimize memory use, definitely collect and append the data one row/line at a time. If you collect all of it into a variable at once, you need to have all of it in memory at once.
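As an illustration of the row-at-a-time approach, here is a minimal sketch; `fetch_next_record()` is a hypothetical stand-in for whatever produces each downloaded row:

```perl
use strict;
use warnings;

# Append each row as soon as it arrives, so memory stays flat
# no matter how many rows (1 million or more) are produced.
open my $out, '>>', 'download.txt' or die "cannot append: $!";
while (defined(my $row = fetch_next_record())) {   # hypothetical data source
    print {$out} $row, "\n";                       # write immediately...
    # ...instead of pushing onto an @all_rows array and printing at the end
}
close $out or die "close failed: $!";
```

If several child processes append to the same file, each write should also be protected with `flock` (or each child can write its own file to be merged afterwards) so that lines from different children don't interleave.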
Regarding the question itself, you may want to look into whether it's possible to have the Perl code run only once (rather than as 300 separate instances) and then `fork` to create your individual worker processes. When you `fork`, the child processes share memory with the parent much more efficiently than unrelated processes can, so you will, e.g., only need to have one copy of the Perl binary in memory rather than 300 copies.
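A minimal sketch of the `fork` approach, assuming the download targets are listed in `@tasks` and `download_one()` stands in for the per-item work (both hypothetical); the CPAN module `Parallel::ForkManager` is one convenient way to cap how many children run at once:

```perl
use strict;
use warnings;
use Parallel::ForkManager;

my @tasks = load_task_list();               # hypothetical: list of downloads
my $pm = Parallel::ForkManager->new(30);    # e.g. at most 30 concurrent workers

for my $task (@tasks) {
    $pm->start and next;    # parent: spawn a child and move to the next task
    download_one($task);    # child: do one unit of work
    $pm->finish;            # child exits here
}
$pm->wait_all_children;
```

Because the modules are loaded in the parent before the `fork`, the children share that memory copy-on-write instead of each paying for it separately, and the concurrency cap (30 here) lets you trade memory for wall-clock time.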

– Dave Sherohman