4

I occasionally need to extract data from a large XML database export. The file size is between 600 and 700 MBs. With a few days of research I concluded that XML::Twig is the way to go, because its handlers allow me to process the file chunk-by-chunk.

I followed the examples, and with some hours of trial and error I wrote the Perl code that performed the job I needed to do. The code worked, it extracted the data I wanted and calculated the details I needed. But while running the script (took about 15 minutes to process the 620MBs XML) I noticed in Activity Monitor that towards the end the memory usage quickly increases to an unreasonably high level.

I removed the Perl code that processes the XML tag I was interested in and replaced it with a single instruction to increment a variable, simply counting how many of product elements had been found. The memory usage was similar. First normal, then towards the final few elements it started to rapidly increase and eat up everything I have.

I added some code to monitor time, the number of items processed, and how much memory my Perl process is using. This code is shown here:

#!/usr/bin/perl

use strict;
use utf8;

use XML::Twig;

binmode STDOUT, ":encoding(UTF-8)";
binmode STDERR, ":encoding(UTF-8)";

my $my_processID = $$;
my ($xml_file_name, $second_arg) = @ARGV;
unless (defined($xml_file_name)) { die "Please provide XML filename as 1st argument."; }

my $product_counter = 0;
my @shell_command_output_text;
my $resource_usage_status;
my $reference_timestamp;
my $datetimeNow;

my $t = XML::Twig->new(
    twig_handlers => { 'product' => \&do_what_needs_to_be_done_with_each_product },
    pretty_print  => 'indented'
);

$reference_timestamp = time();

$t->parsefile( $xml_file_name, ErrorContext => 2 );
print "(Parse finished)\n"; 
showstatus();
$t->purge;

print "(Purge finished)\n"; 
showstatus();
print "\nJOB COMPLETE.\n$product_counter items processed.\n";

exit 0;

sub do_what_needs_to_be_done_with_each_product() {
    my( $twig, $product)= @_;

    $product_counter++;

    ## Display status every 5 seconds:
    if ( ( time()-$reference_timestamp ) > 5 ) {
        showstatus();
        $reference_timestamp = time();
    }

    $twig->purge;
} ## end-sub


sub showstatus() {

    @shell_command_output_text = `top -l 1 -pid $my_processID -stats pid,command,cpu,mem`;

    ## I only need the last line of that output.
    $resource_usage_status = $shell_command_output_text[$#shell_command_output_text];
    $datetimeNow = localtime();
    chomp $resource_usage_status;

    print "Res: $resource_usage_status -> item count: $product_counter <- time: $datetimeNow\n";
} ## end-sub

And here is what was shown on the screen when I run the above script.

keve@deimos:DATA$ ./twigExample.pl product_db_extract.xml 
Res: 3237  perl5.18 0.0  12M+ -> item count: 3086 <- time: Sat Nov 18 23:45:48 2017
Res: 3237  perl5.18 0.0  12M+ -> item count: 6521 <- time: Sat Nov 18 23:45:55 2017
Res: 3237  perl5.18 0.0  12M+ -> item count: 9909 <- time: Sat Nov 18 23:46:02 2017
Res: 3237  perl5.18 0.0  12M+ -> item count: 12751 <- time: Sat Nov 18 23:46:09 2017
Res: 3237  perl5.18 0.0  12M+ -> item count: 15570 <- time: Sat Nov 18 23:46:16 2017
Res: 3237  perl5.18 0.0  12M+ -> item count: 18151 <- time: Sat Nov 18 23:46:23 2017
Res: 3237  perl5.18 0.0  12M+ -> item count: 20828 <- time: Sat Nov 18 23:46:30 2017
Res: 3237  perl5.18 0.0  12M+ -> item count: 23561 <- time: Sat Nov 18 23:46:37 2017
Res: 3237  perl5.18 0.0  12M+ -> item count: 26230 <- time: Sat Nov 18 23:46:44 2017
Res: 3237  perl5.18 0.0  12M+ -> item count: 28861 <- time: Sat Nov 18 23:46:51 2017
Res: 3237  perl5.18 0.0  12M+ -> item count: 31665 <- time: Sat Nov 18 23:46:58 2017
Res: 3237  perl5.18 0.0  13M+ -> item count: 34443 <- time: Sat Nov 18 23:47:05 2017
Res: 3237  perl5.18 0.0  13M+ -> item count: 36952 <- time: Sat Nov 18 23:47:12 2017
Res: 3237  perl5.18 0.0  13M+ -> item count: 39461 <- time: Sat Nov 18 23:47:19 2017
Res: 3237  perl5.18 0.0  13M+ -> item count: 42113 <- time: Sat Nov 18 23:47:26 2017
Res: 3237  perl5.18 0.0  13M+ -> item count: 44794 <- time: Sat Nov 18 23:47:33 2017
Res: 3237  perl5.18 0.0  13M+ -> item count: 47510 <- time: Sat Nov 18 23:47:40 2017
Res: 3237  perl5.18 0.0  15M+ -> item count: 50154 <- time: Sat Nov 18 23:47:47 2017
Res: 3237  perl5.18 0.0  15M+ -> item count: 52869 <- time: Sat Nov 18 23:47:54 2017
Res: 3237  perl5.18 0.0  15M+ -> item count: 55497 <- time: Sat Nov 18 23:48:01 2017
Res: 3237  perl5.18 0.0  15M+ -> item count: 57725 <- time: Sat Nov 18 23:48:08 2017
Res: 3237  perl5.18 0.0  15M+ -> item count: 59754 <- time: Sat Nov 18 23:48:15 2017
Res: 3237  perl5.18 0.0  15M+ -> item count: 61600 <- time: Sat Nov 18 23:48:22 2017
Res: 3237  perl5.18 0.0  15M+ -> item count: 63244 <- time: Sat Nov 18 23:48:29 2017
Res: 3237  perl5.18 0.0  15M+ -> item count: 64801 <- time: Sat Nov 18 23:48:36 2017
Res: 3237  perl5.18 0.0  15M+ -> item count: 66448 <- time: Sat Nov 18 23:48:43 2017
Res: 3237  perl5.18 0.0  15M+ -> item count: 68022 <- time: Sat Nov 18 23:48:50 2017
Res: 3237  perl5.18 0.0  15M+ -> item count: 69561 <- time: Sat Nov 18 23:48:57 2017
Res: 3237  perl5.18 0.0  15M+ -> item count: 71180 <- time: Sat Nov 18 23:49:04 2017
Res: 3237  perl5.18 0.0  15M+ -> item count: 73009 <- time: Sat Nov 18 23:49:11 2017
Res: 3237  perl5.18 0.0  15M+ -> item count: 75259 <- time: Sat Nov 18 23:49:18 2017
Res: 3237  perl5.18 0.0  15M+ -> item count: 77331 <- time: Sat Nov 18 23:49:25 2017
Res: 3237  perl5.18 0.0  15M+ -> item count: 79344 <- time: Sat Nov 18 23:49:32 2017
Res: 3237  perl5.18 0.0  15M+ -> item count: 81389 <- time: Sat Nov 18 23:49:39 2017
Res: 3237  perl5.18 0.0  15M+ -> item count: 83345 <- time: Sat Nov 18 23:49:46 2017
Res: 3237  perl5.18 0.0  15M+ -> item count: 85346 <- time: Sat Nov 18 23:49:53 2017
Res: 3237  perl5.18 0.0  15M+ -> item count: 87350 <- time: Sat Nov 18 23:50:00 2017
Res: 3237  perl5.18 0.0  15M+ -> item count: 89359 <- time: Sat Nov 18 23:50:07 2017
Res: 3237  perl5.18 0.0  15M+ -> item count: 91374 <- time: Sat Nov 18 23:50:14 2017
Res: 3237  perl5.18 0.0  15M+ -> item count: 93451 <- time: Sat Nov 18 23:50:21 2017
Res: 3237  perl5.18 0.0  15M+ -> item count: 95542 <- time: Sat Nov 18 23:50:28 2017
Res: 3237  perl5.18 0.0  15M+ -> item count: 97635 <- time: Sat Nov 18 23:50:35 2017
Res: 3237  perl5.18 0.0  15M+ -> item count: 99764 <- time: Sat Nov 18 23:50:42 2017
Res: 3237  perl5.18 0.0  15M+ -> item count: 101914 <- time: Sat Nov 18 23:50:49 2017
Res: 3237  perl5.18 0.0  15M+ -> item count: 103896 <- time: Sat Nov 18 23:50:56 2017
Res: 3237  perl5.18 0.0  15M+ -> item count: 105982 <- time: Sat Nov 18 23:51:03 2017
Res: 3237  perl5.18 0.0  15M+ -> item count: 108023 <- time: Sat Nov 18 23:51:10 2017
Res: 3237  perl5.18 0.0  15M+ -> item count: 110090 <- time: Sat Nov 18 23:51:17 2017
Res: 3237  perl5.18 0.0  15M+ -> item count: 112033 <- time: Sat Nov 18 23:51:24 2017
Res: 3237  perl5.18 0.0  15M+ -> item count: 114039 <- time: Sat Nov 18 23:51:31 2017
Res: 3237  perl5.18 0.0  15M+ -> item count: 116038 <- time: Sat Nov 18 23:51:38 2017
Res: 3237  perl5.18 0.0  15M+ -> item count: 118031 <- time: Sat Nov 18 23:51:45 2017
(Parse finished)
Res: 3237  perl5.18 0.0  3780M+ -> item count: 119335 <- time: Sat Nov 18 23:54:21 2017
(Purge finished)
Res: 3237  perl5.18 0.0  2634M+ -> item count: 119335 <- time: Sat Nov 18 23:54:27 2017

JOB COMPLETE.
119335 items processed.
keve@deimos:DATA$

Notice the 3 minutes gap where "Parse finished" is printed.

I expected to see the memory usage go up for the final few product tags, just like I have seen it in Activity Monitor. But the memory usage printed—up to the very last product tag—was a perfectly acceptable 15 MB. The tricky part is not shown in this output: only the missing 3 minutes time and the 3780M indicates that something odd happens right after the final product tag.

So I wrote a shell script to keep track of memory usage during those missing 3 minutes at the end too. Here is what I was running in a separate window while the perl script was processing the XML in its own window.

#!/bin/bash
while true
do
  perlStat=`top -l 1 -pid $1 -stats pid,command,cpu,mem,purg,vsize | grep perl5`
  echo "`date` ::> $perlStat"
  sleep 5
done

And here is the output of that shell script running while the perl script was processing the XML:

keve@deimos:DATA$ ./monitorTWIGprocess.sh 3237
2017 Nov 18 Szo 23:45:53 CET ::> 3237  perl5.18 0.0  12M+ 0B   N/A  
2017 Nov 18 Szo 23:46:00 CET ::> 3237  perl5.18 0.0  12M+ 0B   N/A  
2017 Nov 18 Szo 23:46:06 CET ::> 3237  perl5.18 0.0  12M+ 0B   N/A  
2017 Nov 18 Szo 23:46:12 CET ::> 3237  perl5.18 0.0  12M+ 0B   N/A  
2017 Nov 18 Szo 23:46:18 CET ::> 3237  perl5.18 0.0  12M+ 0B   N/A  
2017 Nov 18 Szo 23:46:24 CET ::> 3237  perl5.18 0.0  12M+ 0B   N/A  
2017 Nov 18 Szo 23:46:30 CET ::> 3237  perl5.18 0.0  12M+ 0B   N/A  
2017 Nov 18 Szo 23:46:36 CET ::> 3237  perl5.18 0.0  12M+ 0B   N/A  
2017 Nov 18 Szo 23:46:42 CET ::> 3237  perl5.18 0.0  12M+ 0B   N/A  
2017 Nov 18 Szo 23:46:48 CET ::> 3237  perl5.18 0.0  12M+ 0B   N/A  
2017 Nov 18 Szo 23:46:54 CET ::> 3237  perl5.18 0.0  12M+ 0B   N/A  
2017 Nov 18 Szo 23:47:01 CET ::> 3237  perl5.18 0.0  12M+ 0B   N/A  
2017 Nov 18 Szo 23:47:07 CET ::> 3237  perl5.18 0.0  13M+ 0B   N/A  
2017 Nov 18 Szo 23:47:13 CET ::> 3237  perl5.18 0.0  13M+ 0B   N/A  
2017 Nov 18 Szo 23:47:19 CET ::> 3237  perl5.18 0.0  13M+ 0B   N/A  
2017 Nov 18 Szo 23:47:25 CET ::> 3237  perl5.18 0.0  13M+ 0B   N/A  
2017 Nov 18 Szo 23:47:31 CET ::> 3237  perl5.18 0.0  13M+ 0B   N/A  
2017 Nov 18 Szo 23:47:37 CET ::> 3237  perl5.18 0.0  13M+ 0B   N/A  
2017 Nov 18 Szo 23:47:43 CET ::> 3237  perl5.18 0.0  15M+ 0B   N/A  
2017 Nov 18 Szo 23:47:49 CET ::> 3237  perl5.18 0.0  15M+ 0B   N/A  
2017 Nov 18 Szo 23:47:56 CET ::> 3237  perl5.18 0.0  15M+ 0B   N/A  
2017 Nov 18 Szo 23:48:02 CET ::> 3237  perl5.18 0.0  15M+ 0B   N/A  
2017 Nov 18 Szo 23:48:08 CET ::> 3237  perl5.18 0.0  15M+ 0B   N/A  
2017 Nov 18 Szo 23:48:14 CET ::> 3237  perl5.18 0.0  15M+ 0B   N/A  
2017 Nov 18 Szo 23:48:20 CET ::> 3237  perl5.18 0.0  15M+ 0B   N/A  
2017 Nov 18 Szo 23:48:26 CET ::> 3237  perl5.18 0.0  15M+ 0B   N/A  
2017 Nov 18 Szo 23:48:32 CET ::> 3237  perl5.18 0.0  15M+ 0B   N/A  
2017 Nov 18 Szo 23:48:38 CET ::> 3237  perl5.18 0.0  15M+ 0B   N/A  
2017 Nov 18 Szo 23:48:44 CET ::> 3237  perl5.18 0.0  15M+ 0B   N/A  
2017 Nov 18 Szo 23:48:51 CET ::> 3237  perl5.18 0.0  15M+ 0B   N/A  
2017 Nov 18 Szo 23:48:57 CET ::> 3237  perl5.18 0.0  15M+ 0B   N/A  
2017 Nov 18 Szo 23:49:03 CET ::> 3237  perl5.18 0.0  15M+ 0B   N/A  
2017 Nov 18 Szo 23:49:09 CET ::> 3237  perl5.18 0.0  15M+ 0B   N/A  
2017 Nov 18 Szo 23:49:15 CET ::> 3237  perl5.18 0.0  15M+ 0B   N/A  
2017 Nov 18 Szo 23:49:21 CET ::> 3237  perl5.18 0.0  15M+ 0B   N/A  
2017 Nov 18 Szo 23:49:27 CET ::> 3237  perl5.18 0.0  15M+ 0B   N/A  
2017 Nov 18 Szo 23:49:33 CET ::> 3237  perl5.18 0.0  15M+ 0B   N/A  
2017 Nov 18 Szo 23:49:39 CET ::> 3237  perl5.18 0.0  15M+ 0B   N/A  
2017 Nov 18 Szo 23:49:46 CET ::> 3237  perl5.18 0.0  15M+ 0B   N/A  
2017 Nov 18 Szo 23:49:52 CET ::> 3237  perl5.18 0.0  15M+ 0B   N/A  
2017 Nov 18 Szo 23:49:58 CET ::> 3237  perl5.18 0.0  15M+ 0B   N/A  
2017 Nov 18 Szo 23:50:04 CET ::> 3237  perl5.18 0.0  15M+ 0B   N/A  
2017 Nov 18 Szo 23:50:10 CET ::> 3237  perl5.18 0.0  15M+ 0B   N/A  
2017 Nov 18 Szo 23:50:16 CET ::> 3237  perl5.18 0.0  15M+ 0B   N/A  
2017 Nov 18 Szo 23:50:22 CET ::> 3237  perl5.18 0.0  15M+ 0B   N/A  
2017 Nov 18 Szo 23:50:28 CET ::> 3237  perl5.18 0.0  15M+ 0B   N/A  
2017 Nov 18 Szo 23:50:35 CET ::> 3237  perl5.18 0.0  15M+ 0B   N/A  
2017 Nov 18 Szo 23:50:41 CET ::> 3237  perl5.18 0.0  15M+ 0B   N/A  
2017 Nov 18 Szo 23:50:47 CET ::> 3237  perl5.18 0.0  15M+ 0B   N/A  
2017 Nov 18 Szo 23:50:53 CET ::> 3237  perl5.18 0.0  15M+ 0B   N/A  
2017 Nov 18 Szo 23:50:59 CET ::> 3237  perl5.18 0.0  15M+ 0B   N/A  
2017 Nov 18 Szo 23:51:05 CET ::> 3237  perl5.18 0.0  15M+ 0B   N/A  
2017 Nov 18 Szo 23:51:11 CET ::> 3237  perl5.18 0.0  15M+ 0B   N/A  
2017 Nov 18 Szo 23:51:17 CET ::> 3237  perl5.18 0.0  15M+ 0B   N/A  
2017 Nov 18 Szo 23:51:23 CET ::> 3237  perl5.18 0.0  15M+ 0B   N/A  
2017 Nov 18 Szo 23:51:29 CET ::> 3237  perl5.18 0.0  15M+ 0B   N/A  
2017 Nov 18 Szo 23:51:36 CET ::> 3237  perl5.18 0.0  15M+ 0B   N/A  
2017 Nov 18 Szo 23:51:42 CET ::> 3237  perl5.18 0.0  15M+ 0B   N/A  
2017 Nov 18 Szo 23:51:48 CET ::> 3237  perl5.18 0.0  15M+ 0B   N/A  
2017 Nov 18 Szo 23:51:54 CET ::> 3237  perl5.18 0.0  125M+ 0B   N/A  
2017 Nov 18 Szo 23:52:00 CET ::> 3237  perl5.18 0.0  278M+ 0B   N/A  
2017 Nov 18 Szo 23:52:06 CET ::> 3237  perl5.18 0.0  430M+ 0B   N/A  
2017 Nov 18 Szo 23:52:12 CET ::> 3237  perl5.18 0.0  584M+ 0B   N/A  
2017 Nov 18 Szo 23:52:18 CET ::> 3237  perl5.18 0.0  737M+ 0B   N/A  
2017 Nov 18 Szo 23:52:24 CET ::> 3237  perl5.18 0.0  889M+ 0B   N/A  
2017 Nov 18 Szo 23:52:31 CET ::> 3237  perl5.18 0.0  1042M+ 0B   N/A  
2017 Nov 18 Szo 23:52:37 CET ::> 3237  perl5.18 0.0  1195M+ 0B   N/A  
2017 Nov 18 Szo 23:52:43 CET ::> 3237  perl5.18 0.0  1349M+ 0B   N/A  
2017 Nov 18 Szo 23:52:49 CET ::> 3237  perl5.18 0.0  1502M+ 0B   N/A  
2017 Nov 18 Szo 23:52:55 CET ::> 3237  perl5.18 0.0  1656M+ 0B   N/A  
2017 Nov 18 Szo 23:53:01 CET ::> 3237  perl5.18 0.0  1809M+ 0B   N/A  
2017 Nov 18 Szo 23:53:07 CET ::> 3237  perl5.18 0.0  1961M+ 0B   N/A  
2017 Nov 18 Szo 23:53:13 CET ::> 3237  perl5.18 0.0  2114M+ 0B   N/A  
2017 Nov 18 Szo 23:53:19 CET ::> 3237  perl5.18 0.0  2267M+ 0B   N/A  
2017 Nov 18 Szo 23:53:26 CET ::> 3237  perl5.18 0.0  2420M+ 0B   N/A  
2017 Nov 18 Szo 23:53:32 CET ::> 3237  perl5.18 0.0  2573M+ 0B   N/A  
2017 Nov 18 Szo 23:53:38 CET ::> 3237  perl5.18 0.0  2726M+ 0B   N/A  
2017 Nov 18 Szo 23:53:44 CET ::> 3237  perl5.18 0.0  2879M+ 0B   N/A  
2017 Nov 18 Szo 23:53:50 CET ::> 3237  perl5.18 0.0  3032M+ 0B   N/A  
2017 Nov 18 Szo 23:53:56 CET ::> 3237  perl5.18 0.0  3186M+ 0B   N/A  
2017 Nov 18 Szo 23:54:02 CET ::> 3237  perl5.18 0.0  3339M+ 0B   N/A  
2017 Nov 18 Szo 23:54:08 CET ::> 3237  perl5.18 0.0  3492M+ 0B   N/A  
2017 Nov 18 Szo 23:54:15 CET ::> 3237  perl5.18 0.0  3646M+ 0B   N/A  
2017 Nov 18 Szo 23:54:21 CET ::> 3237  perl5.18 0.0  3786M+ 0B   N/A  
2017 Nov 18 Szo 23:54:27 CET ::> 3237  perl5.18 0.0  3502M+ 0B   N/A  
2017 Nov 18 Szo 23:54:33 CET ::> 
^C
keve@deimos:DATA$

Notice the increase of memory usage at 23:51:54. That is when the 3-minute gap in the other output starts. And memory usage is rapidly going up from there.

I am puzzled as to what causes this rapid increase of memory usage at the end of processing the last (119,335th) product tag. And whatever it is, why does it use so much memory? 3.5G instead of the 15M, that is not OK.

Am I doing something wrong?

Any suggestion on how to avoid the memory usage increase?


Added later, in response to the answers my original post received.

Supplement
Switching to twig_roots.
I modified the twig handler block by commenting out the handler definition and adding its twig_roots variety instead, as shown here:

my $t = XML::Twig->new(
        ##twig_handlers => { 'product' => \&do_what_needs_to_be_done_with_each_product },
        twig_roots => { 'product' => \&do_what_needs_to_be_done_with_each_product },
        twig_print_outside_roots => 0,
        pretty_print => 'indented'
        );

I left the $twig->purge(); instruction inside my sub.

Here is the output of the perl script:

keve@deimos:DATA$ ./twigExample.pl product_db_extract.xml 
Res: 1591  perl5.18 0.0  12M+ -> item count: 2852 <- time: Sun Nov 19 15:20:51 2017
Res: 1591  perl5.18 0.0  12M+ -> item count: 6106 <- time: Sun Nov 19 15:20:58 2017
Res: 1591  perl5.18 0.0  12M+ -> item count: 9341 <- time: Sun Nov 19 15:21:05 2017
Res: 1591  perl5.18 0.0  12M+ -> item count: 12141 <- time: Sun Nov 19 15:21:12 2017
.
.
.
Res: 1591  perl5.18 0.0  15M+ -> item count: 112293 <- time: Sun Nov 19 15:26:41 2017
Res: 1591  perl5.18 0.0  15M+ -> item count: 114255 <- time: Sun Nov 19 15:26:48 2017
Res: 1591  perl5.18 0.0  15M+ -> item count: 116171 <- time: Sun Nov 19 15:26:55 2017
Res: 1591  perl5.18 0.0  15M+ -> item count: 118099 <- time: Sun Nov 19 15:27:02 2017
(Parse finished)
Res: 1591  perl5.18 0.0  15M+ -> item count: 119335 <- time: Sun Nov 19 15:27:56 2017
(Purge finished)
Res: 1591  perl5.18 0.0  15M+ -> item count: 119335 <- time: Sun Nov 19 15:27:57 2017

JOB COMPLETE.
119335 items processed.
keve@deimos:DATA$

And here is the output of the monitoring shell script:

keve@deimos:DATA$ ./monitorTWIGprocess.sh 1591
2017 Nov 19 Vas 15:20:57 CET ::> 1591  perl5.18 0.0  12M+ 0B   N/A  
2017 Nov 19 Vas 15:21:03 CET ::> 1591  perl5.18 0.0  12M+ 0B   N/A  
2017 Nov 19 Vas 15:21:09 CET ::> 1591  perl5.18 0.0  12M+ 0B   N/A  
2017 Nov 19 Vas 15:21:16 CET ::> 1591  perl5.18 0.0  12M+ 0B   N/A  
2017 Nov 19 Vas 15:21:22 CET ::> 1591  perl5.18 0.0  12M+ 0B   N/A  
2017 Nov 19 Vas 15:21:28 CET ::> 1591  perl5.18 0.0  12M+ 0B   N/A  
2017 Nov 19 Vas 15:21:34 CET ::> 1591  perl5.18 0.0  12M+ 0B   N/A  
2017 Nov 19 Vas 15:21:40 CET ::> 1591  perl5.18 0.0  12M+ 0B   N/A  
2017 Nov 19 Vas 15:21:46 CET ::> 1591  perl5.18 0.0  12M+ 0B   N/A  
2017 Nov 19 Vas 15:21:52 CET ::> 1591  perl5.18 0.0  12M+ 0B   N/A  
2017 Nov 19 Vas 15:21:58 CET ::> 1591  perl5.18 0.0  12M+ 0B   N/A  
2017 Nov 19 Vas 15:22:04 CET ::> 1591  perl5.18 0.0  12M+ 0B   N/A  
2017 Nov 19 Vas 15:22:10 CET ::> 1591  perl5.18 0.0  12M+ 0B   N/A  
2017 Nov 19 Vas 15:22:16 CET ::> 1591  perl5.18 0.0  13M+ 0B   N/A  
2017 Nov 19 Vas 15:22:22 CET ::> 1591  perl5.18 0.0  13M+ 0B   N/A  
2017 Nov 19 Vas 15:22:28 CET ::> 1591  perl5.18 0.0  13M+ 0B   N/A  
2017 Nov 19 Vas 15:22:35 CET ::> 1591  perl5.18 0.0  13M+ 0B   N/A  
2017 Nov 19 Vas 15:22:41 CET ::> 1591  perl5.18 0.0  13M+ 0B   N/A  
2017 Nov 19 Vas 15:22:47 CET ::> 1591  perl5.18 0.0  13M+ 0B   N/A  
2017 Nov 19 Vas 15:22:53 CET ::> 1591  perl5.18 0.0  15M+ 0B   N/A  
2017 Nov 19 Vas 15:22:59 CET ::> 1591  perl5.18 0.0  15M+ 0B   N/A  
2017 Nov 19 Vas 15:23:05 CET ::> 1591  perl5.18 0.0  15M+ 0B   N/A  
2017 Nov 19 Vas 15:23:11 CET ::> 1591  perl5.18 0.0  15M+ 0B   N/A  
2017 Nov 19 Vas 15:23:17 CET ::> 1591  perl5.18 0.0  15M+ 0B   N/A  
2017 Nov 19 Vas 15:23:23 CET ::> 1591  perl5.18 0.0  15M+ 0B   N/A  
2017 Nov 19 Vas 15:23:29 CET ::> 1591  perl5.18 0.0  15M+ 0B   N/A  
2017 Nov 19 Vas 15:23:35 CET ::> 1591  perl5.18 0.0  15M+ 0B   N/A  
2017 Nov 19 Vas 15:23:41 CET ::> 1591  perl5.18 0.0  15M+ 0B   N/A  
2017 Nov 19 Vas 15:23:47 CET ::> 1591  perl5.18 0.0  15M+ 0B   N/A  
2017 Nov 19 Vas 15:23:54 CET ::> 1591  perl5.18 0.0  15M+ 0B   N/A  
2017 Nov 19 Vas 15:24:00 CET ::> 1591  perl5.18 0.0  15M+ 0B   N/A  
2017 Nov 19 Vas 15:24:06 CET ::> 1591  perl5.18 0.0  15M+ 0B   N/A  
2017 Nov 19 Vas 15:24:12 CET ::> 1591  perl5.18 0.0  15M+ 0B   N/A  
2017 Nov 19 Vas 15:24:18 CET ::> 1591  perl5.18 0.0  15M+ 0B   N/A  
2017 Nov 19 Vas 15:24:24 CET ::> 1591  perl5.18 0.0  15M+ 0B   N/A  
2017 Nov 19 Vas 15:24:30 CET ::> 1591  perl5.18 0.0  15M+ 0B   N/A  
2017 Nov 19 Vas 15:24:36 CET ::> 1591  perl5.18 0.0  15M+ 0B   N/A  
2017 Nov 19 Vas 15:24:42 CET ::> 1591  perl5.18 0.0  15M+ 0B   N/A  
2017 Nov 19 Vas 15:24:48 CET ::> 1591  perl5.18 0.0  15M+ 0B   N/A  
2017 Nov 19 Vas 15:24:54 CET ::> 1591  perl5.18 0.0  15M+ 0B   N/A  
2017 Nov 19 Vas 15:25:00 CET ::> 1591  perl5.18 0.0  15M+ 0B   N/A  
2017 Nov 19 Vas 15:25:06 CET ::> 1591  perl5.18 0.0  15M+ 0B   N/A  
2017 Nov 19 Vas 15:25:13 CET ::> 1591  perl5.18 0.0  15M+ 0B   N/A  
2017 Nov 19 Vas 15:25:19 CET ::> 1591  perl5.18 0.0  15M+ 0B   N/A  
2017 Nov 19 Vas 15:25:25 CET ::> 1591  perl5.18 0.0  15M+ 0B   N/A  
2017 Nov 19 Vas 15:25:31 CET ::> 1591  perl5.18 0.0  15M+ 0B   N/A  
2017 Nov 19 Vas 15:25:37 CET ::> 1591  perl5.18 0.0  15M+ 0B   N/A  
2017 Nov 19 Vas 15:25:43 CET ::> 1591  perl5.18 0.0  15M+ 0B   N/A  
2017 Nov 19 Vas 15:25:49 CET ::> 1591  perl5.18 0.0  15M+ 0B   N/A  
2017 Nov 19 Vas 15:25:55 CET ::> 1591  perl5.18 0.0  15M+ 0B   N/A  
2017 Nov 19 Vas 15:26:01 CET ::> 1591  perl5.18 0.0  15M+ 0B   N/A  
2017 Nov 19 Vas 15:26:07 CET ::> 1591  perl5.18 0.0  15M+ 0B   N/A  
2017 Nov 19 Vas 15:26:13 CET ::> 1591  perl5.18 0.0  15M+ 0B   N/A  
2017 Nov 19 Vas 15:26:19 CET ::> 1591  perl5.18 0.0  15M+ 0B   N/A  
2017 Nov 19 Vas 15:26:26 CET ::> 1591  perl5.18 0.0  15M+ 0B   N/A  
2017 Nov 19 Vas 15:26:32 CET ::> 1591  perl5.18 0.0  15M+ 0B   N/A  
2017 Nov 19 Vas 15:26:38 CET ::> 1591  perl5.18 0.0  15M+ 0B   N/A  
2017 Nov 19 Vas 15:26:44 CET ::> 1591  perl5.18 0.0  15M+ 0B   N/A  
2017 Nov 19 Vas 15:26:50 CET ::> 1591  perl5.18 0.0  15M+ 0B   N/A  
2017 Nov 19 Vas 15:26:56 CET ::> 1591  perl5.18 0.0  15M+ 0B   N/A  
2017 Nov 19 Vas 15:27:02 CET ::> 1591  perl5.18 0.0  15M+ 0B   N/A  
2017 Nov 19 Vas 15:27:08 CET ::> 1591  perl5.18 0.0  15M+ 0B   N/A  
2017 Nov 19 Vas 15:27:14 CET ::> 1591  perl5.18 0.0  15M+ 0B   N/A  
2017 Nov 19 Vas 15:27:20 CET ::> 1591  perl5.18 0.0  15M+ 0B   N/A  
2017 Nov 19 Vas 15:27:26 CET ::> 1591  perl5.18 0.0  15M+ 0B   N/A  
2017 Nov 19 Vas 15:27:32 CET ::> 1591  perl5.18 0.0  15M+ 0B   N/A  
2017 Nov 19 Vas 15:27:38 CET ::> 1591  perl5.18 0.0  15M+ 0B   N/A  
2017 Nov 19 Vas 15:27:44 CET ::> 1591  perl5.18 0.0  15M+ 0B   N/A  
2017 Nov 19 Vas 15:27:51 CET ::> 1591  perl5.18 0.0  15M+ 0B   N/A  
2017 Nov 19 Vas 15:27:57 CET ::> 1591  perl5.18 0.0  15M+ 0B   N/A  
2017 Nov 19 Vas 15:28:03 CET ::> 
^C
keve@deimos:DATA$

The good news is that using the twig_roots approach instead of a twig handler makes my perl code behave as I wanted it. It goes through a giant XML file end extracts data from each product (the extracted data is not in XML format) while the amount of memory used stays below 20 megabytes. Splendid!
The bad news is that I am still unable to explain why using the handler ate up that much memory, and why it did that after the last product was read. Up to the last product, memory usage was perfectly fine. What made it go out of control at that point? I will make some more tests to try and satisfy my curiosity, but that is no longer tied to the issue of how my XML-processing perl code development moves forward. I am marking the twig_roots suggestion as an accepted answer, but I very much appreciate the other answer too as that one was equaly educational to work with.

Keve
  • 383
  • 2
  • 13
  • Please don't declare all your variables at the top of your program. They should be declared with the smallest possible scope, usually at the point where they are first defined as you have done with `my $t = ...` – Borodin Nov 19 '17 at 03:00
  • In particular, `$datetimeNow` should be declared within `showstatus` as `my $datetimeNow = localtime`. And you should avoid prototypes: both subroutines should be just `sub showstatus { ... }` etc. – Borodin Nov 19 '17 at 16:29
  • My habit of declaring variables at the top comes from other programming languages I use. I will try to pay attention to this when using perl. And I will also be more careful with my subs. – Keve Nov 20 '17 at 22:42
  • Yes, I assumed so. Presumably you're a C programmer? Either way, a big block of declarations is really just a way of getting the compilation errors to go away. Every modern language I can think of allows you to declare variables with a limited scope, and you should use the same principles everywhere. – Borodin Nov 20 '17 at 23:09
  • **mirod's** answer is not a solution; he was simply asking how removing the `purge` influenced the behaviour. However he is the author of `XML::Twig`, and if you want an explanation for the high memory usage then he is your best bet. You should show a sample of the XML you are dealing with, including at least a few `product` elements. – Borodin Nov 20 '17 at 23:21

2 Answers2

4

You're building a complete XML data structure except for product elements, only to discard it all at the end

You don't say what you actually want to do with the data, but if you provide twig_handlers then XML::Twig is expecting to produce a filtered/modified version of the input data

If you don't want any output XML at all, or you want to build your own rather than modifying the input, then I suggest you use twig_roots instead, like this

This is untested code. I'm travelling at present and nowhere near a PC system to test with

my $t = XML::Twig->new(
    twig_roots               => { product => \&do_what_needs_to_be_done_with_each_product },
    twig_print_outside_roots => 0,
    pretty_print             => 'indented',
); 

Note that do_what_needs_to_be_done_with_each_product (I hope that's not the real identifier) shouldn't contain $t->purge when working like this

Borodin
  • 126,100
  • 9
  • 70
  • 144
  • 1
    The twig_roots way seems to be the one I need. I modified the perl script the way you suggested (product in single-quotes and no comma tailing the pretty_print line) and it does what I originally expected it to do while memory usage does not go above 15M. When I studied the XML::Twig documentation I did read the twig_roots section but I did not understand its purpose. More precisely, I did not recognize that this is the approach I need. Now I concluded that twig_roots does what I believed the twig handler to do. Thanks for pointing me to the right direction! – Keve Nov 19 '17 at 15:28
  • @Keve: I'm pleased to help. I agree that the documentation isn't the clearest. Has the three-minute gap also disappeared? And you do know that those quotes and the comma are unnecessary, right? – Borodin Nov 19 '17 at 16:06
  • Yes, the 3 minute gap vanished. I added the new results to my original post. I also have results on Mirod's suggestion, but cannot add to my post because of character limits ( makes my post too long). No, I was not aware that the quotes are insignificant there, or the comma may be left there. So this is another new thing I learnt today. :) – Keve Nov 20 '17 at 22:48
  • @Keve: That sounds like the right behaviour! Is eight minutes sufficiently fast? There's no reason to put quotes around `product` because the "fat comma" implicitly quotes any string of "word characters" (alphanumeric and underscore) to its left. If you wanted something containing spaces or punctuation then it would need quotes. Otherwise `=>` is identical to a normal comma. And Perl allows commas after the last element of a list so that you can easily add a new entry without modifying the original last line. – Borodin Nov 20 '17 at 23:14
2

It looks like memory usage increases dramatically during the final purge, the one after the end of parsing.

What happens when you don't purge?


Below is what happens when the $twig->purge(); line is removed/commented-out.
Output of the perl script processing the XML:

keve@deimos:DATA$ ./twigExample.pl product_db_extract.xml 
Res: 496  perl5.18 0.0  144M+ -> item count: 2905 <- time: Sun Nov 19 13:23:04 2017
Res: 496  perl5.18 0.0  291M+ -> item count: 6341 <- time: Sun Nov 19 13:23:11 2017
Res: 496  perl5.18 0.0  439M+ -> item count: 9777 <- time: Sun Nov 19 13:23:18 2017
Res: 496  perl5.18 0.0  587M+ -> item count: 12641 <- time: Sun Nov 19 13:23:25 2017
Res: 496  perl5.18 0.0  735M+ -> item count: 15528 <- time: Sun Nov 19 13:23:32 2017
Res: 496  perl5.18 0.0  883M+ -> item count: 18137 <- time: Sun Nov 19 13:23:39 2017
Res: 496  perl5.18 0.0  1032M+ -> item count: 20852 <- time: Sun Nov 19 13:23:46 2017
Res: 496  perl5.18 0.0  1179M+ -> item count: 23596 <- time: Sun Nov 19 13:23:53 2017
Res: 496  perl5.18 0.0  1328M+ -> item count: 26297 <- time: Sun Nov 19 13:24:00 2017
Res: 496  perl5.18 0.0  1476M+ -> item count: 28929 <- time: Sun Nov 19 13:24:07 2017
Res: 496  perl5.18 0.0  1624M+ -> item count: 31752 <- time: Sun Nov 19 13:24:14 2017
Res: 496  perl5.18 0.0  1772M+ -> item count: 34536 <- time: Sun Nov 19 13:24:21 2017
Res: 496  perl5.18 0.0  1920M+ -> item count: 37019 <- time: Sun Nov 19 13:24:28 2017
Res: 496  perl5.18 0.0  2068M+ -> item count: 39566 <- time: Sun Nov 19 13:24:35 2017
Res: 496  perl5.18 0.0  2216M+ -> item count: 42246 <- time: Sun Nov 19 13:24:42 2017
Res: 496  perl5.18 0.0  2365M+ -> item count: 44976 <- time: Sun Nov 19 13:24:49 2017
Res: 496  perl5.18 0.0  2513M+ -> item count: 47722 <- time: Sun Nov 19 13:24:56 2017
Res: 496  perl5.18 0.0  2662M+ -> item count: 50389 <- time: Sun Nov 19 13:25:03 2017
Res: 496  perl5.18 0.0  2810M+ -> item count: 53083 <- time: Sun Nov 19 13:25:10 2017
Res: 496  perl5.18 0.0  2958M+ -> item count: 55745 <- time: Sun Nov 19 13:25:17 2017
Res: 496  perl5.18 0.0  3106M+ -> item count: 57955 <- time: Sun Nov 19 13:25:24 2017
Res: 496  perl5.18 0.0  3253M+ -> item count: 59924 <- time: Sun Nov 19 13:25:31 2017
Res: 496  perl5.18 0.0  3402M+ -> item count: 61742 <- time: Sun Nov 19 13:25:38 2017
Res: 496  perl5.18 0.0  3550M+ -> item count: 63373 <- time: Sun Nov 19 13:25:45 2017
Res: 496  perl5.18 0.0  3698M+ -> item count: 64954 <- time: Sun Nov 19 13:25:52 2017
Res: 496  perl5.18 0.0  3845M+ -> item count: 66589 <- time: Sun Nov 19 13:25:59 2017
Res: 496  perl5.18 0.0  3993M+ -> item count: 68165 <- time: Sun Nov 19 13:26:06 2017
Res: 496  perl5.18 0.0  4141M+ -> item count: 69717 <- time: Sun Nov 19 13:26:13 2017
Res: 496  perl5.18 0.0  4289M+ -> item count: 71320 <- time: Sun Nov 19 13:26:20 2017
Res: 496  perl5.18 0.0  4380M+ -> item count: 73202 <- time: Sun Nov 19 13:26:27 2017
Res: 496  perl5.18 0.0  4384M+ -> item count: 75413 <- time: Sun Nov 19 13:26:34 2017
Res: 496  perl5.18 0.0  4280M+ -> item count: 77457 <- time: Sun Nov 19 13:26:41 2017
Res: 496  perl5.18 0.0  4181M+ -> item count: 79444 <- time: Sun Nov 19 13:26:48 2017
Res: 496  perl5.18 0.0  4094M+ -> item count: 81443 <- time: Sun Nov 19 13:26:55 2017
Res: 496  perl5.18 0.0  3939M+ -> item count: 83351 <- time: Sun Nov 19 13:27:02 2017
Res: 496  perl5.18 0.0  3839M+ -> item count: 85322 <- time: Sun Nov 19 13:27:09 2017
Res: 496  perl5.18 0.0  3727M+ -> item count: 87307 <- time: Sun Nov 19 13:27:16 2017
Res: 496  perl5.18 0.0  3629M+ -> item count: 89305 <- time: Sun Nov 19 13:27:23 2017
Res: 496  perl5.18 0.0  3521M+ -> item count: 91275 <- time: Sun Nov 19 13:27:30 2017
Res: 496  perl5.18 0.0  3418M+ -> item count: 93298 <- time: Sun Nov 19 13:27:37 2017
Res: 496  perl5.18 0.0  3299M+ -> item count: 95339 <- time: Sun Nov 19 13:27:44 2017
Res: 496  perl5.18 0.0  3189M+ -> item count: 97398 <- time: Sun Nov 19 13:27:51 2017
Res: 496  perl5.18 0.0  3141M+ -> item count: 99449 <- time: Sun Nov 19 13:27:58 2017
Res: 496  perl5.18 0.0  2977M+ -> item count: 101543 <- time: Sun Nov 19 13:28:05 2017
Res: 496  perl5.18 0.0  2857M+ -> item count: 103519 <- time: Sun Nov 19 13:28:12 2017
Res: 496  perl5.18 0.0  2748M+ -> item count: 105574 <- time: Sun Nov 19 13:28:19 2017
Res: 496  perl5.18 0.0  2643M+ -> item count: 107550 <- time: Sun Nov 19 13:28:26 2017
Res: 496  perl5.18 0.0  2591M+ -> item count: 109574 <- time: Sun Nov 19 13:28:33 2017
Res: 496  perl5.18 0.0  2506M+ -> item count: 111562 <- time: Sun Nov 19 13:28:40 2017
Res: 496  perl5.18 0.0  2389M+ -> item count: 113485 <- time: Sun Nov 19 13:28:47 2017
Res: 496  perl5.18 0.0  2280M+ -> item count: 115455 <- time: Sun Nov 19 13:28:54 2017
Res: 496  perl5.18 0.0  2100M+ -> item count: 117391 <- time: Sun Nov 19 13:29:01 2017
(Parse finished)
Res: 496  perl5.18 0.0  893M+ -> item count: 119335 <- time: Sun Nov 19 13:31:41 2017
(Purge finished)
Res: 496  perl5.18 0.0  5144M+ -> item count: 119335 <- time: Sun Nov 19 13:32:27 2017

JOB COMPLETE.
119335 items processed.
keve@deimos:DATA$

Output of the shell script monitoring the memory usage of the perl script:

keve@deimos:DATA$ ./monitorTWIGprocess.sh 496
2017 Nov 19 Vas 13:23:11 CET ::> 496  perl5.18 0.0  291M+ 0B   N/A  
2017 Nov 19 Vas 13:23:17 CET ::> 496  perl5.18 0.0  434M+ 0B   N/A  
2017 Nov 19 Vas 13:23:23 CET ::> 496  perl5.18 0.0  559M+ 0B   N/A  
2017 Nov 19 Vas 13:23:30 CET ::> 496  perl5.18 0.0  684M+ 0B   N/A  
2017 Nov 19 Vas 13:23:36 CET ::> 496  perl5.18 0.0  810M+ 0B   N/A  
2017 Nov 19 Vas 13:23:42 CET ::> 496  perl5.18 0.0  936M+ 0B   N/A  
2017 Nov 19 Vas 13:23:48 CET ::> 496  perl5.18 0.0  1061M+ 0B   N/A  
2017 Nov 19 Vas 13:23:54 CET ::> 496  perl5.18 0.0  1185M+ 0B   N/A  
2017 Nov 19 Vas 13:24:00 CET ::> 496  perl5.18 0.0  1328M+ 0B   N/A  
2017 Nov 19 Vas 13:24:06 CET ::> 496  perl5.18 0.0  1463M+ 0B   N/A  
2017 Nov 19 Vas 13:24:12 CET ::> 496  perl5.18 0.0  1589M+ 0B   N/A  
2017 Nov 19 Vas 13:24:18 CET ::> 496  perl5.18 0.0  1715M+ 0B   N/A  
2017 Nov 19 Vas 13:24:24 CET ::> 496  perl5.18 0.0  1840M+ 0B   N/A  
2017 Nov 19 Vas 13:24:30 CET ::> 496  perl5.18 0.0  1964M+ 0B   N/A  
2017 Nov 19 Vas 13:24:36 CET ::> 496  perl5.18 0.0  2090M+ 0B   N/A  
2017 Nov 19 Vas 13:24:43 CET ::> 496  perl5.18 0.0  2216M+ 0B   N/A  
2017 Nov 19 Vas 13:24:49 CET ::> 496  perl5.18 0.0  2365M+ 0B   N/A  
2017 Nov 19 Vas 13:24:55 CET ::> 496  perl5.18 0.0  2494M+ 0B   N/A  
2017 Nov 19 Vas 13:25:01 CET ::> 496  perl5.18 0.0  2620M+ 0B   N/A  
2017 Nov 19 Vas 13:25:07 CET ::> 496  perl5.18 0.0  2745M+ 0B   N/A  
2017 Nov 19 Vas 13:25:13 CET ::> 496  perl5.18 0.0  2870M+ 0B   N/A  
2017 Nov 19 Vas 13:25:19 CET ::> 496  perl5.18 0.0  2996M+ 0B   N/A  
2017 Nov 19 Vas 13:25:25 CET ::> 496  perl5.18 0.0  3121M+ 0B   N/A  
2017 Nov 19 Vas 13:25:31 CET ::> 496  perl5.18 0.0  3253M+ 0B   N/A  
2017 Nov 19 Vas 13:25:37 CET ::> 496  perl5.18 0.0  3399M+ 0B   N/A  
2017 Nov 19 Vas 13:25:43 CET ::> 496  perl5.18 0.0  3524M+ 0B   N/A  
2017 Nov 19 Vas 13:25:50 CET ::> 496  perl5.18 0.0  3649M+ 0B   N/A  
2017 Nov 19 Vas 13:25:56 CET ::> 496  perl5.18 0.0  3774M+ 0B   N/A  
2017 Nov 19 Vas 13:26:02 CET ::> 496  perl5.18 0.0  3899M+ 0B   N/A  
2017 Nov 19 Vas 13:26:08 CET ::> 496  perl5.18 0.0  4024M+ 0B   N/A  
2017 Nov 19 Vas 13:26:14 CET ::> 496  perl5.18 0.0  4149M+ 0B   N/A  
2017 Nov 19 Vas 13:26:20 CET ::> 496  perl5.18 0.0  4289M+ 0B   N/A  
2017 Nov 19 Vas 13:26:26 CET ::> 496  perl5.18 0.0  4370M+ 0B   N/A  
2017 Nov 19 Vas 13:26:32 CET ::> 496  perl5.18 0.0  4409M+ 0B   N/A  
2017 Nov 19 Vas 13:26:38 CET ::> 496  perl5.18 0.0  4319M+ 0B   N/A  
2017 Nov 19 Vas 13:26:44 CET ::> 496  perl5.18 0.0  4231M+ 0B   N/A  
2017 Nov 19 Vas 13:26:51 CET ::> 496  perl5.18 0.0  4152M+ 0B   N/A  
2017 Nov 19 Vas 13:26:57 CET ::> 496  perl5.18 0.0  4079M+ 0B   N/A  
2017 Nov 19 Vas 13:27:03 CET ::> 496  perl5.18 0.0  3934M+ 0B   N/A  
2017 Nov 19 Vas 13:27:09 CET ::> 496  perl5.18 0.0  3839M+ 0B   N/A  
2017 Nov 19 Vas 13:27:15 CET ::> 496  perl5.18 0.0  3741M+ 0B   N/A  
2017 Nov 19 Vas 13:27:21 CET ::> 496  perl5.18 0.0  3654M+ 0B   N/A  
2017 Nov 19 Vas 13:27:27 CET ::> 496  perl5.18 0.0  3566M+ 0B   N/A  
2017 Nov 19 Vas 13:27:33 CET ::> 496  perl5.18 0.0  3473M+ 0B   N/A  
2017 Nov 19 Vas 13:27:39 CET ::> 496  perl5.18 0.0  3377M+ 0B   N/A  
2017 Nov 19 Vas 13:27:45 CET ::> 496  perl5.18 0.0  3286M+ 0B   N/A  
2017 Nov 19 Vas 13:27:51 CET ::> 496  perl5.18 0.0  3189M+ 0B   N/A  
2017 Nov 19 Vas 13:27:58 CET ::> 496  perl5.18 0.0  3141M+ 0B   N/A  
2017 Nov 19 Vas 13:28:04 CET ::> 496  perl5.18 0.0  2987M+ 0B   N/A  
2017 Nov 19 Vas 13:28:10 CET ::> 496  perl5.18 0.0  2881M+ 0B   N/A  
2017 Nov 19 Vas 13:28:16 CET ::> 496  perl5.18 0.0  2796M+ 0B   N/A  
2017 Nov 19 Vas 13:28:22 CET ::> 496  perl5.18 0.0  2707M+ 0B   N/A  
2017 Nov 19 Vas 13:28:28 CET ::> 496  perl5.18 0.0  2615M+ 0B   N/A  
2017 Nov 19 Vas 13:28:34 CET ::> 496  perl5.18 0.0  2597M+ 0B   N/A  
2017 Nov 19 Vas 13:28:40 CET ::> 496  perl5.18 0.0  2499M+ 0B   N/A  
2017 Nov 19 Vas 13:28:46 CET ::> 496  perl5.18 0.0  2395M+ 0B   N/A  
2017 Nov 19 Vas 13:28:53 CET ::> 496  perl5.18 0.0  2310M+ 0B   N/A  
2017 Nov 19 Vas 13:28:59 CET ::> 496  perl5.18 0.0  2208M+ 0B   N/A  
2017 Nov 19 Vas 13:29:05 CET ::> 496  perl5.18 0.0  2052M+ 0B   N/A  
2017 Nov 19 Vas 13:29:11 CET ::> 496  perl5.18 0.0  1943M+ 0B   N/A  
2017 Nov 19 Vas 13:29:17 CET ::> 496  perl5.18 0.0  1832M+ 0B   N/A  
2017 Nov 19 Vas 13:29:23 CET ::> 496  perl5.18 0.0  1761M+ 0B   N/A  
2017 Nov 19 Vas 13:29:29 CET ::> 496  perl5.18 0.0  1669M+ 0B   N/A  
2017 Nov 19 Vas 13:29:35 CET ::> 496  perl5.18 0.0  1563M+ 0B   N/A  
2017 Nov 19 Vas 13:29:41 CET ::> 496  perl5.18 0.0  1452M+ 0B   N/A  
2017 Nov 19 Vas 13:29:47 CET ::> 496  perl5.18 0.0  1340M+ 0B   N/A  
2017 Nov 19 Vas 13:29:54 CET ::> 496  perl5.18 0.0  1275M+ 0B   N/A  
2017 Nov 19 Vas 13:30:00 CET ::> 496  perl5.18 0.0  1231M+ 0B   N/A  
2017 Nov 19 Vas 13:30:06 CET ::> 496  perl5.18 0.0  1142M+ 0B   N/A  
2017 Nov 19 Vas 13:30:12 CET ::> 496  perl5.18 0.0  1082M+ 0B   N/A  
2017 Nov 19 Vas 13:30:18 CET ::> 496  perl5.18 0.0  1042M+ 0B   N/A  
2017 Nov 19 Vas 13:30:24 CET ::> 496  perl5.18 0.0  1000M+ 0B   N/A  
2017 Nov 19 Vas 13:30:30 CET ::> 496  perl5.18 0.0  958M+ 0B   N/A  
2017 Nov 19 Vas 13:30:36 CET ::> 496  perl5.18 0.0  977M+ 0B   N/A  
2017 Nov 19 Vas 13:30:42 CET ::> 496  perl5.18 0.0  941M+ 0B   N/A  
2017 Nov 19 Vas 13:30:48 CET ::> 496  perl5.18 0.0  924M+ 0B   N/A  
2017 Nov 19 Vas 13:30:55 CET ::> 496  perl5.18 0.0  956M+ 0B   N/A  
2017 Nov 19 Vas 13:31:01 CET ::> 496  perl5.18 0.0  895M+ 0B   N/A  
2017 Nov 19 Vas 13:31:07 CET ::> 496  perl5.18 0.0  911M+ 0B   N/A  
2017 Nov 19 Vas 13:31:13 CET ::> 496  perl5.18 0.0  895M+ 0B   N/A  
2017 Nov 19 Vas 13:31:19 CET ::> 496  perl5.18 0.0  876M+ 0B   N/A  
2017 Nov 19 Vas 13:31:25 CET ::> 496  perl5.18 0.0  825M+ 0B   N/A  
2017 Nov 19 Vas 13:31:31 CET ::> 496  perl5.18 0.0  812M+ 0B   N/A  
2017 Nov 19 Vas 13:31:37 CET ::> 496  perl5.18 0.0  812M+ 0B   N/A  
2017 Nov 19 Vas 13:31:43 CET ::> 496  perl5.18 0.0  987M+ 0B   N/A  
2017 Nov 19 Vas 13:31:49 CET ::> 496  perl5.18 0.0  2058M+ 0B   N/A  
2017 Nov 19 Vas 13:31:56 CET ::> 496  perl5.18 0.0  2947M+ 0B   N/A  
2017 Nov 19 Vas 13:32:02 CET ::> 496  perl5.18 0.0  3178M+ 0B   N/A  
2017 Nov 19 Vas 13:32:08 CET ::> 496  perl5.18 0.0  3301M+ 0B   N/A  
2017 Nov 19 Vas 13:32:14 CET ::> 496  perl5.18 0.0  3429M+ 0B   N/A  
2017 Nov 19 Vas 13:32:20 CET ::> 496  perl5.18 0.0  4247M+ 0B   N/A  
2017 Nov 19 Vas 13:32:26 CET ::> 496  perl5.18 0.0  4674M+ 0B   N/A  
2017 Nov 19 Vas 13:32:32 CET ::> 496  perl5.18 0.0  5362M+ 0B   N/A  
2017 Nov 19 Vas 13:32:38 CET ::> 
^C
keve@deimos:DATA$

It did not surprise me that the memory usage is constantly increasing. But the output still showed some unexpected results. I don't understand why memory allocation stops at 13:26:57 and why it starts to release memory from that point on until it goes down to 987M while the script is actually reading and processing more and more product tags. Also, why does memory usage go back up again when the code finally reaches the $t->purge(); line?

Note: I also tested the script with the line modified as $twig->purge() unless ($product_counter > 119334);, which would NOT purge the last (119335th) product, but would still purge all the others before it. I am intentionally not posting that output as it is almost identical to my original output.

Keve
  • 383
  • 2
  • 13
mirod
  • 15,923
  • 3
  • 45
  • 65
  • The memory usage is already increased (3780M instead of 15M) when the "Parse finished" text is shown. That is before the final Purge is called. This suggests that the memory-use increase happens at/around the purge of the last product element. I modified the perl script, first by commenting out the purge instruction, then also tried to skip the purge for the last product only. – Keve Nov 19 '17 at 15:17
  • Mirod, I also generated output using your suggestions, but I could not post it here because it makes my post too long. – Keve Nov 20 '17 at 22:49
  • While I can no longer extend my original post with more output, it seems I could edit your answer and add the output there. – Keve Nov 21 '17 at 07:50
  • Is there more data after the last `product` element? That would explain the behavior of the code: using `twig_handlers` the tree is built for the data after the element, with `twig_root` it isn't. If that's the case, and if you can identify when you're past the last `product` you could use a call to `finish` (finishes parsing without checking if a `product` is found) or `finish_now` (even faster since it just kills the parser,) – mirod Nov 21 '17 at 08:12