9

I wish to make a Perl program of mine use multiple cores. It progressively reads query input and compares chunks of that against a read-only data structure loaded from a file into memory for each run. That data structure, typically a few gigabytes, is a small set of packed strings that are used by small C routines. When processes are forked, everything is copied, which on a multi-core machine quickly exhausts the RAM. I tried several non-standard modules, but they all lead to slowness and/or exhaust the RAM. I thought that, for read-only data, Perl would not insist on making copies. Other languages can do it. Does anyone have ideas?

ЯegDwight
Niels Larsen
    This discussion looks interesting: http://stackoverflow.com/questions/9733146/tips-for-keeping-perl-memory-usage-low In particular, someone suggested mounting some RAM as a hard drive, then using file I/O from there. That would solve your problem. The question is: is it worth it? – dan1111 Oct 03 '12 at 14:20
  • Thanks, using a file is an option. But I think the children will then spin the disk so much that multi-core speed becomes worse than single-core with all in ram. I say this, because I had an earlier version of the program that used a file, and it was 20x slower. One of the modules uses Sockets to make the memory "shared", but that too slows things a lot. – Niels Larsen Oct 03 '12 at 14:52
  • Note that dan1111 is not suggesting you use actual files. Disk spin doesn't enter into this solution. – darch Oct 03 '12 at 16:01
  • Ah yes, I misread. But making a ramfs is OS specific, and I also want the program to run as a regular user. I might try Sys::Mmap. – Niels Larsen Oct 03 '12 at 21:52
  • What does "blows the RAM" mean? I couldn't find anything relevant for that phrase on Google. – Starfish Oct 08 '12 at 17:38
  • @Starfish This is just a phrase for "the program consumes more memory than available on the system". Usually, Perl will terminate with an "out of memory" message in this case. – amon Oct 08 '12 at 17:50

3 Answers

2

Fork doesn't normally copy memory until it's modified (search for "copy-on-write" or COW). Are you sure you are measuring memory usage correctly? Compare before/after values from `free` rather than relying on `top`.

EDIT - example script

Try running the following with settings like:

./fork_mem_usage 5 10000
./fork_mem_usage 25 10000
./fork_mem_usage 5 100000
./fork_mem_usage 25 100000

If the first increase is bigger than the subsequent ones, then fork is using copy-on-write. It almost certainly is (except on Windows, of course).

#!/usr/bin/perl
use strict;
use warnings;

my $num_kids  = shift @ARGV;
my $arr_size  = shift @ARGV;
print "$num_kids x $arr_size\n";

# Build a read-only array in the parent; the children never write to it,
# so under copy-on-write it should not be physically duplicated.
my @big_array = ('abcdefg') x $arr_size;
die "Array wrong length" unless ($arr_size == @big_array);

print_mem_usage('Start');

for my $i (1..$num_kids) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid) {
        # Parent: report system-wide memory usage every fifth child.
        if ($i % 5 == 0) {
            print_mem_usage($i);
        }
    }
    else {
        # Child: hold on to the (shared) memory for a while, then exit.
        sleep(5);
        exit;
    }
}

print_mem_usage('End');
exit;

# Print used memory in MB, net of buffers/cache (from `free -m`).
sub print_mem_usage {
    my $msg = shift;
    print "$msg: ";
    system q(free -m | grep buffers/cache | awk '{print $3}');
}
Richard Huxton
  • Yes, I get an 'out of memory' error; copies are made. Suggestions welcome on how to avoid those. I tried everything I could find. – Niels Larsen Oct 03 '12 at 20:29
  • Thanks for the deluxe response. ./fork_mem_usage 25 100000 gives: 25 x 100000 Start: 320 5: 322 10: 322 15: 322 20: 323 25: 323 End: 323 – Niels Larsen Oct 04 '12 at 13:01
  • So the first one uses ~2 MB, the rest less than 1 MB each. Whatever your problem is, it's not fork copying read-only memory. – Richard Huxton Oct 04 '12 at 13:07
  • 2
    Indeed, right you are. On my old Lenovo laptop with 2 GB RAM total, any number of 600 MB children can be launched. I changed your test to a hash of strings, which resembles my data structure more, and took random substrings in the children. I am grateful you put it under my nose, and I'm sure I tested plain fork, but must have made a mistake. On my stone I shall put "He who never checked properly". – Niels Larsen Oct 04 '12 at 14:24
0

Edit & Summary:

I have been terribly wrong about threads::shared being an option. Upon thread creation, even shared data structures are copied. This does indeed suck, and I can therefore summarize that Perl is completely unsuitable for memory-intensive computations.


When a process forks, the kernel conceptually copies the whole process. On modern kernels the pages are copy-on-write and only duplicated when written to, but Perl stores reference counts and other bookkeeping right next to your data and updates them during ordinary operations, which can dirty those pages and force copies anyway. However, you can try memory mapping, or you can use threads.
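As an illustration of the memory-mapping route (not part of the original answer): the CPAN module File::Map can map a file straight into a Perl scalar, and a read-only mapping is shared between forked children instead of being copied. A minimal sketch, assuming the packed strings live in a file called strings.dat (the filename and offsets are made up for the example):

use strict;
use warnings;
use File::Map 'map_file';

# Map the packed strings read-only into a scalar. The pages live in the
# kernel's page cache and are shared by every process that maps the file.
map_file my $packed, 'strings.dat', '<';

for my $kid (1 .. 4) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    next if $pid;    # parent keeps forking

    # Child: substr() on the mapped scalar copies only the bytes it
    # extracts, not the whole multi-gigabyte structure.
    my $chunk = substr $packed, 0, 100;
    print "child $kid read ", length($chunk), " bytes\n";
    exit;
}
wait for 1 .. 4;    # reap the children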

Perl's ithreads are real OS threads, but each new thread clones the interpreter and its data, much like a fork does. You can, however, declare variables as shared between threads:

use strict;
use warnings;
use threads;
use threads::shared;

# A single variable marked as shared between all threads.
my $sharedVariable :shared = 0;

my @worker;

for my $i (1 .. 6) {
    push @worker, threads->create(\&worker_sub);
}

# Wait for all workers to finish.
$_->join() foreach @worker;

sub worker_sub {
    sleep rand 5;
    print $sharedVariable, "\n";
}

If $sharedVariable is updated in one thread, the change propagates to the other threads as well. You can see this if you substitute the print statement with

print threads->tid, "-->", ++$sharedVariable, "\n";
amon
  • As I understand it, threads won't use multiple cores any better than a standard Perl program--precisely because they are an emulation rather than the real thing. Correct me if I'm wrong, though. – dan1111 Oct 03 '12 at 14:25
  • @dan1111 If you feel like it, run a `$i += 1 while 1` loop for five seconds in multiple threads and look at your CPU usage ;-) When the OS makes threads available, Perl will use them. Perl can use multiple cores on my Linux system. – amon Oct 03 '12 at 14:32
  • Thank you .. yes, I tried the forks module, which is supposedly leaner on RAM than threads, but that added slowness, and it looked (with top) as if the memory was duplicated. Other languages do provide a zero-copy way; I think, for example, the C++ Boost library does. I tried Parallel::ForkManager, IPC::Shareable and IPC::ShareLite, but they all seem to copy. I could leave the data structure on file, but then the children would be seeking and seeking, and I don't think the program would be any faster. I don't know what to do, other than another language or a new logic. – Niels Larsen Oct 03 '12 at 14:40
  • @NielsLarsen The `forks` module simulates threads for old perls without threading. It actually has the same API as `threads`. (What perl version are you using?) `ForkManager` uses a file with serialized data structures for data sharing - not what you want. `Shareable` does the same in memory, so there is one serialized data structure globally and one deserialized data structure per process. `threads::shared` really should solve your issues. A part of the whole problem is that Perl doesn't operate on raw memory (like C does), but manages memory for you, with the benefit of garbage collection. – amon Oct 03 '12 at 14:57
  • I have Perl 5.14.2 on Linux Mint Debian, and I just compiled it with -Dusethreads. Thank you, I will try threads::shared then, I skipped it because I read it was very inefficient. But I will try then, and post the result. You're my last hope .. – Niels Larsen Oct 03 '12 at 15:01
  • @amon: I initialized $sharedVariable to "1234" x 50_000_000 and printed the length of it in worker_sub. This should use 200 MB + whatever Perl needs, say 300 MB total. It prints the length fine, but RAM usage is over 1 GB. I can't figure out how to post the code: 4-space indentation doesn't work and CR submits. Sorry if I flooded your mailbox with attempts. – Niels Larsen Oct 03 '12 at 15:49
  • @amon As the most recent comment made clear, `threads::shared` does not help address the question. Shared variables are still copied to every thread, `threads::shared` just makes sure the shared variable is updated in each interpreter. – darch Oct 03 '12 at 16:00
  • @darch: Well, now that I tried all things I could find, what is then the conclusion .. that Perl cannot do shared memory, and that I should use another language? – Niels Larsen Oct 03 '12 at 16:14
  • @NielsLarsen Yes, every tool has its limitations, and today we went to the outer boundary. Unless a guru turns up and enlightens us, switching languages *is* the solution. – amon Oct 03 '12 at 16:16
  • @Niels Larsen: Have you looked into Coro? Would it work for your situation? It doesn't duplicate the memory usage for every worker thread as it has a shared address space. https://metacpan.org/module/Coro – Oesor Oct 03 '12 at 17:04
  • A comment to Coro: That module is about Coroutines, not Threads. The code is not executed in parallel (and *never on multiple processors*), but sequentially. However, each Coro has its own stack, and is executed independently. On the outside, there is only one process, only one thread. – amon Oct 03 '12 at 17:52
  • 1
    @dan1111 just to clarify, Perl threads *can* use multiple cores effectively; they're real OS-level threads. They just have a lot of *other* crippling problems that make them unpleasant to use. – hobbs Oct 03 '12 at 19:26
  • @amon: Yes, I could not make Coro use multiple cores. The author says so at the bottom of the main page, but I think it should be said at the top. – Niels Larsen Oct 03 '12 at 21:48
0

You could use Cache::FastMmap to store shared data. I have heard of it being used for IPC rather than caching; the cache is shared between processes, and large parts of the module are written in C. Do not forget to pass `raw_values => 1` at initialization. It is also possible to compress the values in the cache, so if you have CPU to spare and your data is compressible, it will save you a lot of memory.

It is quite fast; here is a benchmark: http://cpan.robm.fastmail.fm/cache_perf.html
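A minimal sketch of that setup (not from the original answer; the share file path, cache size, and key are made up for the example):

use strict;
use warnings;
use Cache::FastMmap;

# One mmap'ed file backs the cache; every process that opens the same
# share_file sees the same pages.
my $cache = Cache::FastMmap->new(
    share_file => '/tmp/query_data.cache',   # hypothetical path
    cache_size => '1024m',                   # must be big enough for the data
    raw_values => 1,                         # store strings as-is, no serialization
);

# Parent loads the data once...
$cache->set('packed_strings', 'abcdefg' x 1000);

my $pid = fork();
die "fork failed: $!" unless defined $pid;
unless ($pid) {
    # ...and the child reads it through the shared mapping, not a private copy.
    my $data = $cache->get('packed_strings');
    print "child sees ", length($data), " bytes\n";
    exit;
}
wait;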

Because Cache::FastMmap mmap's a shared file into your process's memory space, this can make each process look quite large, even though it's just mmap'd memory that's shared between all processes that use the cache, and may even be swapped out if the cache is getting low usage.

However, the OS will think your process is quite large, which might mean you hit some BSD::Resource or 'ulimits' you set previously that you thought were sane, but aren't anymore, so be aware.
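If you want to check those limits from Perl before mapping a large cache, here is a small sketch using BSD::Resource (a separate CPAN module; RLIM_INFINITY means no limit is set):

use strict;
use warnings;
use BSD::Resource qw(getrlimit RLIMIT_AS RLIM_INFINITY);

# Inspect the address-space limit that a large mmap counts against.
my ($soft, $hard) = getrlimit(RLIMIT_AS);
print "soft limit: ", $soft == RLIM_INFINITY ? "unlimited" : "$soft bytes", "\n";
print "hard limit: ", $hard == RLIM_INFINITY ? "unlimited" : "$hard bytes", "\n";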

user1126070
  • Yes, I came across Cache::FastMmap too. But that is for storing many key/value pairs where the value cannot be larger than a memory page ... my data are a few long arrays that cannot be split up. For a key/value store I would use Kyoto Cabinet, which is also memory-mapped and more efficient. That leaves a logic rethink or another language, unless some guru comes by and says otherwise. – Niels Larsen Oct 04 '12 at 10:18