
I have about 100 MB of pickled data stored on disk.

When my Python program is executed, the pickled data is loaded using the cPickle module, and all that works fine.

If I execute the program multiple times (e.g. python main.py), each Python process loads the same data into its own memory, which is the expected behaviour.

How can I make it so that all new Python processes share this data, so it is only loaded into memory once?


2 Answers


If you're on Unix, one possibility is to load the data into memory, and then have the script use os.fork() to create a bunch of sub-processes. As long as the sub-processes don't attempt to modify the data, they would automatically share the parent's copy of it, without using any additional memory.
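
A minimal sketch of that idea, assuming Python 2 (to match cPickle); the file name `data.pkl` and the child's work are placeholders, not part of the original answer:

```python
import os
import cPickle

# Load the 100 MB pickle once, in the parent process.
with open('data.pkl', 'rb') as f:
    data = cPickle.load(f)

# Fork a few workers; each child shares the parent's memory pages
# via copy-on-write, so the data is not duplicated as long as it
# is only read.
pids = []
for _ in range(4):
    pid = os.fork()
    if pid == 0:
        # Child process: treat `data` as read-only.
        print len(data)  # placeholder for real work
        os._exit(0)
    pids.append(pid)

# Parent waits for all children to finish.
for pid in pids:
    os.waitpid(pid, 0)
```

As the comment below notes, the sharing is copy-on-write; in CPython even reading an object updates its reference count, so some pages may get copied over time regardless.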

Unfortunately, this won't work on Windows.

P.S. I once asked about placing Python objects into shared memory, but that didn't produce any easy solutions.

– NPE
    *"automatically share the parent process's data, without using any additional memory"* not 100% true. It will be copy-on-write, so it will copy and will use additional memory as soon as you're going to access this data for modification. – vartec May 11 '12 at 12:42

Depending on how seriously you need to solve this problem, you may want to look at memcached, if that is not overkill.
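
A rough sketch of that direction, using the python-memcached client (the cache key and file path here are made up; also note that memcached's default item size limit is 1 MB, so a 100 MB object would need to be chunked or the limit raised with memcached's -I option):

```python
import cPickle
import memcache  # pip install python-memcached; assumes memcached runs locally

mc = memcache.Client(['127.0.0.1:11211'])

def get_data():
    # Try the shared cache first; only hit the disk on a miss.
    data = mc.get('pickled_data')  # made-up cache key
    if data is None:
        with open('data.pkl', 'rb') as f:  # made-up path
            data = cPickle.load(f)
        mc.set('pickled_data', data)
    return data
```

Keep in mind that each process still ends up with its own deserialized copy in memory; a cache like this mainly saves the repeated disk read, not the RAM.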

– Bittrance