
In my PHP script I need to create an array of >600k integers. Unfortunately, my webserver's memory_limit is set to 32M, so when initializing the array the script aborts with the message:

Fatal error: Allowed memory size of 33554432 bytes exhausted (tried to allocate 71 bytes) in /home/www/myaccount/html/mem_test.php on line 8

I am aware that PHP does not store array values as plain integers but as zvals, which are much bigger than the plain integer value (8 bytes on my 64-bit system). I wrote a small script to estimate how much memory each array entry uses, and it turns out to be almost exactly 128 bytes. 128!!! I'd need >73M just to store the array. Unfortunately, the webserver is not under my control, so I cannot increase the memory_limit.
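The measuring script itself isn't shown. A minimal way to reproduce such a measurement, using only the built-in memory_get_usage(), is sketched below; the function name estimateBytesPerElement is hypothetical, and the number you get depends heavily on the PHP version and build (the 128 bytes above were measured on the asker's setup).

```php
<?php
// Hypothetical helper: estimate the average memory cost of one integer array element
// by comparing memory_get_usage() before and after filling an array.
function estimateBytesPerElement($count)
{
    $before = memory_get_usage();
    $values = [];
    for ($i = 0; $i < $count; $i++) {
        $values[] = $i;
    }
    $after = memory_get_usage();   // $values is still alive at this point
    unset($values);
    return ($after - $before) / $count;
}

printf("~%.1f bytes per element\n", estimateBytesPerElement(100000));
```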

My question is: is there any way in PHP to create an array-like structure that uses less memory? I don't need this structure to be associative (plain index access is sufficient). It also doesn't need dynamic resizing; I know exactly how big the array will be. And all elements would be of the same type. Just like a good old C array.


Edit: So deceze's solution works out of the box with 32-bit integers. But even on a 64-bit system, pack() does not seem to support 64-bit integers. To store 64-bit integers in my array, I applied some bit manipulation. Perhaps the snippets below will be of help to someone:

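function push_back(&$storage, $value)
{
    // Split the 64-bit value into two 32-bit chunks (high word first), then pass them to pack().
    // 'l' keeps only the low 32 bits of each argument.
    $storage .= pack('ll', ($value >> 32), $value);
}

function get($storage, $idx)
{
    // Read the two 32-bit chunks from $storage and glue them back together.
    // The high word is read signed ('l') so negative values survive the shift;
    // the low word must be read unsigned ('L'), otherwise a set bit 31 would be
    // sign-extended and overwrite the upper half after the OR.
    return (current(unpack('l', substr($storage, $idx * 8, 4))) << 32 |
            current(unpack('L', substr($storage, $idx * 8 + 4, 4))));
}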
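On PHP 5.6.3 and later, pack() and unpack() also understand the 64-bit formats q (signed, machine byte order), Q, J and P, so on a 64-bit build the split into two 32-bit halves isn't needed. A minimal sketch, assuming a 64-bit PHP build; the names push_back64, get64 and BYTES_PER_INT64 are hypothetical:

```php
<?php
// Assumes PHP >= 5.6.3 on a 64-bit build, where pack()/unpack() support 'q'
// (signed 64-bit integer, machine byte order).
const BYTES_PER_INT64 = 8;

function push_back64(&$storage, $value)
{
    $storage .= pack('q', $value);
}

function get64($storage, $idx)
{
    // unpack() returns an array indexed from 1 when the format has no name.
    return unpack('q', substr($storage, $idx * BYTES_PER_INT64, BYTES_PER_INT64))[1];
}

$storage = '';
foreach ([0, -1, PHP_INT_MAX, PHP_INT_MIN, 0x80000000] as $v) {
    push_back64($storage, $v);
}
echo get64($storage, 2), "\n";
```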

8 Answers


The most memory-efficient option is probably to store everything in a string, packed in binary, and index into it manually.

$storage = '';

$storage .= pack('l', 42);

// ...

// get 10th entry
$int = current(unpack('l', substr($storage, 9 * 4, 4)));
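Since the final size is known up front, the string can be preallocated and wrapped in a small class. The sketch below is hypothetical (the class name PackedIntArray and its methods are not from any library): it stores signed 32-bit values with pack('l'), overwrites the 4 bytes of an entry in place instead of rebuilding the string, and reads entries with unpack()'s offset argument, which needs PHP 7.1 or later.

```php
<?php
// Hypothetical fixed-size array of signed 32-bit integers backed by one binary string.
// Requires PHP >= 7.1 (private const, and the $offset argument of unpack()).
final class PackedIntArray
{
    private const BYTES = 4;   // pack('l') always produces 4 bytes
    private $data;
    private $size;

    public function __construct(int $size)
    {
        $this->size = $size;
        // Preallocate every entry as zero bytes so the string never has to grow.
        $this->data = str_repeat("\0", $size * self::BYTES);
    }

    public function set(int $index, int $value): void
    {
        $this->checkIndex($index);
        $bytes = pack('l', $value);
        $offset = $index * self::BYTES;
        // Overwrite the entry's 4 bytes in place.
        for ($b = 0; $b < self::BYTES; $b++) {
            $this->data[$offset + $b] = $bytes[$b];
        }
    }

    public function get(int $index): int
    {
        $this->checkIndex($index);
        return unpack('l', $this->data, $index * self::BYTES)[1];
    }

    public function count(): int
    {
        return $this->size;
    }

    private function checkIndex(int $index): void
    {
        if ($index < 0 || $index >= $this->size) {
            throw new OutOfRangeException("Index $index out of range");
        }
    }
}

$arr = new PackedIntArray(600000);   // 2.4 MB of payload
$arr->set(9, 42);
echo $arr->get(9), "\n";
```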

This can be feasible if the "array" initialisation can be done in one fell swoop and you're just reading from the structure. If you need to append to the string a lot, this becomes extremely inefficient. Even that can be handled with a resource handle, though:

$storage = fopen('php://memory', 'r+');
fwrite($storage, pack('l', 42));
// ...

This is very efficient. You can then read this buffer back into a variable and use it as string, or you can continue to work with the resource and fseek.
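Both options from the paragraph above fit in a few lines, assuming signed 32-bit values packed with 'l': rewind() plus stream_get_contents() copies the buffer into an ordinary string, while fseek() plus fread() reads a single entry straight from the stream. The helper name readEntry is hypothetical.

```php
<?php
$storage = fopen('php://memory', 'r+');
foreach ([42, -7, 1000] as $value) {
    fwrite($storage, pack('l', $value));
}

// Option 1: copy the whole buffer into an ordinary string and index it as before.
rewind($storage);
$packed = stream_get_contents($storage);
$third = current(unpack('l', substr($packed, 2 * 4, 4)));

// Option 2: keep using the stream and seek to the entry's byte offset.
function readEntry($stream, $index)
{
    fseek($stream, $index * 4, SEEK_SET);
    return current(unpack('l', fread($stream, 4)));
}

$second = readEntry($storage, 1);
echo $third, ' ', $second, "\n";
fclose($storage);
```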

deceze
  • That's going to get messy with 600k entries, isn't it? – Dave Jan 24 '14 at 13:16
  • Why would it? Write a little wrapper around it and accessing entries can be extremely straightforward. Only initialisation may get messy. – deceze Jan 24 '14 at 13:17
  • Thanks a lot! This works smoothly (and is still extremely fast) for 32-bit integers. However, I need to store 64-bit integers. Any idea how I could achieve this? The `pack` format `'I'` didn't work on my 64-bit system; it still yields a 4-byte binary string per integer... – Alexander Tobias Bockstaller Jan 24 '14 at 14:05
  • @Alex Hmm, good question, not sure. `d` *should* do it, but doesn't seem to in practice. May be worth opening a new question for. :) – deceze Jan 24 '14 at 14:13
  • Yeah, well, I think I'll just do some bit-shifting and store each 8-byte int as two 4-byte ints... If it works, I'll edit my question later and share the code, fwiw. – Alexander Tobias Bockstaller Jan 24 '14 at 14:15
  • From the PHP pack() manual: "In systems where the integer type has 64-bit size, the float most likely does not have a mantissa large enough to hold the value without loss of precision. If those systems also have a native 64-bit C int type (most UNIX-like systems don't), the only way to use the I pack format in the upper range is to create integer negative values with the same byte representation as the desired unsigned value." So it seems that even on a 64-bit build, pack() only handles 32-bit ints properly. – Dave Jan 24 '14 at 14:20
  • +1 using a structured file is the real solution to memory, but I would consider a data-structure like a hash table that maps to several files to optimize the access speed – Khaled.K Jan 28 '14 at 20:22

A PHP Judy Array will use significantly less memory than either a standard PHP array or an SplFixedArray.

I quote "An array with 1 million entries using regular PHP array data structure takes 200MB. SplFixedArray uses around 90 megabytes. Judy uses 8 megs. Tradeoff is in performance, Judy takes about double the time of regular php array implementation."

Ryan

You can try SplFixedArray; it's faster and takes less memory (the comments in the documentation say ~30% less).
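A minimal SplFixedArray sketch: the size is fixed at construction, indexes must be integers in [0, size), and writing outside that range throws a RuntimeException. The size of 600,000 mirrors the question; the memory saving you actually get depends on your PHP version.

```php
<?php
$size = 600000;
$fixed = new SplFixedArray($size);   // every slot starts as null
for ($i = 0; $i < $size; $i++) {
    $fixed[$i] = $i * 2;
}

echo $fixed->getSize(), ' ', $fixed[599999], "\n";

try {
    $fixed[$size] = 1;               // one past the end: not allowed
} catch (RuntimeException $e) {
    echo 'out of range: ', $e->getMessage(), "\n";
}
```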

Dysosmus

You could use an object if possible; objects often use less memory than arrays. SplFixedArray is also a good option.

But it really depends on what you need to implement. If you need a function to return an array and you're using PHP 5.5 or later, you could use a generator (yield) to stream the values back instead.
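A minimal generator sketch (PHP 5.5+): the hypothetical function generateValues below yields the values one at a time, so the caller can process them without ever building a 600k-element array. The `$i * 3` is only a placeholder for however the integers are actually produced.

```php
<?php
// Hypothetical producer: yields $count integers lazily instead of returning an array.
function generateValues($count)
{
    for ($i = 0; $i < $count; $i++) {
        yield $i * 3;   // placeholder for the real computation
    }
}

$sum = 0;
foreach (generateValues(600000) as $value) {
    $sum += $value;     // only the current value is held in memory
}
echo $sum, "\n";
```

This only helps if the consumer really processes values sequentially; random access by index still needs one of the packed structures.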

RJD22

Use a string; that's what I'd do. Store the values in a string at fixed offsets (16 or 20 digits should do it, I guess?) and use substr() to get the one you need. Writing and reading are blazing fast, it's super easy, and 600,000 integers will take only ~12M to store.
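A sketch of the fixed-offset decimal idea, assuming a 20-character slot per value (enough for any signed 64-bit integer, including the minus sign; 600,000 × 20 bytes is the ~12M mentioned above). The function names are hypothetical. sprintf('%20d') right-aligns each number in its slot, and the (int) cast skips the leading spaces when reading it back.

```php
<?php
const SLOT_WIDTH = 20;   // 19 digits + sign covers PHP_INT_MIN..PHP_INT_MAX

function appendDecimal(&$storage, $value)
{
    // Right-align the number in a fixed-width slot, padded with spaces.
    $storage .= sprintf('%' . SLOT_WIDTH . 'd', $value);
}

function readDecimal($storage, $index)
{
    return (int) substr($storage, $index * SLOT_WIDTH, SLOT_WIDTH);
}

$storage = '';
foreach ([7, -123456789012345678, PHP_INT_MAX] as $v) {
    appendDecimal($storage, $v);
}
echo readDecimal($storage, 1), "\n";
```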

base_convert(): if you need something more compact with minimal effort, convert your integers to base 36 instead of base 10; a 14-digit number then fits in 9 alphanumeric characters. You'll need to split 64-bit ints into two pieces, but that shouldn't be a problem (I'd split them into 9-digit chunks, each of which converts to at most 6 characters).
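A hedged sketch of the base-36 variant, for non-negative 64-bit values only (negative numbers would need extra handling). It splits each value into intdiv($v, 10^9) and $v % 10^9 so every base_convert() call stays far below the range where, as the PHP manual warns, it can lose precision. The low 9-digit chunk needs at most 6 base-36 characters and the high part (at most 9223372036) at most 7, so each value takes a 13-character slot instead of 20. Function and constant names are hypothetical; intdiv() needs PHP 7.

```php
<?php
// Hypothetical fixed-width base-36 encoding for NON-NEGATIVE integers only.
const CHUNK = 1000000000;   // 10^9: split point between the two pieces
const HIGH_WIDTH = 7;       // intdiv(PHP_INT_MAX, 10^9) = 9223372036 < 36^7
const LOW_WIDTH = 6;        // 999999999 < 36^6

function encodeBase36($value)
{
    $high = base_convert((string) intdiv($value, CHUNK), 10, 36);
    $low  = base_convert((string) ($value % CHUNK), 10, 36);
    return str_pad($high, HIGH_WIDTH, '0', STR_PAD_LEFT)
         . str_pad($low, LOW_WIDTH, '0', STR_PAD_LEFT);
}

function decodeBase36($slot)
{
    $high = (int) base_convert(substr($slot, 0, HIGH_WIDTH), 36, 10);
    $low  = (int) base_convert(substr($slot, HIGH_WIDTH, LOW_WIDTH), 36, 10);
    return $high * CHUNK + $low;
}

$storage = '';
foreach ([0, 999999999, 1000000000, PHP_INT_MAX] as $v) {
    $storage .= encodeBase36($v);
}
$slot = HIGH_WIDTH + LOW_WIDTH;
echo decodeBase36(substr($storage, 3 * $slot, $slot)), "\n";
```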

pack()/unpack(): binary packing does the same thing a bit more efficiently. Use it if nothing else works; split your numbers so they fit into two 32-bit pieces.

dkellner

600K is a lot of elements. If you are open to alternative methods, I personally would use a database for that, and then use standard SQL/NoSQL queries to pull things out. Perhaps memcache or Redis if you have an easy host for that, such as garantiadata.com. Maybe APC.

Gavin
  • +1 Database is a good solution, in Mac & iOS development they use a local database to maintain application data – Khaled.K Jan 28 '14 at 20:27

I took @deceze's answer and wrapped it in a class that handles 32-bit integers. It is append-only, but you can still use it as a simple, memory-optimized PHP array, queue, or heap. AppendItem and ItemAt are both O(1), and there is no per-item overhead beyond the 4 bytes each entry occupies. I added currentPosition/currentSize to avoid unnecessary fseek calls. If you need to cap memory usage and switch to a temporary file automatically, use php://temp instead.

class MemoryOptimizedArray
{
    private $_storage;
    private $_currentPosition;
    private $_currentSize;
    const BYTES_PER_ENTRY = 4;
    function __construct()
    {
        $this->_storage = fopen('php://memory', 'w+');
        $this->_currentPosition = 0;
        $this->_currentSize = 0;
    }
    function __destruct()
    {
        fclose($this->_storage);
    }
    function AppendItem($value)
    {
        if($this->_currentPosition != $this->_currentSize)
        {
            fseek($this->_storage, 0, SEEK_END);
        }
        fwrite($this->_storage, pack('l', $value));
        $this->_currentSize += self::BYTES_PER_ENTRY;
        $this->_currentPosition = $this->_currentSize;
    }
    function ItemAt($index)
    {
        $itemPosition = $index * self::BYTES_PER_ENTRY;
        if($this->_currentPosition != $itemPosition)
        {
            fseek($this->_storage, $itemPosition);
        }
        $binaryData = fread($this->_storage, self::BYTES_PER_ENTRY);
        $this->_currentPosition = $itemPosition + self::BYTES_PER_ENTRY;
        $unpackedElements = unpack('l', $binaryData);
        return $unpackedElements[1];
    }
}

$arr = new MemoryOptimizedArray();
for($i = 0; $i < 3; $i++)
{
    $v = rand(-2000000000,2000000000);
    $arr->AppendItem($v);
    print("added $v\n");
}
for($i = 0; $i < 3; $i++)
{
    print($arr->ItemAt($i)."\n");
}
for($i = 2; $i >=0; $i--)
{
    print($arr->ItemAt($i)."\n");
}
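Switching to php://temp, as suggested above, only changes the fopen() call. By default php://temp moves its data to a temporary file once it exceeds 2 MB; a /maxmemory:NN suffix (in bytes) sets a different limit. A minimal sketch, where the 1 MB limit is an arbitrary example value:

```php
<?php
// Keep up to 1 MB in RAM; beyond that PHP transparently moves the data to a temp file.
$limit = 1024 * 1024;
$storage = fopen("php://temp/maxmemory:$limit", 'w+');

for ($i = 0; $i < 600000; $i++) {
    fwrite($storage, pack('l', $i));   // 2.4 MB in total, so the limit is exceeded
}

// Random access works exactly as with php://memory.
fseek($storage, 123456 * 4, SEEK_SET);
$value = current(unpack('l', fread($storage, 4)));
echo $value, "\n";
```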
humbads

Depending on how you generate the integers, you could potentially use PHP's generators, assuming you are traversing the array and doing something with individual values.

Oscar M.