Imagine you have the following array of integers:
array(1, 2, 1, 0, 0, 1, 2, 4, 3, 2, [...] );
The integers go on up to one million entries; only instead of being hardcoded they've been pre-generated and stored in a JSON formatted file (of approximately 2MB in size). The order of these integers matters, I can't randomly generate it every time because it should be consistent and always have the same values at the same indexes.
If this file is read back in PHP afterwards (e.g. using file_get_contents
+ json_decode
) it takes from 700 to 900ms just to get the array back — "Okay" I thought, "it's probably reasonable since json_decode
has to parse about 2 million characters, let's cache it". APC caches it in a entry that takes about 68MB, probably normal, zvals are large. Retrieving however this array back from APC also takes some good 600ms which is in my eyes still way too much.
Edit: APC does serialize/unserialize to store and retrieve content which with a million item array is a lengthy and heavy process.
So the questions:
Should I expect this latency if I intend to load a one million entries array, no matter the data store or the method, in PHP? As far as I understand APC stores the zval itself, so theoretically retrieving it from APC should be as fast as it can possibly get (no parsing, no conversion, no disk access)Why is APC so slow for something so seemingly simple?Is there any efficient way to load a one million entries array entirely in memory using PHP? assuming RAM usage is not a problem.
If I were to access only slices of this array based on indexes (e.g. loading the chunk from index 15 to index 76) and never actually have the entire array in memory (yes, I understand this is the sane way of doing it, but I wanted to know all the sides), what would be the most efficient data store system for the complete array? Obviously not a RDBM; I'm thinking redis, but I would be happy to hear other ideas.