I'm trying to build a score-based in-memory cache in PHP, but I have a performance problem: how to count scores and delete low-score records.

Objectives

I have around 10 million records, and I want to cache only the 0.1% most frequently accessed records, in memory rather than on disk.

I would like to have 10,000 cache slots (0.1%) and keep only the most frequently accessed records in them.

Attempts / Problems

I tried a file-based cache, and it's very slow.

I tried MySQL and PostgreSQL, but counting scores and deleting low-score records costs too much performance.

I tried a time-based cache, e.g. XCache, but because my project has so much data, it does too much writing. There is also a problem with deleting the lowest-score records and with listing all cache slots, because it's a plain key->value store.

I found Redis, but it doesn't seem to have any notion of a score or anything similar.

My question:

What cache method should I use for a score-based cache?

Note that all these posts are similar but do not contain any usable answers:

Fastest PHP memory cache/hashtable

In-memory cache with LRU expiration

In-memory cache architecture/technology?

Need a php caching recommendation

halfer
TomasDotCC
  • Are you asking how to design such a caching implementation? That's off-topic for being too broad. Or are you asking for recommendations of existing caching solutions? That's off-topic for being a recommendation of third-party software. – Barmar Jul 30 '15 at 16:38
  • An on-topic question would be one where you post the code you tried to write to implement your caching solution, and explained why it isn't working so we can help you fix it. – Barmar Jul 30 '15 at 16:39
  • Just to add a few missing key pieces of information that you need to provide: How fast do you want your cache to be updated? When? Maybe you will need a secondary cache level (this depends on your strategy for updating the top cache level). Also, why did the DB option not work? To me, a table with an index on the score and a row count should be efficient enough. You just need an insert listener that updates the table when you insert a new score, and then select the first 10,000 records. Actually, thinking about it now, you just need to update the count when you insert a new entry in the DB, which should be a cheap and simple step. – ap0calypt1c Jul 30 '15 at 16:46

2 Answers

It sounds like an LRU cache should give you what you need. You can configure Redis as an LRU cache, and it would probably handle your situation fairly well. Here is a reference from the Redis docs: http://redis.io/topics/lru-cache

To give a quick summary: you could use the "allkeys-lru" eviction policy and set "maxmemory" to the limit you want. Once the memory limit is hit, Redis evicts (approximately) least recently used keys to keep memory usage under "maxmemory".
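To make the eviction policy concrete, here is a minimal in-process PHP sketch of an exact LRU cache. It only illustrates the idea: `LruCache` is a hypothetical class, it is not how Redis implements "allkeys-lru" (Redis approximates LRU by sampling keys), and PHP 8+ is assumed.

```php
<?php
// Minimal exact-LRU cache (illustration only; Redis approximates LRU instead).
// PHP arrays keep insertion order, so the first key is the least recently used.
final class LruCache
{
    /** @var array<string, mixed> */
    private array $items = [];

    public function __construct(private int $capacity) {}

    public function get(string $key): mixed
    {
        if (!array_key_exists($key, $this->items)) {
            return null; // miss
        }
        // Re-append the key to mark it as most recently used
        $value = $this->items[$key];
        unset($this->items[$key]);
        $this->items[$key] = $value;
        return $value;
    }

    public function set(string $key, mixed $value): void
    {
        unset($this->items[$key]);
        $this->items[$key] = $value; // every new key is written, popular or not
        if (count($this->items) > $this->capacity) {
            // Evict the least recently used key (the first one)
            unset($this->items[array_key_first($this->items)]);
        }
    }
}

$lru = new LruCache(2);
$lru->set('a', 1);
$lru->set('b', 2);
$lru->get('a');    // 'a' becomes most recently used
$lru->set('c', 3); // over capacity: evicts 'b'
```

Note that every `set()` stores the new key regardless of how popular it is; only recency decides what gets evicted.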

Another option is memcached, an in-memory key-value store which by default behaves as an LRU cache.

If you really want to keep track of scores yourself, and you already have some kind of scoring mechanism for your items, you could use Redis and keep a sorted set together with a hash to rank your cache items.

The hash would keep your cached data, and the sorted set would keep your items ranked.

You'd need these sorted set commands:

  • "ZADD" adds items and sets or changes their scores.
  • "ZINCRBY" increments an item's score.
  • "ZCARD" returns the number of items in the sorted set.
  • "ZRANGE" returns items by rank, so "ZRANGE key 0 0" gives the lowest-scoring item.
  • "ZREM" removes items.

After every insert, you'll have to check the sorted set's size manually and trim the cache back to the limit. Overall, the algorithm would look like this:

Cache Insert:

HSET "cacheKey" "itemName" "itemValue"
ZADD "rankingKey" "itemScore" "itemName"
count = ZCARD "rankingKey"
if (count > limit)
    lowestRankedItem = ZRANGE "rankingKey" 0 0
    ZREM "rankingKey" lowestRankedItem
    HDEL "cacheKey" lowestRankedItem

And lookup would be:

itemValue = HGET "cacheKey" "itemName"
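The insert and lookup steps above can be simulated in plain PHP to check the logic before wiring up Redis. This is a sketch under assumptions: `ScoreCache`, `insert()`, `hit()` and `lookup()` are hypothetical names, two PHP arrays stand in for the Redis hash and sorted set, and PHP 8+ is assumed. The optional `hit()` method assumes your score is a hit count bumped with ZINCRBY; the answer leaves the scoring mechanism up to you.

```php
<?php
// In-process simulation of the Redis hash + sorted set scheme (sketch only).
final class ScoreCache
{
    private array $data = [];   // itemName => itemValue  (stands in for the hash)
    private array $scores = []; // itemName => itemScore  (stands in for the sorted set)

    public function __construct(private int $limit) {}

    public function insert(string $name, mixed $value, float $score): void
    {
        $this->data[$name] = $value;   // HSET cacheKey name value
        $this->scores[$name] = $score; // ZADD rankingKey score name
        if (count($this->scores) > $this->limit) { // ZCARD rankingKey
            // ZRANGE rankingKey 0 0: member with the lowest score.
            // O(n) here (ties go to array order); Redis keeps the set ordered.
            $lowest = array_search(min($this->scores), $this->scores, true);
            unset($this->scores[$lowest], $this->data[$lowest]); // ZREM + HDEL
        }
    }

    // ZINCRBY rankingKey 1 name (assumes the score is a hit count)
    public function hit(string $name): void
    {
        if (isset($this->scores[$name])) {
            $this->scores[$name] += 1;
        }
    }

    public function lookup(string $name): mixed
    {
        return $this->data[$name] ?? null; // HGET cacheKey name
    }
}

$cache = new ScoreCache(2);
$cache->insert('a', 'A', 5);
$cache->insert('b', 'B', 1);
$cache->hit('b');              // 'b' now scores 2
$cache->insert('c', 'C', 3);   // over the limit: evicts 'b' (lowest score)
```

Finding the minimum is linear in this simulation; in Redis the sorted set stays ordered, so fetching the lowest-ranked item stays cheap even at 10,000 slots.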
ttekin
Pure LRU isn't a good fit, because it caches every new key, whereas I need to cache only the 0.1% of data with the highest scores, so that there isn't much useless writing.

I will certainly implement the Redis method as suggested in the next version, as it should be much faster. So use this approach only if you don't have Redis, if you can't determine the score other than by counting hits over some time window, or if you just want to keep it simple.

But my best attempt so far uses plain key=>value memcached; I'm testing it now and it seems fast and stable. Before, I was thinking about it the wrong way...

When an item is requested, we check whether the cache holds a score for its key (the cached value is an int). If it does and the score is below our limit, we increase the score; if the score has reached the limit, we get the content and save it to the cache. If the cached value is anything other than a number, it's our content (high score). If there is no cache entry for the key at all, we save the int 1 as the score with a very short TTL and just get the content without caching it (low score).

<?php

$minhits = 10;     // Min 10 hits for one key before its content is cached
$minhitstime = 60; // Max 60 seconds between hits (TTL of the hit counter)
$cachettl = 3600;  // Cache content for 3600 seconds

$key = "article1";

$mem = new Memcached();
$mem->addServer('localhost', 11211);

// false = nothing cached, int = hit counter, anything else = cached content
$c = $mem->get($key);

if (is_int($c)) {
    $content = getArticleContent(); // Your own function
    if ($c >= $minhits) {
        $mem->set($key, $content, $cachettl); // score reached: cache the content
    } else {
        $mem->set($key, $c + 1, $minhitstime); // count one more hit
    }
} elseif ($c !== false) {
    $content = $c; // high score: serve from cache
} else {
    $content = getArticleContent(); // first hit: no caching yet
    $mem->set($key, 1, $minhitstime);
}

echo $content;

?>

Also, please don't try to cache int-only values with this ;) If you do, you have to edit the code, because an int in the cache is treated as a hit counter.
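The promotion logic can be exercised without a memcached server by swapping in a fake store. This is a hedged sketch: `FakeStore` (a tiny stand-in for the `get()`/`set()` calls used above, with a manually advanced clock) and `fetchWithScore()` are hypothetical names, and PHP 8+ is assumed. It keeps the same rules: an int in the cache is a hit counter with a short TTL, and content is stored only once the counter reaches the limit.

```php
<?php
// Stand-in for Memcached get/set with TTLs, driven by a fake clock (sketch only).
final class FakeStore
{
    private array $items = []; // key => [value, expiresAt]
    public int $now = 0;       // fake clock in seconds

    public function get(string $key): mixed
    {
        if (!isset($this->items[$key]) || $this->items[$key][1] <= $this->now) {
            return false; // miss or expired, like Memcached::get() on a miss
        }
        return $this->items[$key][0];
    }

    public function set(string $key, mixed $value, int $ttl): void
    {
        $this->items[$key] = [$value, $this->now + $ttl];
    }
}

// Same rules as the code above: int = hit counter, other value = content.
function fetchWithScore(FakeStore $mem, string $key, callable $load,
                        int $minHits = 10, int $hitTtl = 60, int $cacheTtl = 3600): string
{
    $c = $mem->get($key);
    if (is_int($c)) {
        $content = $load();
        if ($c >= $minHits) {
            $mem->set($key, $content, $cacheTtl); // promote: cache the content
        } else {
            $mem->set($key, $c + 1, $hitTtl);     // count one more hit
        }
        return $content;
    }
    if ($c !== false) {
        return $c;                                // high score: cached content
    }
    $mem->set($key, 1, $hitTtl);                  // first hit: start counting
    return $load();
}

$loads = 0;
$load = function () use (&$loads) { $loads++; return 'article body'; };

$mem = new FakeStore();
for ($i = 0; $i < 15; $i++) {
    fetchWithScore($mem, 'article1', $load, 3); // popular key, rapid hits
}

$rare = new FakeStore();
for ($i = 0; $i < 5; $i++) {
    $rare->now = $i * 61;                       // hits more than 60 s apart
    fetchWithScore($rare, 'rare', $load, 3);
}
```

Here the popular key is loaded from the source 4 times and then served from the cache, while the rare key's counter expires between hits, so it never gets past 1 and never occupies a slot with content. With a real server the get-then-set is not atomic, so concurrent requests may occasionally lose a hit, which only delays promotion slightly.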

TomasDotCC