
My use case is to maintain an in-memory cache over the data stored in a persistent DB.

I use the data to populate a list/map of entries in the UI. At any given time, the data displayed in the UI should be as up to date as possible (how fresh it is can be controlled by the cache's refresh frequency).

The major difference between a regular cache implementation and this cache is that it needs a bulk refresh of all its elements at regular intervals, so it is quite different from an LRU-style cache.

I need to implement this in Java, and it would be great if there were existing frameworks I could build it around.

I have explored the Google Guava cache library, but it is better suited to per-entry refresh than to bulk refresh. It has no simple API that refreshes the whole cache.
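For the fixed-interval bulk refresh described above, plain JDK classes may be enough. The following is a minimal sketch, not a Guava or EHCache API: `BulkRefreshingCache` and its `loader` (a `Supplier` assumed to read the whole table into a `Map`) are hypothetical names. The new map is built off to the side and then published by swapping a `volatile` reference, so readers always see one complete snapshot.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical sketch: a read-only snapshot cache rebuilt in bulk on a fixed schedule.
// `loader` stands in for a query that reads the whole table (an assumption, not a library API).
final class BulkRefreshingCache<K, V> implements AutoCloseable {
    private final Supplier<Map<K, V>> loader;
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "cache-refresh");
                t.setDaemon(true); // don't keep the JVM alive just for refreshes
                return t;
            });
    // Readers always see one complete, immutable map; a refresh swaps the reference.
    private volatile Map<K, V> snapshot;

    BulkRefreshingCache(Supplier<Map<K, V>> loader, long period, TimeUnit unit) {
        this.loader = loader;
        refresh(); // first load is synchronous so the UI never sees an empty cache
        scheduler.scheduleWithFixedDelay(this::refreshQuietly, period, period, unit);
    }

    /** Builds the new map completely, then publishes it with one volatile write. */
    void refresh() {
        snapshot = Collections.unmodifiableMap(new HashMap<>(loader.get()));
    }

    private void refreshQuietly() {
        try {
            refresh();
        } catch (RuntimeException e) {
            // Keep serving the old snapshot; an escaping exception would cancel the schedule.
            System.err.println("cache refresh failed: " + e);
        }
    }

    V get(K key) { return snapshot.get(key); }

    Map<K, V> all() { return snapshot; }

    @Override
    public void close() { scheduler.shutdownNow(); }
}
```

Reads never lock, but during each refresh the old and new maps coexist in memory, which is exactly the doubled-heap cost raised in the question.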

Any help will be highly appreciated.

It would also be great if the refresh could be done incrementally. The main limitation of refreshing the whole cache at once is that, if the cache is very large, the heap must be at least twice the size of the cache so that the new entries can be loaded before the old map is replaced with the new one. An incremental or chunked refresh (refreshing in equal-sized chunks) would avoid this.
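One way to avoid holding two full copies is a chunked refresh. The sketch below assumes a hypothetical paged query wrapped in a `ChunkLoader` interface (for example an SQL `LIMIT`/`OFFSET` query); `ChunkedRefreshCache` is likewise an illustrative name. Each page is upserted into a live `ConcurrentHashMap`, and rows that no longer exist are removed at the end of the pass by tracking which keys were seen.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a chunked refresh: rows are upserted page by page into a live map,
// so only one page of new values is held next to the existing data.
final class ChunkedRefreshCache<K, V> {
    /** Hypothetical paged query: at most `limit` rows starting at `offset`; fewer means the end. */
    interface ChunkLoader<K, V> {
        List<Map.Entry<K, V>> load(int offset, int limit);
    }

    private final Map<K, V> data = new ConcurrentHashMap<>();
    private final ChunkLoader<K, V> loader;
    private final int chunkSize;

    ChunkedRefreshCache(ChunkLoader<K, V> loader, int chunkSize) {
        if (chunkSize <= 0) throw new IllegalArgumentException("chunkSize must be positive");
        this.loader = loader;
        this.chunkSize = chunkSize;
    }

    /** One full pass; call it from a scheduler. */
    void refresh() {
        Set<K> seen = new HashSet<>(); // keys only: far smaller than a second copy of all values
        for (int offset = 0; ; offset += chunkSize) {
            List<Map.Entry<K, V>> chunk = loader.load(offset, chunkSize);
            for (Map.Entry<K, V> row : chunk) {
                data.put(row.getKey(), row.getValue()); // readers see each row as soon as it lands
                seen.add(row.getKey());
            }
            if (chunk.size() < chunkSize) break; // short or empty page: no more rows
        }
        data.keySet().retainAll(seen); // drop rows deleted from the DB since the last pass
    }

    V get(K key) { return data.get(key); }

    int size() { return data.size(); }
}
```

The trade-off is consistency: while a pass is running, readers can see a mix of old and new rows. `ConcurrentHashMap` also rejects `null` values, so the loader must not return them, and offset paging can skip or repeat rows if the table changes mid-pass (keyset pagination avoids that).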

Abhishek Jain

3 Answers


EHCache is a pretty full-featured Java caching library. I would imagine it has something that would work for you.

To do an incremental reload of a cache (this works with almost any cache), just iterate through the currently loaded entries and force-refresh each one. You could run this task on a background scheduler.
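The iterate-and-refresh approach can be sketched with the JDK alone. This is an illustration, not EHCache's API: `EntryRefresher` and its per-key `loader` function are hypothetical, and a `null` from the loader is assumed to mean the row was deleted.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

// Hypothetical sketch: reload each currently cached key on its own, so only one fresh
// value at a time is held beside the existing data. `loader` stands in for a per-key DB lookup.
final class EntryRefresher<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loader;

    EntryRefresher(Function<K, V> loader) { this.loader = loader; }

    /** Loads a key on first use; a null from the loader is not cached. */
    V get(K key) { return cache.computeIfAbsent(key, loader); }

    /** Force-refreshes every entry that is currently loaded; null means the row was deleted. */
    void refreshAll() {
        for (K key : cache.keySet()) { // weakly consistent iterator: safe while the map changes
            V fresh = loader.apply(key);
            if (fresh == null) cache.remove(key);
            else cache.put(key, fresh);
        }
    }

    /** Runs refreshAll on a background thread; the caller must shut the executor down. */
    ScheduledExecutorService startBackgroundRefresh(long period, TimeUnit unit) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                refreshAll();
            } catch (RuntimeException e) { // an escaping exception would cancel the schedule
                System.err.println("refresh failed: " + e);
            }
        }, period, period, unit);
        return scheduler;
    }
}
```

Because entries are reloaded one at a time, only one extra value is in memory at any moment. The key-set iterator is weakly consistent, so entries added during a pass may or may not be refreshed in that same pass.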

As an alternative to forcing the entire cache to reload, EHCache lets you specify a "time-to-live" for an entry, so entries that are too stale expire and are automatically reloaded on their next access (provided the cache is configured with a loader, i.e. read-through).
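EHCache configures time-to-live declaratively, so the sketch below is not its API; it only illustrates the semantics, assuming a read-through setup where a hypothetical per-key `loader` is called on a miss. Each entry remembers when it was loaded, and a read that finds it older than the TTL reloads it. `TtlCache` and the injectable `clock` are assumptions added for illustration and testing.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical sketch of per-entry time-to-live with read-through reloading.
// Not EHCache's API: `loader` stands in for a per-key DB lookup, `clock` for System::currentTimeMillis.
final class TtlCache<K, V> {
    private static final class Timed<V> {
        final V value;
        final long loadedAt;
        Timed(V value, long loadedAt) { this.value = value; this.loadedAt = loadedAt; }
    }

    private final Map<K, Timed<V>> map = new ConcurrentHashMap<>();
    private final Function<K, V> loader;
    private final long ttlMillis;
    private final LongSupplier clock;

    TtlCache(Function<K, V> loader, long ttlMillis, LongSupplier clock) {
        this.loader = loader;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    V get(K key) {
        long now = clock.getAsLong();
        Timed<V> entry = map.compute(key, (k, old) ->
                old != null && now - old.loadedAt < ttlMillis
                        ? old                                   // still fresh: keep it
                        : new Timed<V>(loader.apply(k), now));  // missing or expired: reload
        return entry.value;
    }
}
```

Nothing happens to an expired entry until someone reads it; without a loader, TTL expiry alone just removes the entry. `compute` holds a lock on the key's bin while the loader runs, so a slow DB call can block other updates that hash to the same bin.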

jtahlborn
  • @jtahlborn The BulkLoader API (http://ehcache.org/documentation/apis/bulk-loading) of EhCache is helpful, but it would have been great if it had provided a refreshTime or periodic-interval option so that it managed the scheduling of the cache refresh itself. Anyway, that can always be achieved with an external scheduler that invokes the bulk-loading API periodically. Thanks for the answer. – Abhishek Jain Oct 18 '12 at 09:20
  • For EHCache have a look at: http://www.ehcache.org/documentation/3.3/thread-pools.html and http://terracotta.org/documentation/4.1/bigmemorymax/api/bulk-loading – Aliuk Mar 28 '17 at 15:56
  • But doesn't time-to-live simply remove the element from the cache? That is NOT the same as what you've written here ("automatically reloading"). – javagirl May 26 '17 at 13:26
  • @javagirl My suggestion will get you incremental refresh. As you pointed out, it will not get you incremental preload. If you just need refreshing, then you are fine; if you need preload functionality, then yes, you will need to do something more. – jtahlborn May 26 '17 at 17:54
  • @jtahlborn Suppose I used the scheduler and the cache is loaded with many entries. How can I evict an entry (for an update/delete operation) if the cache does not expire until next year? – dwayneJohn Jul 04 '19 at 10:50
  • @dwayneJohn I'm not sure I follow your question. – jtahlborn Jul 05 '19 at 18:57

Just extend this class and implement loadFromDB and updateData in whatever way gives you the incremental updates you need.

import org.apache.log4j.Logger;
import java.util.List;
import java.util.concurrent.Semaphore;


public abstract class Updatable<T>
{
    protected volatile long lastRefreshed = 0;
    private final int REFRESH_FREQUENCY_MILLISECONDS = 300000; // 5 minutes
    private Thread updateThread;
    private final Semaphore updateInProgress = new Semaphore(1);

    protected static final Logger log = Logger.getLogger(Updatable.class);

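    // Blocks until any in-progress update finishes, then reloads synchronously.
    public void forceRefresh()
    {
        try
        {
            updateInProgress.acquire();
        }
        catch (InterruptedException e)
        {
            // The permit was never obtained, so it must not be released below;
            // restore the interrupt flag and skip this refresh.
            Thread.currentThread().interrupt();
            log.warn("forceRefresh Interrupted");
            return;
        }

        try
        {
            loadAllData();
        }
        catch (Exception e)
        {
            log.error("Exception while updating data from DB", e);
        }
        finally
        {
            updateInProgress.release();
        }
    }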

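    // Call this at the start of every getter: if the data is older than
    // REFRESH_FREQUENCY_MILLISECONDS, start a background reload. The caller
    // does not wait for it and keeps using the current (possibly stale) data.
    protected void checkRefresh()
    {
        if (lastRefreshed + REFRESH_FREQUENCY_MILLISECONDS < System.currentTimeMillis())
            startUpdateThread();
    }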

    private void startUpdateThread()
    {
        if (updateInProgress.tryAcquire())
        {
            updateThread = new Thread(new Runnable()
            {
                public void run()
                {
                    try
                    {
                        loadAllData();
                    }
                    catch (Exception e)
                    {
                        log.error("Exception while updating data from DB", e);
                    }
                    finally
                    {
                        updateInProgress.release();
                    }
                }
            });

            updateThread.start();
        }
    }

    /**
     * Implement this method to load the data from the DB.
     *
     * @return the freshly loaded rows
     */
    protected abstract List<T> loadFromDB();

    /**
     * Implement this method to hot-swap the in-memory data after it has been loaded from the DB.
     *
     * @param data the rows returned by loadFromDB()
     */
    protected abstract void updateData(List<T> data);

    private void loadAllData()
    {
        List<T> l = loadFromDB();
        updateData(l);
        lastRefreshed = System.currentTimeMillis();
    }

    // Marks the data as stale so that the next checkRefresh() starts a reload.
    public void invalidateCache()
    {
        lastRefreshed = 0;
    }

}
RA.
  • Thanks for the answer, RA. When is the checkRefresh() function going to be called? If I understand correctly, this would need a continuously running process that polls checkRefresh at regular intervals. I was hoping for a cleaner implementation where I could just plug in a new cache with a cache loader. – Abhishek Jain Oct 05 '12 at 13:28
  • checkRefresh should be called in every get operation that you implement in your class, e.g.: public Data get() { checkRefresh(); return data; } – RA. Oct 05 '12 at 13:30
  • But that will affect the latency of data retrieval whenever the update thread is fired, whereas if it had run like a cron job and already prefetched the data, that situation wouldn't arise. – Abhishek Jain Oct 05 '12 at 13:36
  • In our case we decided that the data may be a little stale. You can ensure that the data is never stale by fetching it from the DB synchronously, but then all requests have to wait until the data has been fetched. That was unacceptable for us, since our system is very low-latency and we can't stall all requests while the data is fetched. – RA. Oct 05 '12 at 13:44
  • I agree that we accepted the data being stale. My point can be illustrated with an example: say the cache was last refreshed longer ago than the refresh frequency. Now the next retrieval needs to run the update thread, and that request has to wait until the new data is fetched from the DB. We could have avoided this by refreshing at fixed intervals, regardless of whether the data is requested. Then we would always have up-to-date data whenever a request/get call is made. – Abhishek Jain Oct 05 '12 at 13:54
  • In the class above, you can see that the request that fires the update thread doesn't wait for it to return; it just uses the stale data. As for updating at fixed intervals, we decided that we don't need to fetch the data if there are no requests to consume it. I agree that when requests are rare the server could end up serving only stale data, but that's not our case, so it wasn't a risk for us. – RA. Oct 05 '12 at 14:08
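The non-blocking behaviour RA describes (the request that notices stale data starts a reload but returns the current data immediately) can be condensed into a JDK-only sketch. `StaleWhileRefreshing` is a hypothetical name; an `AtomicBoolean` plays the role of the `Semaphore` in the class above, the `Supplier` loader stands in for `loadFromDB`, and the `Executor` and `clock` parameters are assumptions added to make it testable.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical sketch of "serve stale data, refresh in the background".
// `loader` stands in for loadFromDB; `executor` and `clock` are injectable assumptions.
final class StaleWhileRefreshing<T> {
    private final Supplier<List<T>> loader;
    private final long maxAgeMillis;
    private final Executor executor;
    private final LongSupplier clock;
    private final AtomicBoolean refreshing = new AtomicBoolean(false); // role of the Semaphore
    private volatile List<T> data;
    private volatile long lastRefreshed;

    StaleWhileRefreshing(Supplier<List<T>> loader, long maxAgeMillis,
                         Executor executor, LongSupplier clock) {
        this.loader = loader;
        this.maxAgeMillis = maxAgeMillis;
        this.executor = executor;
        this.clock = clock;
        reload(); // initial load is synchronous so get() never returns null
    }

    List<T> get() {
        // Only the first caller that sees stale data schedules a reload (like tryAcquire()).
        if (clock.getAsLong() - lastRefreshed >= maxAgeMillis && refreshing.compareAndSet(false, true)) {
            try {
                executor.execute(() -> {
                    try {
                        reload();
                    } finally {
                        refreshing.set(false);
                    }
                });
            } catch (RuntimeException e) { // e.g. the executor rejected the task
                refreshing.set(false);
                throw e;
            }
        }
        return data; // never waits for the DB: callers get the current, possibly stale, list
    }

    private void reload() {
        data = Collections.unmodifiableList(new ArrayList<>(loader.get()));
        lastRefreshed = clock.getAsLong();
    }
}
```

Passing a thread pool as the executor gives RA's behaviour; Abhishek's alternative would instead call the loader from a scheduled task, which prefetches even when nobody reads.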

One thing to check is whether periodic refreshing is required at all. You could apply your refresh logic when you fetch data from the cache; this removes the need for any asynchronous refreshing and for maintaining old copies of the cache. IMO this is the easiest and best way to refresh cache data, as it involves no background threads or duplicate copies of the data.

 T getData(){
     // if last refresh time + refresh interval <= current time, refresh the cache (synchronously)
     // return data
 }

This ensures that the data is refreshed based on the refresh interval without needing any asynchronous refresh.
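A runnable version of the pseudocode above, as a minimal sketch: `RefreshOnRead`, the `loader` supplier, and the injectable `clock` are hypothetical names. `synchronized` ensures that only one caller reloads while the others wait and then see the fresh data.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical sketch of refresh-on-read: whoever finds the data older than the interval
// reloads it synchronously. `loader` stands in for the DB query, `clock` for System::currentTimeMillis.
final class RefreshOnRead<T> {
    private final Supplier<T> loader;
    private final long refreshIntervalMillis;
    private final LongSupplier clock;
    private T data;
    private long lastRefreshed;
    private boolean loaded;

    RefreshOnRead(Supplier<T> loader, long refreshIntervalMillis, LongSupplier clock) {
        this.loader = loader;
        this.refreshIntervalMillis = refreshIntervalMillis;
        this.clock = clock;
    }

    /** synchronized: one caller reloads while the others wait, then all see the same data. */
    synchronized T getData() {
        long now = clock.getAsLong();
        if (!loaded || lastRefreshed + refreshIntervalMillis <= now) {
            data = loader.get(); // the calling request pays the DB latency here
            lastRefreshed = now;
            loaded = true;
        }
        return data;
    }
}
```

This is the simplest option, but as discussed in the comments on the previous answer, the request that triggers the reload pays the full DB latency.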

prashant