3

I am using a triple store database for one of my projects (a semantic search engine for healthcare), and it works well. I am considering giving it a performance boost by adding a key-value store layer on top of the triple store. Triple store querying is slow because we do deep semantic processing.

This is how I am planning to improve performance:

1) Run a Hadoop job every day that queries the triple store for all query terms.
2) Cache these results in a clustered key-value store.
3) When a user searches for a query term, search the key-value store first instead of the triple store. The triple store is searched only when the query term is not found in the key-value store.

The key-value pairs I plan to store map a String to a List of POJOs. I can save the value as a BLOB.
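As a minimal sketch of the String-to-BLOB idea, the list of POJOs can be turned into a byte array with standard Java serialization, so the store only ever sees an opaque value. The `SearchResult` class, its fields, and the `BlobCodec` helper are hypothetical names made up for illustration; a real project might prefer a more compact format than Java serialization.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical POJO standing in for one search result; the fields are assumptions.
class SearchResult implements Serializable {
    private static final long serialVersionUID = 1L;
    final String conceptId;
    final double score;

    SearchResult(String conceptId, double score) {
        this.conceptId = conceptId;
        this.score = score;
    }
}

// Hypothetical helper: converts a result list to and from an opaque byte array.
class BlobCodec {
    // Serialize the list so that any key-value store can hold it as a BLOB.
    static byte[] toBlob(List<SearchResult> results) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(new ArrayList<>(results)); // copy into a Serializable list
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Rebuild the list from the bytes read back from the store.
    @SuppressWarnings("unchecked")
    static List<SearchResult> fromBlob(byte[] blob) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(blob))) {
            return (List<SearchResult>) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The key would then be the query term itself, and the value the array returned by `BlobCodec.toBlob(...)`.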

I am not sure which key-value store to use. I am mainly looking for failover and load-balancing support. All I need is a simple key-value store that provides these features. I do not need to sort or search within values, or any other functionality.

Please correct me if I am wrong. I am assuming memcached and Redis will be faster since they are in-memory. But I do not know whether the Java clients for Redis (JRedis) or memcached (spymemcached) support failover. I am not sure whether to go with in-memory or persistent storage. I am also considering Voldemort, Cassandra, and HBase. Overall, the key-value data will be around 2 GB to 4 GB. Any pointers on this would be really helpful.

I am very new to NoSQL and key-value stores. Please let me know if you need any more details.

CRS
  • Besides replication and failure handling, `Voldemort` lets you build the storage with Hadoop (as a read-only store), so you could combine steps 1) and 2). The size of the values to look up is also a factor to consider; have a look at: https://groups.google.com/forum/?fromgroups=#!topic/project-voldemort/ZUHE06ksZ58 – Lorand Bendig Nov 23 '12 at 18:13

5 Answers

1

Have you gone through the memcached tutorial articles below? They explain the load-balancing aspects (memcached clients spread load across instances based on a hash of your key) and discuss how spymemcached handles connectivity failures:

Use Memcached for Java enterprise performance, Part 1: Architecture and setup http://www.javaworld.com/javaworld/jw-04-2012/120418-memcached-for-java-enterprise-performance.html

Use Memcached for Java enterprise performance, Part 2: Database-driven web apps http://www.javaworld.com/javaworld/jw-05-2012/120515-memcached-for-java-enterprise-performance-2.html

For enterprise-grade failover and cross-data-center replication on top of memcached, you should use Couchbase, which offers these features. The product evolved from a memcached base.

user1697575
0

Before you build infrastructure to load your cache, you might just try adding memcached on top of your existing system. First, measure your current performance well; I suggest JMeter or a similar tool. Here is the workflow in your application: check memcached, and if the result is there, you're done. If not, run the query against the triple store and save the results in memcached. This will improve performance if you have queries that are repeated. Memcached will use the memory you give it efficiently, throwing away things that don't get used very often. Failover is handled by your application (if the result is not in memcached, you use your existing infrastructure).
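The check-then-populate workflow described here could be sketched as follows. This is only an illustration under stated assumptions: a `ConcurrentHashMap` stands in for the memcached client, the `tripleStoreQuery` function stands in for the real triple store call, and the class name `CacheAsideLookup` is made up.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Cache-aside lookup: try the cache first, fall back to the slow backend on a miss,
// then store the result so the next identical query is served from the cache.
class CacheAsideLookup<V> {
    // Stand-in for a memcached client (assumption); a real client would replace this map.
    private final Map<String, V> cache = new ConcurrentHashMap<>();
    // Stand-in for the triple store query (hypothetical; supplied by the caller).
    private final Function<String, V> tripleStoreQuery;

    CacheAsideLookup(Function<String, V> tripleStoreQuery) {
        this.tripleStoreQuery = tripleStoreQuery;
    }

    V search(String term) {
        V cached = cache.get(term);
        if (cached != null) {
            return cached; // cache hit: the triple store is not touched
        }
        V fresh = tripleStoreQuery.apply(term); // cache miss: run the slow query
        if (fresh != null) {
            cache.put(term, fresh); // populate the cache for later requests
        }
        return fresh;
    }
}
```

With this shape, a cache outage only costs speed: every lookup degrades to a triple store query, which is the application-level failover the answer refers to.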

Joshua Martell
  • Thank you for the reply. Right, that is the plan as of now. But I was looking for a key-value store that supports failover and replication, i.e., if a key-value store server goes down, the keys held by that server should be divided among the other servers (or something like this). Failover is supported by my application, but I was looking for failover at the caching level too. I know that HBase supports replication and failover, but I am looking for something simpler like memcached or Redis, and I am not aware whether they support replication and failover. I could not find much info in the tutorial. – CRS Nov 21 '12 at 11:29
  • Use a memcached client that supports consistent hashing (many do), which will redistribute keys if one of the servers is unreachable. – Joshua Martell Nov 22 '12 at 16:06
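To illustrate the consistent hashing mentioned in the comment above, here is a small self-contained sketch of a hash ring. It is not the implementation used by any particular memcached client; the class name, the server names, and the choice of 100 virtual nodes per server are assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.TreeMap;

// Consistent hash ring: each server owns many points on a ring of hash values,
// and a key is served by the first server point at or after the key's own hash.
class ConsistentHashRing {
    private static final int VIRTUAL_NODES = 100; // assumption: points per server to even out load
    private final TreeMap<Long, String> ring = new TreeMap<>();

    void addServer(String server) {
        for (int i = 0; i < VIRTUAL_NODES; i++) {
            ring.put(hash(server + "#" + i), server);
        }
    }

    // Removing a server only frees its own points; keys on other servers stay put.
    void removeServer(String server) {
        for (int i = 0; i < VIRTUAL_NODES; i++) {
            ring.remove(hash(server + "#" + i));
        }
    }

    String serverFor(String key) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no servers in the ring");
        }
        Long point = ring.ceilingKey(hash(key));
        // Wrap around to the first point when the key hashes past the last one.
        return ring.get(point != null ? point : ring.firstKey());
    }

    // First 8 bytes of an MD5 digest, packed into a long.
    private static long hash(String s) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (digest[i] & 0xFF);
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

When a server is removed from the ring, only the keys it held are remapped to the remaining servers. Those entries start out as cache misses there, so the application falls back to the triple store for them once and then re-caches the results.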
0

We use a triple store and cache data in the memcache service provided by Google App Engine, and it works fine. It reduced the overhead of SPARQL queries over the triple store.

tousif
0

Of the stores you mention, only Cassandra has the features you listed along with full CQL support, which helps with maintenance. Otherwise, maybe you should look in another direction:

Write heavy, replicated, bigger-than-memory key-value store

42n4
0

Since you just want to cache data in front of your triple store, going with disk-based or replicated/distributed key-value stores seems pointless. All you need is to cache query results right on the machines where those queries run. No "key-value stores", just vanilla Java caching solutions.

In 2016 the best cache for Java is Caffeine.
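As a rough illustration of what an in-process cache looks like, here is a bounded least-recently-used cache built only on the JDK's `LinkedHashMap`. This is a stand-in, not Caffeine and not its API; the class name `LruCache` and the `threadSafe` helper are made up for the example, and a dedicated caching library would offer better concurrency and eviction behavior.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Bounded in-process cache that evicts the least recently used entry once it is full.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;
    private final int maxEntries;

    LruCache(int maxEntries) {
        super(16, 0.75f, true); // accessOrder = true: reads move an entry to the "recent" end
        this.maxEntries = maxEntries;
    }

    // Called by LinkedHashMap after each insertion; returning true drops the eldest entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }

    // LinkedHashMap is not thread-safe, so wrap it for use from multiple request threads.
    static <K, V> Map<K, V> threadSafe(int maxEntries) {
        return Collections.synchronizedMap(new LruCache<K, V>(maxEntries));
    }
}
```

A map created with `LruCache.threadSafe(10000)` could hold the 10,000 most recently used query terms in the application's own heap, with no extra servers to operate.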

leventov
  • 14,760
  • 11
  • 69
  • 98