16

I have a SQL table that is accessed continually but changes very rarely.

The table is partitioned by UserID, and each user has many records in the table.

I want to save database resources and move this table closer to the application in some kind of memory cache.

In-process caching is too memory-intensive, so it needs to be external to the application.

Key-value stores like Redis are proving inefficient due to the overhead of serializing and deserializing the table to and from Redis.

I am looking for something that can store this table (or partitions of data) in memory, but let me query only the information I need without serializing and deserializing large blocks of data for each read.

Is there anything that provides an out-of-process, in-memory database table that supports queries, for high-speed caching?

Searching has shown that Apache Ignite might be a possible option, but I am looking for more informed suggestions.

Sameer
SetiSeeker
  • What do you mean by "Out of Process in memory database table"? – Sameer Sep 12 '17 at 07:07
  • What is your front end? A managed environment like ASP.NET has output caching and middleware caching, and you could also look into Elasticsearch if it matches your needs. – Pranav Singh Sep 12 '17 at 10:45

7 Answers

9

Since it's out-of-process, it has to do serialization and deserialization. The problem you are concerned with is how to reduce that serialization/deserialization work. If you use Redis' STRING type, you CANNOT reduce it.

However, you can use a HASH to solve the problem: map your SQL table to a HASH.

Suppose you have the following table: person: id(varchar), name(varchar), age(int). You can take the person id as the key, and take name and age as fields. When you want to look up someone's name, you only need to get the name field (HGET person-id name); the other fields won't be deserialized.
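The mapping can be sketched in a few lines of Python. This is a hypothetical helper (the `person` table, `row_to_hash_command` name, and redis-py calls in the comments are illustrative assumptions, not part of the question):

```python
# Sketch: map one row of a SQL "person" table to a Redis HASH.
# Each row becomes one hash keyed by the primary key; each remaining
# column becomes a hash field, so reads can fetch a single field.

def row_to_hash_command(row):
    """Build the key and field mapping for one person row."""
    key = f"person:{row['id']}"
    fields = {k: str(v) for k, v in row.items() if k != "id"}
    return key, fields

key, fields = row_to_hash_command({"id": "42", "name": "alice", "age": 30})
# key == "person:42", fields == {"name": "alice", "age": "30"}
# With redis-py this would be: r.hset(key, mapping=fields)
# and a field-level read:      r.hget(key, "name")  -> only "name" is fetched
```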

for_stack
  • For the data I am storing, each user would have many rows they need cached. Ideally I would want to query with a UserId and a TagId. Each user would have many different rows where the TagIds are different. If I could do SQL-type queries against a memory table, say WHERE UserId = y AND TagId = x, and get back only the one row containing the data, that would be ideal. That is how we are using SQL to store and retrieve the current data. SQL Hekaton sounds like it might be a possibility, but costs are high. – SetiSeeker Sep 01 '17 at 06:49
4

Ignite is indeed a possible solution for you, since you can reduce serialization/deserialization overhead by using its internal binary representation to access objects' fields. See this documentation page for more information: https://apacheignite.readme.io/docs/binary-marshaller

Access overhead can also be reduced by disabling the copy-on-read option: https://apacheignite.readme.io/docs/performance-tips#section-do-not-copy-value-on-read

Data collocation by user id is also possible with Ignite: https://apacheignite.readme.io/docs/affinity-collocation

Denis
2

As @for_stack said, a HASH will be very suitable for your case.

You said that each user has many rows in the db, indexed by user_id and tag_id, so (user_id, tag_id) uniquely identifies one row. Every row functionally depends on this tuple, so you can use the tuple as the HASH key.

For example, if you want to save the row (user_id, tag_id, username, age) with values ("123456", "FDSA", "gsz", 20) into Redis, you could do this:

HMSET 123456:FDSA username "gsz" age 20

When you want to query the username by user_id and tag_id, you can do this:

HGET 123456:FDSA username

So every HASH key will be a combination of user_id and tag_id. If you want the key to be more human-readable, you can add a prefix string such as "USERINFO", e.g. USERINFO:123456:FDSA.
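The key construction can be sketched as a small helper (the `hash_key` name and `USERINFO` prefix are illustrative; the redis-py calls in the comments are one possible client):

```python
# Sketch: build the composite hash key described above.
# (user_id, tag_id) uniquely identifies a row, so the pair forms the key.

PREFIX = "USERINFO"  # optional human-readable prefix

def hash_key(user_id, tag_id, prefix=PREFIX):
    return f"{prefix}:{user_id}:{tag_id}"

assert hash_key("123456", "FDSA") == "USERINFO:123456:FDSA"
# With redis-py:
#   r.hset(hash_key(uid, tid), mapping={"username": "gsz", "age": "20"})
#   r.hget(hash_key(uid, tid), "username")   # fetches only that field
```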

BUT if you want to query with only a user_id and get all the rows for that user_id, the method above is not enough.

You can build a secondary index in Redis for your HASHes.

As above, we use user_id:tag_id as the HASH key because it uniquely points to one row. But what if we want to query all the rows for one user_id?

We can use a sorted set as a secondary index that records which HASHes store info about a given user_id.

We could add this in SortedSet:

ZADD user_index 0 123456:FDSA

As above, we set the member to the HASH key string and the score to 0. The rule is that every score in this zset must be 0, so that we can use lexicographical order to do range queries; see ZRANGEBYLEX.

E.g., to get all the rows for user_id 123456:

ZRANGEBYLEX user_index [123456 (123457

It will return all the HASH keys whose prefix is 123456, and then we use these strings as HASH keys with HGET or HMGET to retrieve the information we want.

[ means inclusive, and ( means exclusive. Why 123457? When we want all the rows for a user_id, we form the exclusive upper bound by taking the user_id string and incrementing its last character's ASCII value by 1.

For more about lex indexing, refer to the ZRANGEBYLEX documentation mentioned above.
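The bound calculation can be sketched in pure Python (`lex_range_for_prefix` is a hypothetical helper name). Note that this assumes no other user_id shares 123456 as a prefix (e.g. fixed-length ids), since a member like 1234567:... would also fall inside this range:

```python
# Sketch: compute the ZRANGEBYLEX bounds for a user_id prefix query.
# Incrementing the last character of the prefix gives the exclusive
# upper bound, e.g. "123456" -> "(123457".

def lex_range_for_prefix(user_id):
    upper = user_id[:-1] + chr(ord(user_id[-1]) + 1)
    return f"[{user_id}", f"({upper}"

lo, hi = lex_range_for_prefix("123456")
# lo == "[123456", hi == "(123457"
# With redis-py: r.zrangebylex("user_index", lo, hi)
```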

GuangshengZuo
1

You can try Apache Mnemonic, started by Intel: http://incubator.apache.org/projects/mnemonic.html. It supports serialization-free ("serdeless") features.

Srini Sydney
1

For a read-dominant workload the MySQL MEMORY engine should work fine (writing DMLs lock the whole table). This way you don't need to change your data retrieval logic.

Alternatively, if you're okay with changing the data retrieval logic, then Redis is also an option. To add to what @GuangshengZuo described, there's ReJSON, a dynamically loadable Redis module (for Redis 4+) that implements a document store on top of Redis. It can further relax the requirements for marshalling big structures back and forth over the network.

saaj
1

With just 6 principles (which I collected here), it is very easy for a SQL-minded person to adapt to the Redis approach. Briefly, they are:

  1. The most important thing: don't be afraid to generate lots of key-value pairs. Feel free to store each row of the table under a different key.
  2. Use Redis' hash map data type.
  3. Form the key name from the table's primary key values, joined by a separator (such as ":").
  4. Store the remaining fields as a hash.
  5. When you want to query a single row, form the key directly and retrieve the result.
  6. When you want to query a range, use the wildcard "*" in your key pattern. But be aware that scanning keys blocks other Redis operations, so use this method only if you really have to.

The link just gives a simple table example and how to model it in Redis. Following those 6 principles, you can continue to think the way you do for normal tables (of course without some not-so-relevant concepts such as CRUD, constraints, relations, etc.).
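The 6 principles can be sketched end to end, with a plain dict standing in for the Redis keyspace (the table name, columns, and helper names are illustrative assumptions):

```python
# Sketch of the 6 principles; `store` stands in for the Redis keyspace.

store = {}

def save_row(table, pk_values, fields, sep=":"):
    """Principles 1-4: one key per row, key formed from the primary-key
    values joined by a separator, remaining columns stored as a hash."""
    key = sep.join([table, *pk_values])
    store[key] = dict(fields)          # HSET key field value ...
    return key

def get_row(table, pk_values, sep=":"):
    """Principle 5: form the key directly and fetch the hash."""
    return store.get(sep.join([table, *pk_values]))

def scan_prefix(prefix):
    """Principle 6: range query via a wildcard-style prefix scan.
    (In real Redis: KEYS/SCAN with 'prefix*'; KEYS blocks the server.)"""
    return sorted(k for k in store if k.startswith(prefix))

save_row("user", ("123456", "FDSA"), {"username": "gsz", "age": "20"})
save_row("user", ("123456", "QWER"), {"username": "gsz", "age": "20"})
assert get_row("user", ("123456", "FDSA"))["username"] == "gsz"
assert scan_prefix("user:123456:") == ["user:123456:FDSA", "user:123456:QWER"]
```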

Mehmet Kaplan
0

Using Memcached and Redis in combination on top of MySQL comes to mind.

Sameer