
I ran a simple performance test on my local machine with this Python script:

import redis
import sqlite3
import time

data = {}
N = 100000

for i in range(N):
    key = "key-" + str(i)
    value = "value-" + str(i)
    data[key] = value

r = redis.Redis("localhost", db=1)
s = sqlite3.connect("testDB")
cs = s.cursor()

try:
    cs.execute("CREATE TABLE testTable(key VARCHAR(256), value TEXT)")
except Exception as excp:
    print(excp)
    cs.execute("DROP TABLE testTable")
    cs.execute("CREATE TABLE testTable(key VARCHAR(256), value TEXT)")

print("[---Testing SQLITE---]")
sts = time.time()
for key in data:
    cs.execute("INSERT INTO testTable VALUES(?, ?)", (key, data[key]))
    # s.commit()  # uncomment to commit per insert
s.commit()
ste = time.time()
print("[Total time of sql: %s]" % (ste - sts))

print("[---Testing REDIS---]")
rts = time.time()
r.flushdb()  # for an empty db
for key in data:
    r.set(key, data[key])
rte = time.time()
print("[Total time of redis: %s]" % (rte - rts))

I expected Redis to be faster, but the results show that it is much slower:

[---Testing SQLITE---]
[Total time of sql: 0.615846157074]
[---Testing REDIS---]
[Total time of redis: 10.9668009281]

So, Redis is memory-based, but what about SQLite? Why is Redis so slow here? When should I use Redis, and when should I use SQLite?

torayeff
    Why would SQLite be slow? ;-) Don't forget that SQLite is entirely "in process" (and non-contended) as well in this scenario. Also, why time the `flushdb`? –  Jun 26 '12 at 22:09
    Sounds like you've been reading too much NoSQL hype. – Brendan Long Jun 26 '12 at 22:13
  • @pst flushdb is run only once, to make sure I start from an empty db. Also, I have read that Redis typically holds the whole dataset in RAM, so I expected it to be fast enough. I am not sure how SQLite works, but I did not expect such a big performance difference. – torayeff Jun 26 '12 at 22:17
    @torayeff you could speed up the sqlite portion even more with `CREATE TABLE IF NOT EXISTS` and you can take out the roundtrips and try/catch block ;) – swasheck Jun 26 '12 at 22:22
    "Small. *Fast.* Reliable. Choose any three." need we say more? – 0xC0000022L Jun 26 '12 at 22:30
    @torayeff but wait, there's more. you can create a sqlite database *in memory*. then reduce the roundtrips. with the create statement i gave you you're really off and flying. – swasheck Jun 26 '12 at 22:32
  • @swasheck can you please give a link? I am developing a crawler which runs on localhost. It should check whether a url has been seen, and if the url is not seen, update the database. Right now I use a Redis db, with urls as keys and the information gathered from each url as values. How can I optimize this with SQLite? – torayeff Jun 26 '12 at 22:37
    Well, in-memory data doesn't persist. If you need to keep it for a long time (or it needs to survive a crash) then I'd recommend against such a configuration. I was just pointing out ways to further skew/tweak/test in your benchmarks. – swasheck Jun 26 '12 at 22:40
  • This is really interesting and surprising at the same time, since you are comparing an on-disk SQLite db to an in-memory Redis db. One would assume SQLite would be handicapped from the outset. SQLite supports in-memory databases too, so I wonder what the benchmark would show for that particular comparison. – PowerAktar May 31 '22 at 06:49

4 Answers


From the Redis documentation:

Redis is a server: all commands involve network or IPC roundtrips. It is meaningless to compare it to embedded data stores such as SQLite, Berkeley DB, Tokyo/Kyoto Cabinet, etc ... because the cost of most operations is precisely dominated by network/protocol management.

Which does make sense, though it is also an acknowledgement of speed issues in certain cases. Redis might perform a lot better than SQLite under many parallel accesses, for instance.

The right tool for the right job: sometimes it will be Redis, other times SQLite, other times something totally different. If this speed test is a fair representation of what your app will realistically do, then SQLite will serve you better, and it's good that you ran this benchmark.

Harald Brinkhof
    +1, although I certainly don't agree with the quote: I'm generally not interested *how* something works (ok I am, but not when benchmarking), but how fast it is for the job at hand - if one thing's noticeably slower because of some architectural decisions, that still doesn't make the comparison "meaningless" – Voo Jun 26 '12 at 22:29
  • 17
    I'm the original author of this quote, and I do not agree with your disagreement ;-) Benchmarking is comparing apples to apples, so you need to understand what an apple is to assess its performance. – Didier Spezia Jun 27 '12 at 09:23
  • Apples to oranges: a fruit is still a fruit, and you have a goal to achieve, so they are comparable. How that goal is achieved is semantics relative to the core objective. The question was about the performance and speed of one solution versus another. If you look too closely at the details (apples to apples, as you say), the difference is almost negligible between comparable systems. – DIGI Byte Jan 12 '22 at 01:03

The current answers explain why Redis loses this particular benchmark: the network overhead generated by every command executed against the server. However, no attempt has been made to refactor the benchmark code to speed up Redis.

The problem with your code lies here:

for key in data:
    r.set(key, data[key])

You incur 100,000 round-trips to the Redis server, resulting in significant I/O overhead.

This is totally unnecessary, as Redis provides batch variants of certain commands; for SET there is MSET, so you can refactor the above to:

r.mset(data)

From 100,000 server trips down to one. You simply pass the Python dictionary as a single argument, and Redis atomically applies the update on the server.

This will make all the difference in this particular benchmark; you should see Redis perform at least on par with SQLite.

user2014979
    This is a true comparison. – noj Dec 11 '14 at 01:34
    Good point. But if you replace the loop of `r.set` calls with one bulk operation using `r.mset`, then on the SQLite end you should also replace the loop of multiple `INSERT` statements with one bulk `INSERT`. IMO, only then would it be a truly reliable benchmark that compares bulk-vs-bulk operations on both ends. – kabirbaidhya Apr 27 '17 at 08:06
  • @kabirbaidhya I’ll second that. By using batch processing on Redis, you are creating an unfair benchmark. – PowerAktar May 31 '22 at 06:42
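As the comments above suggest, the SQLite side of the benchmark can be batched too. A minimal sketch using `executemany` (the table and column names mirror the question's script; the in-memory database is just for illustration):

```python
import sqlite3

# Same shape of data as in the question's script.
data = {"key-%d" % i: "value-%d" % i for i in range(100000)}

s = sqlite3.connect(":memory:")  # in-memory DB for illustration
cs = s.cursor()
cs.execute("CREATE TABLE testTable(key VARCHAR(256), value TEXT)")

# One executemany call instead of 100,000 execute calls.
cs.executemany("INSERT INTO testTable VALUES(?, ?)", data.items())
s.commit()

print(cs.execute("SELECT COUNT(*) FROM testTable").fetchone()[0])  # prints 100000
```

`executemany` still runs one INSERT per row internally, but it avoids re-parsing the statement and the Python-level call overhead, which is the closest SQLite analogue to Redis's MSET.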

SQLite is very fast, and you're only requiring one IO action (on the commit). Redis is doing significantly more IO since it's over the network. A more apples-to-apples comparison would involve a relational database accessed over a network (like MySQL or PostgreSQL).

You should also keep in mind that SQLite has been around for a long time and is very highly optimized. It's limited by ACID compliance, but you can actually turn that off (as some NoSQL solutions do), and get it even faster.
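For reference, the usual knobs for trading durability for speed in SQLite are the `synchronous` and `journal_mode` pragmas. A minimal sketch (whether this is safe depends entirely on your durability needs; the in-memory database here is just for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # file-backed DBs accept the same pragmas
# Trade durability for speed: skip the fsync after each commit,
# and keep the rollback journal in RAM instead of on disk.
conn.execute("PRAGMA synchronous = OFF")
conn.execute("PRAGMA journal_mode = MEMORY")
print(conn.execute("PRAGMA synchronous").fetchone()[0])  # prints 0 (OFF)
```

With these settings a crash can corrupt or lose recent writes, which is exactly the guarantee some NoSQL stores also relax by default.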

Brendan Long
    That's fair, but the overhead should be minimal since it's connecting on localhost. At least less overhead than across a network. – swasheck Jun 26 '12 at 22:28
    @swasheck Yes it's not nearly as bad as connecting to another machine, but it still involves system calls and more complicated communication (compared to just using your own processes's memory directly). – Brendan Long Jun 26 '12 at 22:31
  • How to be if I want to check url-seen in web crawler and at the same time update database? – torayeff Jun 26 '12 at 22:32
  • @torayeff *Concurrent* updates are actually SQLite's "Achilles heel", due to how its locking model works (it does not scale with *many* contending writers). Of course that is not stressed or tested at all by this benchmark, and a single "web crawler" is hardly adding much contention, so... –  Jun 26 '12 at 22:44
    Disabling "ACID" (e.g. flush settings) doesn't speed up SQLite much for reasonable transaction sizes... it's only the commit that is "really really important to remember". (Although there are other issues at play to determine transaction visibility.) –  Jun 26 '12 at 22:50
    @pst those are both very good points which also serve to reinforce the need to truly know your project and select your tools appropriately. – swasheck Jun 26 '12 at 22:53
    The "turn that off" link is broken – Justin Furuness May 16 '22 at 13:43

Just noticed that you did not pipeline the commands for Redis. Using pipelines, the time reduces:

[---Testing SQLITE---]

[Total time of sql: 0.669369935989]

[---Testing REDIS---]

[Total time of redis: 2.39369487762]
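The answer doesn't show the code, so here is a minimal sketch of what pipelining the original loop might look like with redis-py. The helper name and batch size are my own choices, not anything from the answer; `pipeline(transaction=False)` batches commands without wrapping them in MULTI/EXEC:

```python
def bulk_set(client, mapping, batch_size=10000):
    """Queue SET commands on a pipeline and flush in batches, so that
    N keys cost roughly N / batch_size round-trips instead of N."""
    pipe = client.pipeline(transaction=False)  # plain batching, no MULTI/EXEC
    pending = 0
    for key, value in mapping.items():
        pipe.set(key, value)
        pending += 1
        if pending == batch_size:
            pipe.execute()  # one round-trip for the whole batch
            pending = 0
    if pending:
        pipe.execute()  # flush the final partial batch
```

Against the question's setup this would be called as `bulk_set(redis.Redis("localhost", db=1), data)`.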

basti
    +1 for showing that sqlite is still faster after pipelining. BTW, you can make sqlite even faster as well by doing bulk inserts. – ChaimG Jan 21 '21 at 15:50