I'm doing some simple work with RethinkDB but getting troublingly slow results. I have a process that inserts ~23k objects into a RethinkDB table. Strangely enough, that's the part that's fast. However, the snippet below is bizarrely slow:

# Definitions
import rethinkdb as r
conn = r.connect(host=RETHINKDB_HOST, port=RETHINKDB_PORT)

# Actual Code
rdbt = r.db('datasets').table(table_name)
rdbt.update({
    "labels_completed": 0,
    "labels": [],
    "labeler_ids": [],
}).run(conn)

This seems very simple to me, but the query reliably takes ~10 seconds to run, and this isn't a big table. Previously I did the update in three stages, and it took 30s.

Why on earth is this update query so slow? Am I running into some secret performance issue in rethink?

Slater Victoroff
  • Do you have a matching CPU load on your server? Or could it be a network latency? – Klaus D. Jun 27 '16 at 14:49
  • @KlausD. nope, no jump in CPU load. I think network latency is very unlikely. I'm making several other calls to rethink, making larger updates, etc... and none are this slow. It's also very consistent timing, and a local database, both of which imply it's probably not network latency. – Slater Victoroff Jun 27 '16 at 14:52
  • It's not the CPU and it's not disk I/O (that would show up as I/O wait on the CPU). There are not many other options. It should be worth dumping your network traffic and looking for any connection attempts that time out or run for 10s. – Klaus D. Jun 27 '16 at 14:57
  • @KlausD. it's definitely not network-related. Again, this is totally local, and far from the only call being made to rethink. My gut says this is some undocumented performance bottleneck that is specific to rethink, but it really doesn't give me any tools to introspect what they might be doing on the backend. – Slater Victoroff Jun 27 '16 at 15:01
  • @KlausD. I found a jump in CPU Load! It was just not very noticeable under the rest of what I was running. Jumped up to ~20% utilization for some unclear reason. – Slater Victoroff Jun 27 '16 at 15:16
  • Which version of rethinkDB? – Klaus D. Jun 27 '16 at 15:17
  • @KlausD. 2.3.4 on MacOSX – Slater Victoroff Jun 27 '16 at 15:30
  • @SlaterTyranus I think it's because of the assignment of empty arrays. See: https://en.wikipedia.org/wiki/Block_allocation_map – mertyildiran Jun 27 '16 at 16:36
  • What sort of disk are you using? (Also, how much does it speed up if you specify `{durability: 'soft'}`?) – mlucy Jun 27 '16 at 20:10
  • @mlucy on an ssd, will check with soft durability, but ran a test just directly modifying the json files and it ran 10x faster. – Slater Victoroff Jun 27 '16 at 21:17
  • @SlaterVictoroff did you find an answer on how to get good performance while updating 10000 rows one by one? I'm facing the same issue right now; if you found anything helpful, please let me know. – Dipak Feb 13 '18 at 06:15
  • @Dipakchavda unfortunately never solved this one, and rethink is dead now :( – Slater Victoroff Feb 13 '18 at 20:47
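
For anyone wanting to try the soft-durability check mlucy suggested, here is a minimal sketch in Python. It is not a confirmed fix (the thread never reached one); it only times the same update from the question under a chosen durability mode. The helper name `timed_reset` and the constant `RESET_FIELDS` are made up for illustration, and the connection details in the commented usage are assumptions. In the Python driver, `durability` is passed as a keyword argument to `update()` rather than as the JavaScript-style `{durability: 'soft'}` object.

```python
import time

# The same fields the question resets on every document.
RESET_FIELDS = {
    "labels_completed": 0,
    "labels": [],
    "labeler_ids": [],
}


def timed_reset(table, conn, durability="hard"):
    """Run the reset update on `table`; return (elapsed_seconds, result).

    `table` is expected to be a ReQL table term such as
    r.db('datasets').table(table_name); `conn` is an open connection.
    """
    start = time.perf_counter()
    # "soft" lets the server acknowledge the write before it reaches disk;
    # "hard" (the usual default) waits for the write to be committed.
    result = table.update(RESET_FIELDS, durability=durability).run(conn)
    return time.perf_counter() - start, result


# Hypothetical usage against a local server (names are placeholders):
#
#   import rethinkdb as r
#   conn = r.connect(host="localhost", port=28015)
#   table = r.db("datasets").table("my_table")
#   seconds, result = timed_reset(table, conn, durability="soft")
#   print(seconds, result)
```

One caveat when comparing the two modes: documents that already hold the reset values are counted under `unchanged` in the result, so a second run on the same table may not exercise the same write path. Repopulating the table between runs should give a fairer comparison.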

0 Answers