
I came across the google-cloud-spanner gem for Ruby.

Using session.commit { |c| c.insert(table, row) } I can insert into Cloud Spanner easily.

However, I cannot exceed ~200 inserts per second this way (from a compute instance in the same region). To increase performance, I would have to pass an array of rows to the insert method: c.insert(table, [row, row, row, ...]).

Why is Cloud Spanner working this way? Could this be due to networking overheads?

Inserting multiple records together is not always practical on my application layer.

EDIT:

Full example that shows creation of spanner client, etc:

spanner = Google::Cloud::Spanner.new(project: ..., keyfile: ...)
session = spanner.client(instance, database)

# Insert:
session.commit { |c| c.insert(table, row) }
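For reference, the batched variant mentioned above can be wrapped in a small helper. This is a sketch only: `insert_in_batches` and `commit_count` are hypothetical names, and `spanner_client` stands in for the `session` object created above (a `Google::Cloud::Spanner::Client`).

```ruby
# Batched insert: pass many rows to a single commit instead of
# issuing one commit per row, which pays a network round-trip and
# transaction overhead for every row.
def insert_in_batches(spanner_client, table, rows, batch_size: 500)
  rows.each_slice(batch_size) do |batch|
    # One commit carries up to batch_size rows.
    spanner_client.commit { |c| c.insert(table, batch) }
  end
end

# Pure helper: how many commits a given workload needs.
def commit_count(row_count, batch_size)
  (row_count.to_f / batch_size).ceil
end
```

With 1,100 rows and a batch size of 500, this issues 3 commits instead of 1,100.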
J.Doe
  • Can you update your question with more code that shows what you are doing to get only ~200 inserts a minute? What type of object is session? We typically see much higher performance. – blowmage Sep 01 '17 at 23:24
  • Are you creating a new spanner client object for each insert? Or are you using the same spanner client object for all 200 inserts a minute? – blowmage Sep 01 '17 at 23:45
  • Hi, first of all ~200 inserts per Minute was a typo, what I mean is ~200 inserts per Second. Still, I think it's a bit slow.. When I pass an Array of rows to the insert method, I can achieve thousands of records per second. I have updated my post showing how I create the spanner and connect to the database. I'm not creating a new spanner client for each insert, but I do a `session.commit { ... }` for each insert – J.Doe Sep 02 '17 at 09:05
  • 200 transactions a second is definitely better than 200 a minute! :) Have you tried adjusting the max session pool size to see if that helps increase throughput? https://googlecloudplatform.github.io/google-cloud-ruby/#/docs/google-cloud-spanner/latest/google/cloud/spanner/project?method=client-instance Are you using threads at all? – blowmage Sep 02 '17 at 13:50
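The pool and threading suggestion from the comment above could be sketched like this. The `pool:` option is the one documented for `Spanner.client` (e.g. `spanner.client instance, database, pool: { min: 10, max: 100 }`); `parallel_inserts` is a hypothetical helper, and `client` is only assumed to respond to `commit` the way `Google::Cloud::Spanner::Client` does.

```ruby
# Fan single-row commits out across threads so that network
# round-trips overlap instead of running serially on one thread.
def parallel_inserts(client, table, rows, threads: 4)
  return if rows.empty?
  per_thread = (rows.size / threads.to_f).ceil
  rows.each_slice(per_thread).map do |slice|
    Thread.new do
      slice.each { |row| client.commit { |c| c.insert table, row } }
    end
  end.each(&:join)
end
```

Each thread checks a session out of the pool for its transaction, so the pool's `max:` should be at least the thread count.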

1 Answer


The performance issue you are experiencing is not caused by Ruby itself; it is simply how Spanner works.

You are committing one row per commit, which takes ages. Try committing in batches of, say, 500 rows; it will speed things up significantly.

Also be aware of mutation limits: in general your program needs to count the mutations per commit, otherwise the library will throw an exception when you hit the upper limit.

I recently tested loading a 100 MB (~110k rows) CSV file into Spanner the way you did, and it took 1h30min. When I rewrote my code to commit every 500 rows, it finished loading in 3 minutes.

(Do some reading about mutations; in my case 500 rows was fine, in yours it may be different.)

This formula seems to be legit: https://github.com/googleapis/google-cloud-go/issues/1721#issuecomment-572273387
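The mutation math from the linked issue can be sketched as a small helper. The 20,000-mutations-per-commit limit and the rule that each affected column (including secondary-index columns) counts as one mutation are assumptions taken from that issue; the helper names are hypothetical.

```ruby
# Rough mutation math: inserting one row costs roughly
# (columns + affected index columns) mutations, and a single commit
# may carry at most `limit` mutations (historically 20,000).
def mutations_per_row(columns, index_columns: 0)
  columns + index_columns
end

def max_rows_per_commit(columns, index_columns: 0, limit: 20_000)
  limit / mutations_per_row(columns, index_columns: index_columns)
end
```

For example, a 10-column table with no secondary indexes allows up to 2,000 rows per commit, which is why a conservative batch of 500 stays well under the limit.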

mpp