I am looking for the fastest way to load a big CSV file (millions of rows, 20-30 fields) into a Tarantool database, preferably through one of the connectors (Python or C#), and then access the space from an application built with Python/C#.

I am new to Tarantool. In the documentation I found the built-in CSV module (https://www.tarantool.io/en/doc/2.5/reference/reference_lua/csv/), but it doesn't actually create a space in Tarantool.

I then created the space myself and did the insert from the Python app, parsing the CSV file and inserting the records into the space one at a time, like this:

tester = connection.space('tester')
tester.insert((<data>))

But this was very slow. What is the best way to import CSV input into Tarantool with better performance?

Thank you.

user1390638

1 Answer

The built-in Tarantool csv module is only for reading CSV. Yes, if you want a space, you have to create it by hand.

I suggest trying a batch insert:

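-- You need to create a stored Lua procedure on the Tarantool server.
-- And of course this is only a draft of such a function.
function batch_insert(list)
    -- 'tester' is the space name used in the question
    local space = box.space.tester
    -- One transaction for the whole batch instead of one per record
    box.begin()
    for _, record in ipairs(list) do
        space:replace(record)
    end
    box.commit()
end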

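You can then call this procedure from the Python client. Note that tarantool-python unpacks a single list or tuple argument into separate call arguments, so wrap the batch in an outer tuple to pass it as one list:

connection.call('batch_insert', (several_csv_rows,))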
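To make the client side concrete, here is a minimal sketch of how the CSV could be cut into batches before each call. The names iter_batches, load_csv and send_batch are hypothetical helpers, not part of any Tarantool connector, and the batch size of 1000 is an arbitrary starting point you would tune by measurement. Note that csv.reader yields every field as a string, so you may need to convert values to the types your space expects.

```python
import csv
from itertools import islice


def iter_batches(rows, batch_size):
    """Yield lists of at most batch_size rows from any iterable."""
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def load_csv(lines, send_batch, batch_size=1000):
    """Parse CSV lines, pass them to send_batch in chunks, return the row count.

    lines      -- an open file or any iterable of CSV text lines
    send_batch -- hypothetical callable that ships one batch to the server
    """
    total = 0
    for batch in iter_batches(csv.reader(lines), batch_size):
        send_batch(batch)
        total += len(batch)
    return total


# Possible usage, assuming `connection` is an open tarantool-python connection
# and batch_insert is defined on the server:
#
# with open('data.csv', newline='') as f:
#     load_csv(f, lambda batch: connection.call('batch_insert', (batch,)))
```

Reading the file lazily keeps memory use flat even for millions of rows, and each batch costs one network round trip and one transaction instead of one per record.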

Read more about Tarantool stored procedures in the Tarantool documentation (e.g. https://www.tarantool.io/en/doc/1.10/getting_started/getting_started_connectors/#executing-stored-procedures).

Oleg Babin