I'm trying to create an SQLite database by importing a CSV file of URLs. The file has about 6 million strings. Here are the commands I've used:
create table urltable (url text primary key);
.import csvfile urltable
After about 3 million URLs the import slows down dramatically and my hard disk spins continuously. I've tried splitting the CSV file into quarter-size chunks, but I run into the same problem.
I read similar posts on Stack Overflow and tried using BEGIN...COMMIT blocks and PRAGMA synchronous=OFF, but none of them helped. The only way I was able to create the database was by removing the primary key constraint from url. But then, when I run a SELECT to look up a particular URL, it takes 2-3 seconds, which won't work for my application.
With the primary key set on url, the SELECT is instantaneous. Please advise me on what I am doing wrong.
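For reference, the attempt with the transaction and pragma settings looked roughly like this in the sqlite3 shell (a sketch from memory; depending on the shell version I may also need a .mode csv or .separator setting):

PRAGMA synchronous = OFF;
BEGIN;
.import csvfile urltable
COMMIT;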
[Edit] Summary of suggestions that helped:
- Reduce the number of transactions
- Increase page size & cache size
- Add the index only after the bulk import (see the sketch after this list)
- Remove redundancy from the URLs
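Putting those together, the import that finally worked looks roughly like this (a sketch; the page_size and cache_size values are just the ones I experimented with, the index name is arbitrary, and page_size has to be set before the table is created):

PRAGMA page_size = 8192;      -- must be set before the first table is created
PRAGMA cache_size = 100000;   -- in pages; negative values mean KiB
PRAGMA synchronous = OFF;
create table urltable (url text);
.import csvfile urltable
create unique index urltable_url_idx on urltable(url);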
Still, with the primary index, the database is more than double the size of the original CSV file that I was trying to import. Is there any way to reduce that?