
I have a huge 200 GB SQL file consisting mostly of INSERT statements. It first creates a few tables with ENGINE = InnoDB. As for the inserts: there is a table called Person, and a table called Target whose primary key is a foreign key referencing Person. Then there are two more tables: Source, which is much the same as Target, and Shoot, which connects Source to Target.
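For reference, a minimal sketch of what the schema might look like — the table relationships come from the description above, but the column names and types are placeholders:

```sql
-- Hypothetical sketch; only the FK structure is taken from the question.
CREATE TABLE Person (
    id INT PRIMARY KEY
) ENGINE = InnoDB;

CREATE TABLE Target (
    person_id INT PRIMARY KEY,
    FOREIGN KEY (person_id) REFERENCES Person (id)
) ENGINE = InnoDB;

CREATE TABLE Source (
    person_id INT PRIMARY KEY,
    FOREIGN KEY (person_id) REFERENCES Person (id)
) ENGINE = InnoDB;

CREATE TABLE Shoot (
    source_id INT,
    target_id INT,
    PRIMARY KEY (source_id, target_id),
    FOREIGN KEY (source_id) REFERENCES Source (person_id),
    FOREIGN KEY (target_id) REFERENCES Target (person_id)
) ENGINE = InnoDB;
```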

I insert about 8k targets and over 100 million sources, so you can picture over 1.3 billion rows in the Shoot table.

There are no errors in my SQL file; I've verified it beforehand with smaller files. I just want to be able to load this file into my database faster than 1,000 lines per minute.

Any tips are welcome.

  • Do you have any indexes? I suggest indexing your data after the insertion – riccardo.tasso Mar 09 '15 at 15:10
  • I don't have any indexes other than primary and foreign keys. How exactly would I index my data after the insertion? And how would that help me, if I am taking an eternity to get through the endless number of lines? – Ricardo Silveira Mar 09 '15 at 15:13
  • 1
    Why is this tagged SQL Server **and** MySQL? "SQL Server" is a Microsoft product and does not refer to SQL servers in general. – tadman Mar 09 '15 at 15:13
  • First question is: Have you tuned your MySQL instance at all? The default configuration is usually terrible. There are many references and [books](http://shop.oreilly.com/product/0636920022343.do) that talk about how to do this. – tadman Mar 09 '15 at 15:15
  • @tadman I don't know how it got tagged as sql server, to be honest. Since I haven't used a database in a while I am kind of out of touch — where and how can I tune a MySQL instance? Can you recommend a specific article on it? I've been looking at the MySQL documentation, but it is terrible. I am even considering MariaDB. – Ricardo Silveira Mar 09 '15 at 15:17
  • Put the DB in single user mode, disable indexes & constraints, run your SQL (or BCP upload if that's an option - http://stackoverflow.com/questions/1626059/mysql-bulk-load-command-line-tool) then re-enable things and put back in multi-user. When running the upload do it from the DB server to minimise effects of network latency. – JohnLBevan Mar 09 '15 at 15:18
  • MariaDB is more actively maintained, it will be better in the long run, and PostgreSQL is even better still. If you need a database that's really fast, you need to try out several to see which is the best fit. There are more exotic databases like Cassandra, Redis and such that could be a good fit as well. Don't think MySQL is the only database out there. – tadman Mar 09 '15 at 15:19
  • @JohnLBevan I am on localhost; it's just my own machine. I am using the database merely as a way to analyze data. If I disable indexes and constraints, won't I miss the chance to check whether a given value was already inserted? – Ricardo Silveira Mar 09 '15 at 15:26
  • @tadman, I have that in mind, thanks for the recommendations; I will look into those databases. I just wonder if they support the same SQL statements I have. – Ricardo Silveira Mar 09 '15 at 15:35
  • @RicardoSilveira; yes, on insert you won't be alerted; when you re-enable constraints & indexes it should error if there are issues, and you can then run other scripts to fix them... however that's probably worse than uploading with the constraints in place (not sure) - I'd assumed the data being uploaded was pre-validated. – JohnLBevan Mar 09 '15 at 15:41
  • @RicardoSilveira If it's just `INSERT` statements into a schema, any SQL standard database will. PostgreSQL is very standards compliant. – tadman Mar 09 '15 at 15:57

2 Answers

1

200 GB will take hours -- simply because of the disk I/O. What type of mass storage do you have? SSDs are faster, but it will still take time. RAID-5 or RAID-10 will be faster.

What does the incoming data look like?
* CSV and you are using LOAD DATA? That may be the fastest.
* mysqldump output with INSERTs that have lots of rows each? That comes in a close second.
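If the data can be exported as CSV, the `LOAD DATA` path looks roughly like this — the file path and column list are placeholders, not from the question:

```sql
-- Assumed file name and column layout; adjust to the real CSV.
LOAD DATA LOCAL INFILE '/path/to/shoot.csv'
INTO TABLE Shoot
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
(source_id, target_id);
```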

What INDEXes and FOREIGN KEYs do you have? Disable both before loading and re-enable them afterwards. But do not remove the PRIMARY KEYs.
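In practice that means wrapping the load in session settings like these (the dump file path is a placeholder):

```sql
-- Typical session settings for a bulk load into InnoDB.
SET unique_checks = 0;       -- skip secondary-unique-index checks per row
SET foreign_key_checks = 0;  -- skip FK validation per row
SET autocommit = 0;          -- avoid a commit (and fsync) per statement

SOURCE /path/to/dump.sql;    -- the 200 GB file

COMMIT;
SET unique_checks = 1;
SET foreign_key_checks = 1;
```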

Is the incoming data sorted in PRIMARY KEY order? That can help a lot (for InnoDB).

innodb_buffer_pool_size should be about 70% of available RAM.
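For example, in my.cnf (the 16 GB machine here is an assumption; scale to your RAM):

```ini
[mysqld]
# ~70% of RAM on an assumed 16 GB machine
innodb_buffer_pool_size = 11G
```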

Rick James
-2

The fastest way is to use SSIS. The second fastest is BCP. Both tools come free with any paid version of SQL Server.

benjamin moskovits