2

In a single transaction, is there any significant performance difference between doing batch insert with a single statement like:

-- BEGIN TRANSACTION

INSERT INTO films (code, title, did, date_prod, kind) VALUES
    ('B6717', 'Tampopo', 110, '1985-02-10', 'Comedy'),
    ('HG120', 'The Dinner Game', 140, DEFAULT, 'Comedy');

-- COMMIT

or multiple statements like below:

-- BEGIN TRANSACTION

INSERT INTO films (code, title, did, date_prod, kind) VALUES
    ('B6717', 'Tampopo', 110, '1985-02-10', 'Comedy');

INSERT INTO films (code, title, did, date_prod, kind) VALUES
    ('HG120', 'The Dinner Game', 140, DEFAULT, 'Comedy');

-- COMMIT
O. Jones
  • 103,626
  • 17
  • 118
  • 172
Cemre Mengü
  • 18,062
  • 27
  • 111
  • 169
  • 2
    If you want to do bulk-loading of data in PostgreSQL with better performance, then use [COPY](https://www.postgresql.org/docs/current/sql-copy.html), not INSERT. Other tips for populating a database are here: https://www.postgresql.org/docs/current/populate.html – Bill Karwin Jun 11 '23 at 13:24
  • 2
    Why ask us? Try it and see. – jjanes Jun 11 '23 at 15:14

2 Answers2

2

Any difference between these two approaches is effectively eliminated if you do the multiple INSERT operations (your second approach) within a single BEGIN / COMMIT transaction.

Why? The lion's share of the cpu and I/O work for data manipulation happens upon commit. If you're in autocommit mode, each INSERT gets an implicit commit, and the overhead associated with it, unless you're in a transaction. But multiple data manipulation statements in a single transaction incur the overhead just once.

If you didn't set the the mode, you're probably in autocommit mode (unless you're using the python-language connector).

O. Jones
  • 103,626
  • 17
  • 118
  • 172
0

When it comes to performance, using a single insert statement with multiple value sets is generally faster than executing multiple insert statements individually within a transaction.

For example, let's say we have a table called "films" with columns "code," "title," "did," "date_prod," and "kind." We want to insert two rows into this table.

Using a single insert statement, you can do:

INSERT INTO films (code, title, did, date_prod, kind) VALUES
('B6717', 'Tampopo', 110, '1985-02-10', 'Comedy'),
('HG120', 'The Dinner Game', 140, DEFAULT, 'Comedy');

This single insert statement will add both rows to the "films" table in one operation.

Alternatively, using multiple insert statements, you would do:

    INSERT INTO films (code, title, did, date_prod, kind) VALUES
('B6717', 'Tampopo', 110, '1985-02-10', 'Comedy');

    INSERT INTO films (code, title, did, date_prod, kind) VALUES
('HG120', 'The Dinner Game', 140, DEFAULT, 'Comedy');

Here, you execute two separate insert statements, each adding one row to the "films" table. In terms of performance, the single insert statement is generally faster because it optimizes the operation and reduces overhead. It communicates and prepares the insert only once, resulting in improved performance compared to executing multiple statements individually within a transaction.