
The following function imports around 60k records in 111 seconds. I've heard others say that copy_from and copy_expert can do 1 million records in less than a minute. Is there something about using copy_expert that slows the process down compared to copy_from? Is there anything I can do to optimize this?

cursor = connection.cursor()
with open(csv_fname) as f:
    cursor.copy_expert('''
        COPY employee_employee (name, slug, title, base, overtime, other, gross, benefits, ual, total, year, status, jurisdiction_id, notes)
        FROM STDIN WITH (FORMAT csv, HEADER true, FORCE_NOT_NULL (status));
    ''', f)

As for relevant variables: the database connection comes from Django (from django.db import connection). The database is PostgreSQL 10 running on my local MacBook Pro.

Paolo Melchiorre
Casey

2 Answers


Specify buffering=2**10 in your file open.

I thought this was a problem with psycopg2 on OSX. I had the same problem, and it ran super fast on Ubuntu. It was slow on OSX regardless of whether the database was on OSX or Ubuntu. My copy_expert was taking 13 minutes on OSX. When I set buffering=2**10, it went from 13 minutes to 7 seconds.

  • Can you please explain more? Is it the built-in function `open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)`? The buffering argument? I have this issue with the pipelinewise app; it uses the copy_expert command, which is slow. – Slava Lenskyy Nov 05 '21 at 19:40
  • @SlavaLenskyy when you open the file, don't set buffering=-1; rather, set buffering much larger. – John Abraham Nov 16 '21 at 00:29
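As a sketch of what the answer suggests: the `buffering` argument is passed straight to the built-in `open()`, and the resulting file object is what you would hand to `copy_expert`. The file path and CSV contents below are invented for illustration only:

```python
import os
import tempfile

# Write a small sample CSV (invented data, for illustration only).
path = os.path.join(tempfile.mkdtemp(), "employees.csv")
with open(path, "w") as f:
    f.write("name,year\nalice,2020\nbob,2021\n")

# Open with an explicit 1 KiB buffer (buffering=2**10) rather than the
# default buffer size; this is the file object you would pass as the
# second argument to cursor.copy_expert(sql, f).
with open(path, buffering=2**10) as f:
    header = f.readline().rstrip("\n")

print(header)
```

Applied to the code in the question, this would amount to changing `open(csv_fname)` to `open(csv_fname, buffering=2**10)`.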

I suggest using the great django-postgres-copy, which I used to easily import CSVs with ~1 million rows in a few seconds, directly into PostgreSQL.

It's made by the California Civic Data Coalition to import and export delimited data with Django, with support for PostgreSQL's COPY command.

"This package tries to make using COPY as easy as any other database routine supported by Django. It is implemented by a custom model manager."

Here’s how it imports a CSV to a database table.

from myapp.models import MyModel

MyModel.objects.from_csv("./data.csv", dict(name='NAME', number='NUMBER'))

And here’s how it exports a database table to a CSV.

from myapp.models import MyModel

MyModel.objects.to_csv("./data.csv")

Read the django-postgres-copy documentation for more information.

Paolo Melchiorre