
Is there a way to insert foreign keys when using pandas to_sql function?

I am processing uploaded Consultations (n=40k) with pandas in django, before adding them to the database (postgres). I got this working row by row, but that takes 15 to 20 minutes. This is longer than I want my users to wait, so I am looking for a more efficient solution.

I tried pandas to_sql, but I cannot figure out how to add the two foreign key relations as columns to my consultations dataframe before calling the to_sql function. Is there a way to add the Patient and Praktijk foreign keys as a column in the consultations dataframe?

More specifically, when inserting row by row, I use objects of type Patient or Praktijk when creating new consultations in the database. In a dataframe however, I cannot use these types, and therefore don't know how I could add the foreign keys correctly. Is there possibly a value of type object or int (a patient's id?) which can substitute a value of type Patient, and thereby set the foreign key?

The Consultation model:

class Consultation(models.Model):
    # the foreign keys
    patient = models.ForeignKey(Patient, on_delete=models.CASCADE, null=True, blank=True)
    praktijk = models.ForeignKey(Praktijk, on_delete=models.CASCADE, default='')

    # other fields which do not give trouble with to_sql
    patient_nr = models.IntegerField(blank=True, null=True)
    # etc

The to_sql call:

consultations.to_sql(Consultation._meta.db_table, engine, if_exists='append', index=False, chunksize=10000)

If the above is not possible, are there any hints towards another, more efficient solution?

Niels Hameleers

2 Answers


I had the same problem, and this is how I solved it. My answer isn't as straightforward as it could be, but I trust it helps.

Inspect your django project to be sure of two things:

  1. Target table name
  2. Table column names

In my case, I use class Meta when defining Django models to give each table an explicit name (otherwise Django names tables automatically as <app_label>_<model name>). I will use the Django tutorial project to illustrate.

class Question(models.Model):
    question_text = models.CharField(max_length=200)
    pub_date = models.DateTimeField('date published')
    class Meta:
        db_table = "poll_questions"

class Choice(models.Model):
    question = models.ForeignKey(Question, on_delete=models.CASCADE)
    choice_text = models.CharField(max_length=200)
    votes = models.IntegerField(default=0)
    class Meta:
        db_table = "question_choices"

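Note: Django stores the Question foreign key in the database as the pk of the Question object, in a column named after the field with an _id suffix. For the question field above, the column is question_id.

Assume I have a Question with pk 1 and a dataframe df of choices that I want to add to it. If I use pandas to batch insert into the database, the df column names must match the table column names, so the df must look like the one below.

import pandas as pd  

df = pd.DataFrame(
    {
        "question_id": [1, 1, 1, 1, 1],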
        "choice_text": [
            "First Question",
            "Second Question",
            "Third Question",
            "Fourth Question",
            "Fifth Question"
        ],
        "votes":[5,3,10,1,13]
    }
)
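When the rows do not all belong to the same parent, the question_id column can be filled by looking up each parent's pk from a value the upload already contains. The following is a minimal sketch of that lookup with pandas; the `questions` lookup frame and its contents are made up for illustration (in practice it could be read from the parent table). The same idea should apply to the question's case, mapping patient_nr to a patient_id column.

```python
import pandas as pd

# Hypothetical lookup frame: pk and a natural key for each existing Question.
questions = pd.DataFrame(
    {"id": [1, 2], "question_text": ["What is new?", "What is up?"]}
)

# Incoming rows identify their parent by text, not by primary key.
choices = pd.DataFrame(
    {
        "question_text": ["What is new?", "What is up?", "What is new?"],
        "choice_text": ["Not much", "The sky", "Just hacking again"],
        "votes": [0, 0, 0],
    }
)

# Build a Series indexed by the natural key, then map each row to its parent pk.
pk_by_text = questions.set_index("question_text")["id"]
choices["question_id"] = choices["question_text"].map(pk_by_text)

# Drop the helper column so only real table columns remain.
choices = choices.drop(columns="question_text")
```

Rows whose key has no match end up with NaN in question_id, so it is worth checking for missing values before inserting.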

I wish I could write the df as a table. Too bad that SO doesn't support usual markdown for tables

Nonetheless, we have our df. The next step is to create a database connection for inserting the records.

from django.conf import settings
from sqlalchemy import create_engine

# load database settings from django

user = settings.DATABASES['default']['USER']
passwd = settings.DATABASES['default']['PASSWORD']
dbname = settings.DATABASES['default']['NAME']

# create database connection string (assumes PostgreSQL on localhost:5432)
conn_str = 'postgresql://{user}:{passwd}@localhost:5432/{dbname}'.format(
    user=user,
    passwd=passwd,
    dbname=dbname
)

# actual database connection object (a SQLAlchemy engine)
conn = create_engine(conn_str, echo=False)

# write df into db
df.to_sql("question_choices", con=conn, if_exists="append", index=False, chunksize=500, method="multi")

Voila!
We are done!
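To try the idea without a Django project or a PostgreSQL server, here is a small self-contained sketch. It is an assumption-laden stand-in, not the setup above: it uses an in-memory SQLite database from the standard library, and the two tables are created by hand to roughly mirror what Django would generate for the tutorial models.

```python
import sqlite3

import pandas as pd

# In-memory SQLite database as a stand-in for the real PostgreSQL database.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked

# Hand-written tables that approximate the tutorial models' schema.
conn.executescript(
    """
    CREATE TABLE poll_questions (
        id INTEGER PRIMARY KEY,
        question_text TEXT NOT NULL
    );
    CREATE TABLE question_choices (
        id INTEGER PRIMARY KEY,
        question_id INTEGER NOT NULL REFERENCES poll_questions (id),
        choice_text TEXT NOT NULL,
        votes INTEGER NOT NULL
    );
    INSERT INTO poll_questions (id, question_text) VALUES (1, 'What is new?');
    """
)

# The foreign key is just an integer column holding the parent's pk.
df = pd.DataFrame(
    {
        "question_id": [1, 1],
        "choice_text": ["Not much", "The sky"],
        "votes": [0, 0],
    }
)

# Append to the existing table; the id column is filled in by the database.
df.to_sql("question_choices", conn, if_exists="append", index=False)

rows = conn.execute(
    "SELECT question_id, choice_text, votes FROM question_choices ORDER BY id"
).fetchall()
```

The point of the sketch is only that the dataframe carries a plain integer in the foreign key column, named exactly as the table column is named.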

Note:
Django also supports bulk inserts through bulk_create, which, however, isn't what you asked for.

Jason Muriki

I ran into a similar problem using SQLAlchemy, but I found a simple workaround.

I defined the database schema the way I wanted it with SQLAlchemy (with all the datatypes and foreign keys I needed) and created the empty table first. Then I simply changed the if_exists parameter of to_sql to append.

This appends all the data to the existing, empty table and keeps the schema you defined, instead of letting pandas create the table without the foreign keys.
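A rough sketch of this workaround follows. The table and column names (`patient`, `consultation`, `patient_id`) are hypothetical, loosely modelled on the question, and an in-memory SQLite engine is used only so the example runs on its own.

```python
import pandas as pd
from sqlalchemy import (
    Column, ForeignKey, Integer, MetaData, String, Table, create_engine, text,
)

# In-memory SQLite engine, used here only to keep the sketch self-contained.
engine = create_engine("sqlite://")
metadata = MetaData()

# Define the schema up front, including the foreign key.
patient = Table(
    "patient", metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String(100), nullable=False),
)
consultation = Table(
    "consultation", metadata,
    Column("id", Integer, primary_key=True),
    Column("patient_id", Integer, ForeignKey("patient.id"), nullable=True),
    Column("patient_nr", Integer),
)

# Create the empty tables with the datatypes and constraints defined above.
metadata.create_all(engine)

# Parent rows first, so the child rows have something to point at.
patients = pd.DataFrame({"id": [1, 2], "name": ["A", "B"]})
patients.to_sql("patient", engine, if_exists="append", index=False)

# Child rows carry the parent's pk in the patient_id column.
consultations = pd.DataFrame(
    {"patient_id": [1, 1, 2], "patient_nr": [101, 101, 202]}
)
consultations.to_sql("consultation", engine, if_exists="append", index=False)

# Count the inserted child rows to confirm the append worked.
with engine.connect() as connection:
    n_rows = connection.execute(
        text("SELECT COUNT(*) FROM consultation")
    ).scalar()
```

Because the tables already exist when to_sql runs, if_exists="append" leaves the schema untouched and only adds rows.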

Christos