
There are already a few questions and answers here about PostgreSQL imports (including the specific SQLite -> PostgreSQL situation). This question is about one particular corner case.

Background

I have an existing, in-production webapp written in Python (Pyramid), using alembic for easy schema migration. Because the database was creaking under an unexpectedly high write load (probably thanks to the convoluted nature of my own code), I've decided to migrate to PostgreSQL.

Data migration

There are a few recommendations on data migration. The simplest one involved using

sqlite3 my.db .dump > sqlitedumpfile.sql

and then importing it with

psql -d newpostgresdb < sqlitedumpfile.sql

This required a fair amount of editing of sqlitedumpfile.sql: removing incompatible statements, changing values (SQLite represents booleans as 0/1), and so on. It ended up being too complicated to do programmatically for my data, and too much work to do by hand (some tables had around 20k rows).
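For illustration, the kind of rewriting involved looks like the following (the users table and its columns are made up for this example):

    -- as emitted by sqlite3 .dump; PostgreSQL rejects both lines:
    PRAGMA foreign_keys=OFF;                    -- PRAGMA is SQLite-only
    INSERT INTO users VALUES(1,'alice',1);      -- trailing 1 is really a boolean

    -- what PostgreSQL needs instead:
    INSERT INTO users VALUES (1, 'alice', TRUE);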

A good tool for data migration, which I eventually settled on, was pgloader, which 'worked' immediately. However, as is typical for migrations of this sort, it exposed various data inconsistencies in my database which I had to fix at the source before migrating: in particular, foreign keys pointing at non-unique columns (which had seemed a good idea at the time, for convenient joins) and orphan rows referencing rows in other tables that had since been deleted (a sketch of that clean-up follows below). Once those were fixed, I could just do

pgloader my.db postgresql:///newpostgresdb

and get all my data moved over correctly.
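To give a flavour of that clean-up (parent, child, and their columns are invented for the example), the orphan rows can be found and removed in SQLite itself before running pgloader:

    -- inspect the orphans first:
    SELECT * FROM child WHERE parent_id NOT IN (SELECT id FROM parent);
    -- then remove them:
    DELETE FROM child WHERE parent_id NOT IN (SELECT id FROM parent);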

The problem?

pgloader worked really well for data but not so well for the table structure itself. This resulted in three problems:-

  1. I had to create a new alembic revision with a ton of changes (mostly datatype related, but also some related to problem 2).

  2. Constraint/index names were unreliable (pgloader generated unique numeric names). There's actually an option to disable that, but either way this was a problem: I needed a reliable upgrade path that could be replayed in production without me having to manually tweak the alembic code.

  3. Sequences/autoincrement just failed for most primary keys. This broke my webapp, as I could no longer add new rows to some (not all) tables (a quick check for this is sketched after this list).
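For anyone hitting the same symptom, the quick check mentioned above looks like this (users and users_id_seq are hypothetical names):

    SELECT max(id) FROM users;
    SELECT last_value FROM users_id_seq;
    -- if last_value lags behind max(id), new inserts collide with existing keys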

In contrast, re-creating a blank database using alembic to maintain the schema works well, without changing any of my webapp's code. However, pgloader defaults to overwriting existing tables, so that would leave me nowhere: the data is what really needs migrating.

How do I get proper data migration using a schema I've already defined (and which works)?

  • Perhaps you can use the `pgloader my.db postgresql:///newpostgresdb?tablename=matching` option once your schema is already defined. This *might* need to be done on a table-by-table basis (I am not sure, since I haven't used that option other than for importing CSV files) – joanolo Jul 15 '17 at 20:49
  • That sounds possible, let me test and check that out. However, based on what I've seen of pgloader so far, I think it will still drop the table and recreate it. – Ng Oon-Ee Jul 15 '17 at 22:45
  • Having tested it, it doesn't seem to work. The documentation seems to indicate that this option is CSV-specific, unfortunately. – Ng Oon-Ee Jul 15 '17 at 22:52
  • `:-(` pity. That was my use-case. – joanolo Jul 15 '17 at 22:53
  • I think you can still use pgloader with a configuration file: `WITH create no tables`. Check the *LOAD SQLite DATABASE* section of [pgloader(1) -- PostgreSQL data loader](http://pgloader.io/howto/pgloader.1.html). I've never used configuration files myself... so this might well fail, even if it looks promising. – joanolo Jul 15 '17 at 23:00
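For reference, the configuration-file approach suggested in that last comment would look something like the following. This is an untested sketch based on the LOAD SQLite DATABASE section of the pgloader manual; the path and database names are placeholders:

    LOAD DATABASE
         FROM sqlite:///path/to/my.db
         INTO postgresql:///newpostgresdb
    WITH create no tables, create no indexes, reset sequences;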

1 Answer


What eventually worked was, in summary:-

  1. Create the appropriate database structure in postgresql:///newpostgresdb (I just used alembic upgrade head for this).

  2. Use pgloader to move the data over from SQLite to a different, temporary database in PostgreSQL. As mentioned in the question, some data inconsistencies need to be solved before this step, but that's not relevant to this question itself.

    createdb tempdb

    pgloader my.db postgresql:///tempdb

  3. Dump the data in tempdb using pg_dump

    pg_dump -a -d tempdb > dumped_postgres_database

  4. Edit the resulting dump to accomplish the following:-

    • Add SET session_replication_role = replica at the top, because some of my rows have circular references to other rows in the same table.

    • Delete the alembic_version table, as we're restarting a new branch for alembic.

    • Regenerate any sequences with the equivalent of SELECT pg_catalog.setval('table_colname_seq', (select max(colname) from table)); (the sketch after this list shows where these lines go).

  5. Finally, psql can be used to load the data into your actual database:

    psql -d newpostgresdb < dumped_postgres_database
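Putting steps 4 and 5 together, the edited dump ends up looking roughly like the sketch below (users, id, and users_id_seq are placeholder names; the data in the middle is whatever pg_dump -a produced):

    SET session_replication_role = replica;

    -- ... COPY/INSERT statements emitted by pg_dump -a ...

    -- appended at the end, one line per sequence:
    SELECT pg_catalog.setval('users_id_seq', (SELECT max(id) FROM users));

A list of all sequences to regenerate can be pulled from the catalog with SELECT relname FROM pg_class WHERE relkind = 'S';. Setting session_replication_role to replica makes PostgreSQL skip the foreign-key triggers during the load, which is what lets the circular references through.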
