
The PostgreSQL documentation states that the COPY command only works on tables, but I need to import a massive CSV file into a view.

The reason it needs to be a view is that the CSV I've been given has a large number of unnormalized columns, and I would like to use a view (with an INSTEAD OF INSERT trigger) to parse those columns and assimilate them into my normalized database tables.
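Roughly what I have in mind (a minimal sketch; every table and column name here is made up, and the real CSV has far more columns):

```sql
-- A wide view matching the CSV's column layout...
CREATE VIEW import_stuff AS
    SELECT p.name, p.description, c.name AS category
    FROM products p
    JOIN categories c ON c.id = p.category_id;

-- ...with an INSTEAD OF INSERT trigger that splits each incoming
-- row into the normalized tables.
CREATE FUNCTION import_stuff_insert() RETURNS trigger AS $$
DECLARE
    cat_id integer;
BEGIN
    -- Upsert the category (assumes a UNIQUE constraint on categories.name).
    INSERT INTO categories (name)
    VALUES (btrim(NEW.category))
    ON CONFLICT (name) DO UPDATE SET name = EXCLUDED.name
    RETURNING id INTO cat_id;

    INSERT INTO products (name, description, category_id)
    VALUES (btrim(NEW.name), rtrim(NEW.description), cat_id);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER import_stuff_ins
    INSTEAD OF INSERT ON import_stuff
    FOR EACH ROW EXECUTE PROCEDURE import_stuff_insert();
```

Plain `INSERT`s into the view fire the trigger just fine; it's the `COPY import_stuff FROM '/path/to/data.csv' WITH (FORMAT csv, HEADER)` step that the documentation says is table-only.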

This CSV file has between 20k and 30k rows, so normalizing it by hand is just not feasible.

This CSV import is most likely going to occur more than a few times, so using a view will make it easier for me to come back in a few months and identify my import_stuff table, rather than having to remember what function to use.

Any ideas? I just cannot waste 1-2 weeks processing this CSV each time. The file has many inconsistencies, like long trailing spaces on descriptions, the wrong encoding, and newlines in single-line names, which alone can take 3 days to clean before normalizing the data.
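To give a sense of the cleanup (column names made up again), these are the kinds of fixes I currently do by hand that SQL could do in bulk:

```sql
-- Strip embedded newlines and stray whitespace from supposedly
-- single-line fields, and trailing spaces from descriptions:
SELECT btrim(regexp_replace(name, '[\n\r]+', ' ', 'g')) AS name,
       rtrim(description)                               AS description
FROM import_stuff;

-- A wrong file encoding can be converted at load time:
-- COPY import_stuff FROM '/path/to/data.csv' WITH (FORMAT csv, ENCODING 'WIN1252');
```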

Guy Park
  • What exactly is your question? How to make a view writeable? Typically the approach for something like this is to import the data into some intermediate ("staging") table (not a view), then use SQL and joins to copy the data with the correct foreign keys into the real table. –  Aug 12 '19 at 07:30
  • How to copy into a view would be the question. I'm trying to avoid the "no you can't, why would you use a view" sort of responses, but I like your suggestion of a staging table... I was hoping there was a way without getting complex/messy. I suppose I could put the trigger on a table instead... hmmm. Can you please post that as an answer? :) – Guy Park Aug 12 '19 at 07:34
  • I think the solution using a staging table is far **less** complex than using a view. The view will require an [`instead of` trigger](https://www.postgresql.org/docs/current/plpgsql-trigger.html#PLPGSQL-VIEW-TRIGGER-AUDIT-EXAMPLE) to "normalize" the incoming data - essentially the same logic you would need with a staging table. A staging table is probably more flexible and easier to debug. Plus, it is probably faster as well: the trigger would have to do a row-by-row lookup of the data, while with a staging table this could be done using JOINs, which is more efficient (see the sketch below). –  Aug 12 '19 at 07:42
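
A rough sketch of that staging-table flow (reusing the import_stuff name, but as a real table; every other name here is invented):

```sql
-- Load the raw CSV as-is into a throwaway table of plain text columns.
CREATE UNLOGGED TABLE import_stuff (
    name        text,
    description text,
    category    text
);

COPY import_stuff FROM '/path/to/data.csv' WITH (FORMAT csv, HEADER);
-- (or \copy in psql if the file lives on the client)

-- Clean the known inconsistencies in bulk:
UPDATE import_stuff
SET name        = btrim(regexp_replace(name, '[\n\r]+', ' ', 'g')),
    description = rtrim(description);

-- Normalize set-wise instead of row-by-row:
INSERT INTO categories (name)
SELECT DISTINCT category
FROM import_stuff
ON CONFLICT (name) DO NOTHING;          -- assumes UNIQUE(categories.name)

INSERT INTO products (name, description, category_id)
SELECT i.name, i.description, c.id
FROM import_stuff i
JOIN categories c ON c.name = i.category;

TRUNCATE import_stuff;                  -- ready for the next import
```

Because the whole load is set-based, the cleanup and the foreign-key resolution each run as a single statement over all 20k-30k rows, rather than as 20k-30k trigger invocations.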

0 Answers