
I have a database that contains all the data from a CSV I downloaded from a botanic website, and I want to keep it updated whenever a new version of the CSV comes out. My problem is that the CSV is extremely huge: it has at least 1 million records, and saving everything takes an entire hour on average. Any ideas on how I can perform this update without rewriting the entire thing? TYSM!

Some extra info:

  • It's a Node.js project, and I'm using Prisma as the ORM
  • It's actually a txt file that I parse as a CSV, since all the data is separated by tabs
  • I'm using PostgreSQL
  • Currently I'm dropping the table and saving everything from the ground up, roughly like the sketch below
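Here's a simplified version of what my import looks like right now (the plant model and the column handling are placeholders):

    const fs = require('fs');
    const csv = require('csv-parser');
    const { PrismaClient } = require('@prisma/client');

    const prisma = new PrismaClient();

    async function reloadAll(path) {
      await prisma.plant.deleteMany(); // wipe the table...

      // csv-parser handles the tab-separated txt file
      const rows = fs.createReadStream(path).pipe(csv({ separator: '\t' }));

      // ...then re-insert every record one by one, which is what takes ~1 hour
      for await (const row of rows) {
        await prisma.plant.create({ data: row });
      }
    }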
lambadeltaz

2 Answers


You don't mention how you load the data into Postgres, but you could use the COPY command, something like this:

COPY mytable FROM '/tmp/myfile.csv' (FORMAT csv, HEADER true);

Since your data is tab-separated, you may also need DELIMITER E'\t' in the COPY options. It should be much faster than going through the ORM row by row.

https://www.postgresql.org/docs/current/sql-copy.html
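If you'd rather drive this from Node.js than from psql, here is a minimal sketch using the pg and pg-copy-streams packages (the table name mytable and the file path are placeholders, and connection settings are assumed to come from the usual PG* environment variables):

    const fs = require('fs');
    const { pipeline } = require('stream/promises');
    const { Pool } = require('pg');
    const { from: copyFrom } = require('pg-copy-streams');

    async function loadFile(path) {
      const pool = new Pool();
      const client = await pool.connect();
      try {
        // COPY ... FROM STDIN lets the client stream the file,
        // so it doesn't have to exist on the database server.
        const ingest = client.query(
          copyFrom("COPY mytable FROM STDIN (FORMAT csv, DELIMITER E'\\t', HEADER true)")
        );
        await pipeline(fs.createReadStream(path), ingest);
      } finally {
        client.release();
        await pool.end();
      }
    }

    loadFile('/tmp/myfile.csv').catch(console.error);

A nice side effect of FROM STDIN is that the file only needs to be readable by the Node process, whereas COPY ... FROM '/path' reads from the database server's filesystem and needs extra privileges.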

Another option is to use file_fdw, if you just need to query the data and don't need updates:

https://www.postgresql.org/docs/current/file-fdw.html
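The setup is a one-time bit of DDL. A rough sketch, run through the pg client (the foreign table name and columns are made up, since the real layout of your file isn't shown; note the file has to live on the database server's filesystem):

    const { Client } = require('pg');

    async function setupForeignTable() {
      const client = new Client(); // CREATE EXTENSION needs a superuser connection
      await client.connect();
      try {
        await client.query('CREATE EXTENSION IF NOT EXISTS file_fdw');
        await client.query(
          'CREATE SERVER IF NOT EXISTS csv_server FOREIGN DATA WRAPPER file_fdw'
        );
        // Hypothetical columns -- replace with the real layout of your file
        await client.query(`
          CREATE FOREIGN TABLE plants_csv (id text, name text, family text)
          SERVER csv_server
          OPTIONS (filename '/tmp/myfile.csv', format 'csv', delimiter E'\\t', header 'true')
        `);
      } finally {
        await client.end();
      }
    }

After that, SELECT * FROM plants_csv always reflects the current contents of the file, with no import step at all.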

Sven Knowles
  • Sorry for not mentioning it! I'm using a CSV parser, iterating over every line of the CSV and saving it to the database with prisma.create. Thanks for the reply, I will check this out – lambadeltaz Apr 28 '22 at 00:27

You would essentially need to diff the two CSV versions, find the records that have changed, and update only those.

You could use any text/CSV comparison npm library to find the differences, and once you have the changed records you can issue an update query for each of them.
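A minimal sketch of that idea, assuming the first tab-separated column is a unique key and a Prisma model named plant with hypothetical name and family columns (adjust to your real schema):

    const fs = require('fs');
    const { PrismaClient } = require('@prisma/client');

    const prisma = new PrismaClient();

    // Index a tab-separated file by its first column.
    // This holds the whole file in memory; with ~1M rows you
    // may want a streaming reader instead.
    function indexByKey(path) {
      const rows = new Map();
      for (const line of fs.readFileSync(path, 'utf8').split('\n')) {
        if (!line.trim()) continue;
        rows.set(line.slice(0, line.indexOf('\t')), line);
      }
      return rows;
    }

    async function syncChanges(oldPath, newPath) {
      const oldRows = indexByKey(oldPath);
      const newRows = indexByKey(newPath);
      for (const [id, line] of newRows) {
        if (oldRows.get(id) === line) continue; // row is unchanged, skip it
        const [, name, family] = line.split('\t'); // hypothetical columns
        await prisma.plant.upsert({
          where: { id },
          update: { name, family },
          create: { id, name, family },
        });
      }
    }

    syncChanges('/tmp/old.csv', '/tmp/new.csv').catch(console.error);

You would also want to handle deletions (keys present in the old file but missing from the new one), and batching the writes with prisma.$transaction should speed things up considerably.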

Nurul Sundarani