0

I have a tab separated CSV file in D:\DataSet\business_names_202007/businessDataSet.csv with 600k records (which may increase). I want to load entire data into below postgresql table.

PostgreSQL table:

PostgreSQL Structure

CSV file structure:

CSV File Structure

As you can see above structure, column count differs between CSV file and db ("transform_business" column extra in DB). while loading data we need to add this as well, the value for this column is given below.

"transform_business" column value in db is same as "BN_NAME" in the DataSet with following changes: Convert to UPPERCASE and all spaces should be removed between words

eg:

  • BN_NAME: Melbourne Collision Repair

  • transform_business: MELBOURNECOLLISIONREPAIR

  • Tool: Dbeaver

  • DB Schema: testDev

  • tablename: testdevtable

Tutorial link

Temporarily formed copt statement:

COPY testdevtable(register_name,bn_name,bn_status,transform_business) 
FROM 'D:\DataSet\business_names_202007/businessDataSet.csv' DELIMITER E’\t’ CSV HEADER;
Justin
  • 855
  • 2
  • 11
  • 30

2 Answers2

0

If it where me I would add a ON INSERT trigger to business table that transforms bn_name --> transform_business during the INSERT. Then leave transform_business out of the COPY. The second option is to do the transformation before you load it and add transform_business and the new data to the CSV file.

An example SQL code snippet that will do the transformation:

SELECT upper(replace('Melbourne Collision Repair Centre Mentone', ' ', ''));
                 upper                 
---------------------------------------
 MELBOURNECOLLISIONREPAIRCENTREMENTONE

Another option is to just COPY (register_name,bn_name,bn_status) into the database(with no trigger on table) and then run:

UPDATE business SET transform_business = upper(replace(bn_name, ' ', ''));

Not sure what happens after that, whether the transform_business value is going to be entered with new data or not. If the user/application is not going to enter it then I think you are back to a trigger on business table that runs upper(replace(bn_name, ' ', '')).

Adrian Klaver
  • 15,886
  • 2
  • 17
  • 28
  • Hi @Adrian, Thanks for ur rsponse. Is there any option in CLI to load Data with single Attempt(Loading other table column along with required changes of "transform_business"? – Justin Jul 09 '20 at 06:01
  • Hi @Adrian, how can we transform before load? bcoz data is too long (max upto 4000000) or pls elabore it by showing some sample piece of coding? – Justin Jul 09 '20 at 09:34
  • Well you are going to have to deal with that at some point, just a matter of where you want to pay the price. Updated answer with example and another suggestion. The only option that ```COPY``` has to manipulate data is to use ```PROGRAM``` which is a program that you write to feed data to ```COPY```. I have never done that so I cannot be of much help there. Your best bet is to get the basic data into the database and do the changes there. – Adrian Klaver Jul 09 '20 at 14:40
0

Perhaps you have omitted a valuable piece of information: what Postgres version are you running? If you have or can update to version 12 you can redefine your table so that the column transform_business is a generated column. Then for example:

create table table_name(
       id             bigint generated always as identity
     , register_name  text 
     , bn_name        text 
     , status         text 
     , transform_name text generated always as ( upper(replace(bn_name,' ',''))) stored
     ) ;

If you need to maintain the current data then you could:

alter table table_name  drop column transform_name; 
alter table table_name  add transform_name text generated always as ( upper(replace(bn_name,' ',''))) stored; 

This option would be rather slow, especially if table is large, but it is a one-time process. Either option would then give you the "constraint" that transform_name could not be updated directly, but would bu automatically updated when bn_name is updated.

Then your copy command would just load register_name, bn_name, and status.

Belayer
  • 13,578
  • 2
  • 11
  • 22
  • Hi @Belayer, I am using 11.5version of postgreSQL, raised concern to DBA to upgrade postgreSQL version to 12. will check and let u know the status. – Justin Jul 10 '20 at 06:32