
I have a script in Python that basically has this workflow:

  1. Receive a CSV with a header.
  2. Create a table in a PostgreSQL database whose fields are the headers of the CSV (a sketch of this step follows the list).
  3. Copy the data from the CSV into the table created in step 2.
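
For context, step 2 can be sketched as follows; the function name create_table_from_header and the all-TEXT column types are illustrative assumptions, not the original code:

import csv

def create_table_from_header(connection, table, file_csv, delimiter=','):
    # Read only the header row and create one TEXT column per field.
    # Table and column names are interpolated directly here, so this
    # sketch assumes trusted input.
    with open(file_csv) as f:
        headers = next(csv.reader(f, delimiter=delimiter))
    columns = ', '.join('"%s" TEXT' % h for h in headers)
    cur = connection.cursor()
    cur.execute('CREATE TABLE "%s" (%s)' % (table, columns))
    connection.commit()
    cur.close()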

Here is a fragment of the code for step 3:

file_object = open(file_csv)
cur = connection.cursor()
copy_sql = """
   COPY %s FROM stdin WITH CSV HEADER
   DELIMITER as '""" + delimiter + "'"

cur.copy_expert(sql=copy_sql % table, file=file_object)
connection.commit()
cur.close()
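
(Side note: building the statement by string concatenation leaves the table name and delimiter unescaped. A safer variant of the same fragment, assuming psycopg2 2.7+, which added the psycopg2.sql module, might be:)

from psycopg2 import sql

copy_sql = sql.SQL("COPY {} FROM stdin WITH CSV HEADER DELIMITER {}").format(
    sql.Identifier(table), sql.Literal(delimiter))

file_object = open(file_csv)
cur = connection.cursor()
cur.copy_expert(sql=copy_sql.as_string(connection), file=file_object)
connection.commit()
cur.close()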

This script works fine, but some CSV inputs have trailing columns without a header, and then the code above fails:

File "copy_to_psql.py", line 18, in load_csv_psql

cur.copy_expert(sql=copy_sql % table,file = file_object)

psycopg2.DataError: extra data after last expected column

Is there any way to select only the columns with headers from the CSV?

Is there any solution using only PostgreSQL?

Any other suggestions?

Thanks in advance

Cyberguille
  • If Java is installed on your system, try using DBIS. I don't think it can replace the table-creation step, but you can configure it to load data from a CSV into the database by matching header names to column names, and you can extend it with customized Java code. https://dbisweb.wordpress.com/ – compyutech Apr 26 '18 at 18:03
  • @compyutech thanks for your suggestion. I think it's possible to solve this issue with Java, but I would like to solve it with only Python and PostgreSQL. I could create a new CSV with only the columns that have headers, but I'm looking for a more straightforward solution. – Cyberguille Apr 26 '18 at 19:09
  • Cleaning your CSV is the best option. – Abhishake Gupta Apr 27 '18 at 02:41
  • @Abhishakegupta I have a solution that cleans the CSV, but I wanted to see whether there were other options; as you say, it looks like cleaning is the best one. – Cyberguille Apr 27 '18 at 14:06
  • @Cyberguille I suggested the Java solution because it only needs configuration, not coding, so it can replace the Python requirement. If you want to do it with PostgreSQL alone, how about importing the CSV from the command line (see the sketch after these comments)? You would still need the required column names; doing it dynamically won't be possible. – compyutech May 01 '18 at 08:35
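
(To illustrate the command-line import mentioned in the last comment: psql's \copy can name the target columns explicitly, though the file's fields must still line up with them. The table and column names below are placeholders:)

\copy my_table (col1, col2, col3) FROM 'in.csv' WITH (FORMAT csv, HEADER)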

1 Answer


As mentioned by @ABAbhi, my best option was to clean the CSV.

So I added a step to my workflow to remove the columns without a header:

  1. Receive a CSV with a header.
  2. Create a table in a PostgreSQL database whose fields are the headers of the CSV.
  3. Remove columns without a header.
  4. Copy the data from the CSV into the table created in step 2.

And here is the code for step 3:

import csv

def remove_empty_columns(input_csv="in.csv", output_csv="out.csv", delimiter=','):
    with open(input_csv, newline='') as infile, \
         open(output_csv, 'w', newline='') as outfile:
        reader = csv.DictReader(infile, delimiter=delimiter)
        # Keep only named columns: a trailing delimiter in the header row
        # produces an empty fieldname, and fully headerless extra fields
        # land under the None key, so neither is written out.
        headers = [h for h in reader.fieldnames if h]
        writer = csv.DictWriter(outfile, fieldnames=headers, delimiter=delimiter)
        writer.writeheader()
        for row in reader:
            writer.writerow({header: row[header] for header in headers})
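
With the cleaning step in place, the pieces can be wired together like this; load_csv_psql stands in for the COPY routine from the question, and its signature here is hypothetical:

remove_empty_columns(input_csv="in.csv", output_csv="out.csv", delimiter=',')
# Hypothetical call: load_csv_psql is the COPY step from the question.
load_csv_psql("out.csv", table, connection, delimiter=',')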
Cyberguille