0

PostGres SQL will not accept data which is in violation of primary key. To ignore the duplicate data, I have this code:

import pandas as pd
import psycopg2
import os
import matplotlib
from sqlalchemy import create_engine
from tqdm import tqdm_notebook
from pandas_datareader import data as web
import datetime
from dateutil.relativedelta import relativedelta


db_database =  os.environ.get('123')
engine = create_engine('postgresql://postgres:{}@localhost:5433/stockdata'.format(123))
def import_data(Symbol):

        df = web.DataReader(Symbol, 'yahoo',start=datetime.datetime.now()-relativedelta(days=3), end= datetime.datetime.now())
        insert_init = """INSERT INTO stockprices
                        (Symbol, Date, Volume, Open, Close, High, Low)
                        VALUES
                    """
        
        
        vals = ",".join(["""('{}','{}','{}','{}','{}','{}','{}')""".format(
            Symbol,
            Date,
            row.High,
            row.Low,
            row.Open,
            row.Close,
            row.Volume,
            ) for Date, row in df.iterrows()])
        
        
        insert_end ="""ON CONFLICT (Symbol, Date) DO UPDATE
                    SET 
                    Volume = EXCLUDED.Volume,
                    Open = EXCLUDED.Open,
                    Close = EXCLUDED.Close,
                    Low = EXCLUDED.Low,
                    High = EXCLUDED.High

                    """
        query = insert_init + vals + insert_end
        engine.execute(query)
                    
import_data('aapl')

I am getting this error:

ProgrammingError: (psycopg2.errors.UndefinedColumn) column "symbol" of relation "stockprices" does not exist
LINE 2:                         (Symbol,Date, Volume, Open, Close, H...
                                 ^

[SQL: INSERT INTO stockprices

Could you please advise as to what does this error mean? I got rid of all the double quotes as advised in the comment.

enter image description here


I had used this code to create the table:

def create_price_table(symbol):

    print(symbol)
    df = web.DataReader(symbol, 'yahoo', start=datetime.datetime.now()-relativedelta(days=7), end= datetime.datetime.now())
    df['Symbol'] = symbol
    df.to_sql(name = "stockprices", con = engine, if_exists='append', index = True)
    return 'daily prices table created'


create_price_table('amzn')

Also as was mentioned in the comment. I used this to check the table name:

SELECT table_name
  FROM information_schema.tables
 WHERE table_schema='public'
   AND table_type='BASE TABLE';

enter image description here


Edit 1:

I changed the code as suggested in the comment, now the column name is in small case. Below is the code: import pandas as pd import psycopg2 import os import matplotlib from sqlalchemy import create_engine from tqdm import tqdm_notebook from pandas_datareader import data as web import datetime from dateutil.relativedelta import relativedelta

db_database =  os.environ.get('123')
engine = create_engine('postgresql://postgres:{}@localhost:5433/stockdata'.format(123))
def create_price_table(symbol):

    print(symbol)
    df = web.DataReader(symbol, 'yahoo', start=datetime.datetime.now()-relativedelta(days=7), end= datetime.datetime.now())
    df['symbol'] = symbol
    df = df.rename(columns= {'Open':'open'})
    df = df.rename(columns= {'Close':'close'})
    df = df.rename(columns= {'High':'high'})
    df = df.rename(columns= {'Low':'low'})
    df = df.rename(columns= {'Volume':'volume'})
    df = df.rename(columns= {'Adj Close':'adj_close'})
    df.index.name ='date'
    df.to_sql(name = "stockprices", con = engine, if_exists='append', index = True)
    return 'daily prices table created'

# create_price_table('amzn')

def import_data(Symbol):
        df = web.DataReader(Symbol, 'yahoo', start=datetime.datetime.now()-relativedelta(days=3), end= datetime.datetime.now())
        insert_init = """INSERT INTO stockprices
                        (symbol, date, volume, open, close, high, low)
                        VALUES
                    """
        
        
        vals = ",".join(["""('{}','{}','{}','{}','{}','{}','{}')""".format(
            Symbol,
            Date,
            row.High,   
            row.Low,
            row.Open,
            row.Close,
            row.Volume,
            ) for Date, row in df.iterrows()])
        
        
        insert_end ="""ON CONFLICT (Symbol, Date) DO UPDATE
                    SET 
                    Volume = EXCLUDED.Volume,
                    Open = EXCLUDED.Open,
                    Close = EXCLUDED.Close,
                    Low = EXCLUDED.Low,
                    High = EXCLUDED.High

                    """
        query = insert_init + vals + insert_end
        engine.execute(query)
                    
import_data('aapl')

This code however is producing a new error:

DataError: (psycopg2.errors.InvalidTextRepresentation) invalid input syntax for type bigint: "166.14999389648438"
LINE 4:                     ('aapl','2022-02-23 00:00:00','166.14999...
                                                          ^
Slartibartfast
  • 1,058
  • 4
  • 26
  • 60
  • 2
    Because you created the table using the dreaded double quotes, you now have to use them **everywhere**, e.g. `EXCLUDED."Volume"` and `EXCUDED` should be `EXCLUDED` –  Feb 25 '22 at 11:39
  • I thought you have to use double quotes in Postgres! There was some copypaste error on `EXCLUDED` Still not working though – Slartibartfast Feb 25 '22 at 11:40
  • 1
    Well, _if_ you start using them, you have to use them **everywhere**. But there is absolutely no need to use them. More details are [in the manual](https://www.postgresql.org/docs/current/sql-syntax-lexical.html#SQL-SYNTAX-IDENTIFIERS) –  Feb 25 '22 at 11:41
  • I got rid of all the double quotes but still getting error. I have updated the question – Slartibartfast Feb 25 '22 at 11:46
  • 1
    Well, you need to re-create the table without double quotes. As long as the definition was created with double quotes the names are still case-sensitive. More details [in the manual](https://www.postgresql.org/docs/current/sql-syntax-lexical.html#SQL-SYNTAX-IDENTIFIERS) –  Feb 25 '22 at 11:50
  • I have deleted/ recreated the table but still getting the same error. – Slartibartfast Feb 25 '22 at 11:56
  • Check the actual table names and column names (using `\d` in psql) – wildplasser Feb 25 '22 at 12:27
  • 1) The error is with the column name `Symbol` not the table name. 2) pgAdmin will automatically double quote all the column names when you create the table. So those mixed case names `Symbol`, `High`, `Volume`, etc are now `"Symbol"`, `"High"`, `"Volume"`. 3) The easiest thing to do is recreate the table with all lower case column names. – Adrian Klaver Feb 25 '22 at 15:03
  • @AdrianKlaver I have changed the code so now all the columns in the postgre database are in small case. I am getting a new error: `invalid input syntax for type bigint:` Please advise – Slartibartfast Feb 26 '22 at 04:55
  • `'166.14999389648438'` is not an integer. `select 166.14999389648438::bigint; 166` will work, but `select '166.14999389648438'::bigint; ERROR: invalid input syntax for type bigint: "166.14999389648438"` does not. This is what happens when you do not use proper parameter passing and instead use `.format()` to do string interpolation. You don't get correct type adaptation as well as open yourself to SQL injection. See [Parameters](https://www.psycopg.org/docs/usage.html#passing-parameters-to-sql-queries) for what you should be doing. – Adrian Klaver Feb 26 '22 at 05:15

1 Answers1

2

Per my comment you have two issues:

  1. You are trying to INSERT a float value(166.14999389648438) into an integer field. First thing to figure out is why the mismatch? Do really want the database field to be an integer? Second thing is that trying to force a float into an integer will work if the value is being entered as a float/numeric:

select 166.14999389648438::bigint; 166

Though as you see it gets truncated.

It will not work if entered as a string:

ERROR:  invalid input syntax for type bigint: "166.14999389648438"

Which is what you are doing. This leads to the second issue below.

  1. You are not using proper Parameter passing as shown in the link. Where among other things is the warning:

Warning

Never, never, NEVER use Python string concatenation (+) or string parameters interpolation (%) to pass variables to a SQL query string. Not even at gunpoint.

For the purposes of this question the important part is that using parameter passing will result in proper type adaptation.

Adrian Klaver
  • 15,886
  • 2
  • 17
  • 28