1

I have a SQL Server on which I have databases that I want to use pandas to alter that data. I know how to get the data using pyodbc into a DataFrame, but then I have no clue how to get that DataFrame back into my SQL Server.

I have tried to create an engine with sqlalchemy and use the to_sql command, but I can not get that to work because my engine is never able to connect correctly to my database.

import pyodbc
import pandas
server = "server"
db = "db"
conn = pyodbc.connect('DRIVER={SQL Server};SERVER='+server+';DATABASE='+db+';Trusted_Connection=yes')
cursor = conn.cursor()
df = cursor.fetchall()
data = pandas.DataFrame(df)
conn.commit()
marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
akerr
  • 11
  • 1
  • 5

1 Answers1

2

You can use pandas.DataFrame.to_sql to insert your dataframe into SQL server. Databases supported by SQLAlchemy are supported by this method.

Here is a example how you can achieve this:

from sqlalchemy import create_engine, event
from urllib.parse import quote_plus
import logging
import sys
import numpy as np
from datetime import datetime, timedelta

# setup logging
logging.basicConfig(stream=sys.stdout, 
                filemode='a', 
                format='%(asctime)s.%(msecs)3d %(levelname)s:%(name)s: %(message)s', 
                datefmt='%m-%d-%Y %H:%M:%S', 
                level=logging.DEBUG)
logger = logging.getLogger(__name__)    # get the name of the module

def write_to_db(df, database_name, table_name):
    """
    Creates a sqlalchemy engine and write the dataframe to database
    """
    # replacing infinity by nan
    df = df.replace([np.inf, -np.inf], np.nan)

    user_name = 'USERNAME'
    pwd = 'PASSWORD' 
    db_addr = '10.00.000.10'
    chunk_size = 40 

    conn =  "DRIVER={SQL     Server};SERVER="+db_addr+";DATABASE="+database_name+";UID="+user_name+";PWD="+pwd+""
    quoted = quote_plus(conn)
    new_con = 'mssql+pyodbc:///?odbc_connect={}'.format(quoted)

    # create sqlalchemy engine
    engine = create_engine(new_con)

    # Write to DB
    logger.info("Writing to database ...")
    st = datetime.now() # start time
    # WARNING!! -- overwrites the table using if_exists='replace'
    df.to_sql(table_name, engine, if_exists='replace', index=False, chunksize=chunk_size)
    logger.info("Database updated...")
    logger.info("Data written to '{}' databsae into '{}' table ...".format(database_name, table_name))
    logger.info("Time taken to write to DB: {}".format((datetime.now()-st).total_seconds()))

Calling this method should write your dataframe to the database, note that it will replace the table if there is already a table in the database with the same name.

Community
  • 1
  • 1
rzskhr
  • 931
  • 11
  • 10
  • Should I just be able to copy paste this def and use it just entering the three variables? – akerr Apr 18 '19 at 17:04
  • This is the error I get: (pyodbc.InterfaceError) ('IM002', '[IM002] [Microsoft][ODBC Driver Manager] Data source name not found and no default driver specified (0) (SQLDriverConnect); [IM002] [Microsoft][ODBC Driver Manager] Invalid connection string attribute (0)') (Background on this error at: http://sqlalche.me/e/rvf5) – akerr Apr 18 '19 at 17:04
  • I am not sure how your database is setup. You can use the exact same function and this should work. You would need to change the username, password and ip address of the database in the function, and then pass the three variables. – rzskhr Apr 18 '19 at 17:07
  • I know what my Username and password are but how do I know the IP address of the database? Also with the logging part of your code I tried to use that and I got an error saying that "sys" is not defined – akerr Apr 18 '19 at 17:10
  • You can get the ip address of the database by pinging using the command line. I forgot to import sys in the code, I changed that now. – rzskhr Apr 18 '19 at 17:12
  • I found it, but i got an error: name 'np' is not defined – akerr Apr 18 '19 at 17:40
  • ahhhh ..... import issue, forgot to import numpy. just add `import numpy as np`. That line is replacing the infinity values by nan, you can comment that line if your dataframe doesn't have infinity values. – rzskhr Apr 18 '19 at 17:43
  • We will get this one of these times, but I also had to import datetime, but I get an error: module 'datetime' has no attribute 'now' – akerr Apr 18 '19 at 18:01
  • `from datetime import datetime, timedelta` – rzskhr Apr 18 '19 at 18:04
  • I got a much bigger error this time: InterfaceError: (pyodbc.InterfaceError) ('IM002', '[IM002] [Microsoft][ODBC Driver Manager] Data source name not found and no default driver specified (0) (SQLDriverConnect)') (Background on this error at: http://sqlalche.me/e/rvf5) – akerr Apr 18 '19 at 18:08