
I am attempting to generate an INSERT statement into an SQL Server database from a Data Frame:

data_frame.to_sql(table, engine, if_exists='append', index=False)

The Data Frame is created from a JSON Response.

I am having issues inserting string values that contain single quotes into the database. The cause appears to be that a field is double quoted whenever it contains a single quote anywhere. Example of the behavior:

INSERT INTO my_table(column1, column2, column3) VALUES ('value1', "value2's", 'value3');

Because value2's contains a single quote, .to_sql() appears to surround the field with double quotes instead. SQL Server, however, treats a double-quoted token as an identifier (a column name), so the insert fails.

How can I get .to_sql() to instead format the INSERT statement like below:

INSERT INTO my_table(column1, column2, column3) VALUES ('value1', 'value2''s', 'value3');
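For reference, if you ever do build a literal by hand rather than relying on parameterization, the standard T-SQL escape is to double each embedded single quote. A minimal sketch of that escaping in Python (the `quote_literal` helper is hypothetical, not part of pandas or SQLAlchemy):

```python
def quote_literal(value: str) -> str:
    # T-SQL escapes an embedded single quote by doubling it,
    # then the whole value is wrapped in single quotes
    return "'" + value.replace("'", "''") + "'"

quote_literal("value2's")  # 'value2''s'
quote_literal("value3")    # 'value3'
```

That said, hand-built literals are fragile; parameterized inserts (which `.to_sql()` uses under the hood) avoid the problem entirely.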

The Error message .to_sql() generates:

The identifier that starts with 'value2's' is too long. Maximum length is 128.

The error message above suggests that the database is treating the Data Frame value as a column name; SQL Server typically raises this error when a double-quoted token is interpreted as an identifier rather than a string literal.

Libraries (some may not be relevant to this issue):

import pyodbc
import sqlalchemy
from sqlalchemy import MetaData
import pandas as pd
from sqlalchemy.engine import URL

Any help is appreciated!

cpa390
  • Please post the exact error message and show your libraries and definitions of objects for a [mcve]. Pandas' `to_sql` parameterizes values so quotes cannot raise issues. – Parfait Aug 27 '22 at 00:09
  • @Parfait I've edited my question with some clarifications. Thanks! – cpa390 Aug 27 '22 at 00:58
  • Please post actual code block and sample data for reproducibility. We appreciate your interpretation of issue but need to see your actual implementation. Single or special characters within columns should not affect insertion with pyodbc and sqlalchemy. – Parfait Aug 27 '22 at 12:54

2 Answers


How have you determined that the error message applies to the df.to_sql() statement?

Consider the following example:

import os
import pandas as pd
from sqlalchemy import Column, MetaData, Table, VARCHAR, create_engine
import urllib

# Connection setup...
conn_params = urllib.parse.quote_plus(
    "Driver=ODBC Driver 18 for SQL Server;" +
    "Server=tcp:%s,%s;" % (os.getenv("DB_SERVER"), os.getenv("DB_PORT")) +
    "Database=%s;" % os.getenv("DB_NAME") +
    "Uid=%s;" % os.getenv("DB_USER") +
    "Pwd={%s};" % os.getenv("DB_PASSWORD") +
    "Encrypt=yes;" +
    "TrustServerCertificate=yes;"
)
url = f"mssql+pyodbc:///?odbc_connect={conn_params}"
engine = create_engine(
    url,
    isolation_level="REPEATABLE READ"
)

# Database setup...
m = MetaData()
t = Table(
    "my_table",
    m,
    Column("column1", VARCHAR(20)),
    Column("column2", VARCHAR(20)),
    Column("column3", VARCHAR(20)),
    )
m.create_all(engine)

# Dataframe setup...
df = pd.DataFrame(data={
    "column1": [1, 2, 3],
    "column2": ["Sam", "Daniel", "Jack"],
    "column3": ["Carter", "Jackson", "O'Neill"]
})

# Data insertion...
with engine.begin() as conn:
    conn.execute(t.delete())
    engine.echo = True
    df.to_sql(t.name, engine, if_exists="append", index=False)
    for row in conn.execute(t.select()).fetchall():
        print(row)

Executing this on the console gives us the following output:

# python3 demo.py
2022-08-27 05:43:11,069 INFO sqlalchemy.engine.Engine SELECT [INFORMATION_SCHEMA].[TABLES].[TABLE_NAME] 
FROM [INFORMATION_SCHEMA].[TABLES] 
WHERE [INFORMATION_SCHEMA].[TABLES].[TABLE_TYPE] = CAST(? AS NVARCHAR(max)) AND [INFORMATION_SCHEMA].[TABLES].[TABLE_NAME] = CAST(? AS NVARCHAR(max)) AND [INFORMATION_SCHEMA].[TABLES].[TABLE_SCHEMA] = CAST(? AS NVARCHAR(max))
2022-08-27 05:43:11,069 INFO sqlalchemy.engine.Engine [cached since 0.01563s ago] ('BASE TABLE', 'my_table', 'dbo')
2022-08-27 05:43:11,074 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2022-08-27 05:43:11,076 INFO sqlalchemy.engine.Engine INSERT INTO my_table (column1, column2, column3) VALUES (?, ?, ?)
2022-08-27 05:43:11,076 INFO sqlalchemy.engine.Engine [generated in 0.00060s] ((1, 'Sam', 'Carter'), (2, 'Daniel', 'Jackson'), (3, 'Jack', "O'Neill"))
2022-08-27 05:43:11,079 INFO sqlalchemy.engine.Engine COMMIT
('1', 'Sam', 'Carter')
('2', 'Daniel', 'Jackson')
('3', 'Jack', "O'Neill")

Notice that the O'Neill string appears wrapped in double quotes, i.e. "O'Neill", but this is purely how SQLAlchemy's echo renders the parameters (it is Python's repr of the string). The actual data has been inserted into the table correctly, as shown with sqlcmd:

# /opt/mssql-tools18/bin/sqlcmd \
        -S $DB_SERVER \
        -U $DB_USER \
        -P $DB_PASSWORD \
        -C \
        -d $DB_NAME \
        -Q "select * from dbo.my_table"
column1              column2              column3             
-------------------- -------------------- --------------------
1                    Sam                  Carter              
2                    Daniel               Jackson             
3                    Jack                 O'Neill             

(3 rows affected)
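The double quotes in the echo line are just Python's repr at work: for a string that contains a single quote (and no double quote), repr renders it with double quotes. A quick check:

```python
# Python's repr picks double quotes for strings containing a single quote
print(repr("O'Neill"))  # "O'Neill"
print(repr("Carter"))   # 'Carter'
```

So the quoting you see in the log is display-only; the driver sends the value as a bound parameter, not as quoted SQL text.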
AlwaysLearning
  • Additionally, you show insertion runs parameterization: `INSERT INTO my_table (column1, column2, column3) VALUES (?, ?, ?)`. So quotes in variables should not break SQL syntax. – Parfait Aug 27 '22 at 12:57

I think the answer provided by AlwaysLearning is correct. This is not strictly an answer, but an alternative approach to the same problem, i.e. getting a DataFrame into an ODBC database.

from arrow_odbc import insert_into_table
import pyarrow as pa
import pandas as pd


def dataframe_to_table(df: pd.DataFrame, connection_string: str):
    # Convert the DataFrame into an Arrow RecordBatchReader
    table = pa.Table.from_pandas(df)
    reader = pa.RecordBatchReader.from_batches(table.schema, table.to_batches())
    # Stream the record batches into the target table over ODBC
    insert_into_table(
        connection_string=connection_string,
        user="user",
        password="password",
        chunk_size=100,
        table="my_table",
        reader=reader,
    )

Full disclosure: I am the author of arrow-odbc so my views may be biased.

Markus Klein