How to connect polars write_database with mssql+pyodbc?

Question

Before polars>=0.16.10 I was using .to_pandas().to_sql() to send a polars dataframe to the database. Now it should be possible to use the wrapper pl.Dataframe.write_database(), however, altough my connection_uri is working for pl.read_database() and pl.read_sql(). It doesn't work for pl.Dataframe.write_database().

This works:

connection_uri = "mssql+pyodbc://username:password@server:port/database"
pl.read_database(query, connection_uri)

# or

pl.read_sql(query, connection_uri)

This doesnt:

df = pl.Dataframe({...})

df.write_database( 
    table_name='dbo.Mytable',
    connection_uri="mssql+pyodbc://username:password@server:port/database",
    if_exists="append",
    engine="sqlalchemy"
    )

»»» sqlalchemy.exc.InterfaceError: (pyodbc.InterfaceError) ('IM002', '[IM002] [Microsoft][ODBC Driver Manager] Data source name not found and no default driver specified (0) (SQLDriverConnect)')

So far I kept using the old method .to_pandas().to_sql(), now with the added .to_pandas(use_pyarrow_extension_array=True).to_sql() after the latest pandas update.

But I find it quite strange that the same connection_uri string works for read_database but not for write_database.

score 1 · Accepted Answer · answered Mar 29 '23 at 08:54

When you use to_pandas().to_sql() what parameters are you putting into the to_sql()?

From the docs, it would seem you must be feeding it a sqlalchemy engine. Next question, how are you initiating the engine? Is it with the same connection string or is it a different syntax?

The source code of the pl.DataFrame.write_database method (the relevant part anyway) is this:

try:
    from sqlalchemy import create_engine
except ImportError as exc:
    raise ImportError(
        "'sqlalchemy' not found. Install polars with 'pip install polars[sqlalchemy]'."
    ) from exc

engine = create_engine(connection_uri)

# this conversion to pandas as zero-copy
# so we can utilize their sql utils for free
self.to_pandas(use_pyarrow_extension_array=True).to_sql(
    name=table_name, con=engine, if_exists=if_exists, index=False
)

So whatever connection string you're using to create the sqlalchemy engine you're using for the pandas to_sql must work for the polars write_database because all it does is use the pandas to_sql.

So, the connection I use for `pandas.to_sql()` and `pandas.read_sql()` is different from the one I use when using polars.read_database() which is based on polars Docs. After trying, like you suggest, to use the same connection str as I use for pandas, it worked! Albeit I have noticed this to be quite significantly slower, in my cases, than the previous .to_pandas().to_sql() without the `use_pyarrow_extension_array=True` — gilnribeiro, Mar 29 '23 at 11:12

How to connect polars write_database with mssql+pyodbc?

1 Answers1

Linked