2

I am trying to read data from a SQL Server database into a Polars DataFrame using Python. I have successfully used the pandas read_sql() method with a connection string in the past, but I am having trouble finding documentation on how to do this with Polars.

I am using Python and have installed the latest versions of Polars, Pandas, Connectorx, and PyArrow. I have created a connection string to my SQL Server database and successfully executed a SQL query using pandas read_sql() method to get a pandas DataFrame. I then attempted to convert this pandas DataFrame to a Polars DataFrame using the from_pandas() method, but I am getting the following error message:

"ModuleNotFoundError: pd.Series requires 'pandas' module to be installed"

I am confused because I have confirmed that all the necessary dependencies are installed, including Pandas. I am not sure what is causing this error or how to fix it.

import pyodbc
import polars as pl
from sqlalchemy import create_engine
from sqlalchemy.engine.url import URL
import pandas as pd

# define connection string
conn_str = (
    r"DRIVER={SQL Server};"
    r"SERVER=PLA1SQL01\AAMGRID1PRD;"
    r"DATABASE=NIER;"
    r"Trusted_Connection=yes;"
)

# create pyodbc connection
conn = pyodbc.connect(conn_str)

query = '''
SELECT DISTINCT TOP 4
    rrs.Name as 'Risk Run Name',
    rds.id as RiskDataSetID
FROM nier..RiskDataSet rds
LEFT JOIN nier..RiskResultSet rrs ON rds.id = rrs.RiskDataSetID
WHERE ismonthly = 0
AND ParentID IS NULL
AND NoLossForCMLCorp = 1
ORDER BY 2 DESC
'''

df = pd.read_sql(query, conn)
pl_df = pl.from_pandas(df)

I expected the code to successfully convert the pandas DataFrame into a Polars DataFrame, but I received the "ModuleNotFoundError" instead. Any clue on how to read sql query from ms sql server using polars?

import polars
import pyarrow
import pandas
import connectorx
import pyodbc

pl.show_versions()

---Version info---
Polars: 0.16.16
Index type: UInt32
Platform: Windows-10-10.0.19044-SP0
Python: 3.9.9 (tags/v3.9.9:ccb0e6a, Nov 15 2021, 18:08:50) [MSC v.1929 64 bit (AMD64)]
---Optional dependencies---
numpy: 1.24.2
pandas: 1.5.3
pyarrow: 11.0.0
connectorx: 0.3.1
deltalake: <not installed>
fsspec: <not installed>
matplotlib: <not installed>
xlsx2csv: <not installed>
xlsxwriter: <not installed>
db0
  • 21
  • 2
  • The error message has nothing to do with database, that's an internal polars / pandas thing. Btw, you are using ms sql server, not mysql. – Shadow Mar 29 '23 at 23:30
  • @Shadow Apologies, i updated for clarification and added package dependencies. Let me know if theres anything else i can help provide. Thanks!! – db0 Mar 29 '23 at 23:44
  • did you try installing the module that wasn't found? – Paul H Mar 29 '23 at 23:58
  • Does `pandas` appear in the output of `pl.show_versions()`? Have you tried `pl.read_sql()` directly? https://stackoverflow.com/a/75096455 – jqurious Mar 30 '23 at 00:14
  • Okay, so pandas is there and polars detects it, which suggests this is a "bug" of some sort. [`pl.read_database`](https://pola-rs.github.io/polars/py-polars/html/reference/api/polars.read_database.html#polars-read-database) should be able to connect to `mssql` if you have [connectorx installed](https://sfu-db.github.io/connector-x/databases/mssql.html?highlight=mssql). (`.read_sql` has been renamed apparently) – jqurious Mar 30 '23 at 00:40
  • @jqurious I do have connectorx==0.3.1. I tried the pl.read_database method but ran into an AttributeError: 'pyodbc.Connection' object has no attribute 'split'. Afterwards, I tried to force my connection object to a string and i ran into a ValueError: not enough values to unpack (expected 2, got 1). This approach apparently worked for this guy: https://stackoverflow.com/questions/74776382/can-you-use-connectorx-with-sql-server – db0 Mar 30 '23 at 01:11
  • Yes, https://stackoverflow.com/questions/75869526/how-to-connect-polars-write-database-with-mssqlpyodbc is another person with it working (but asking for help with writing). It's very strange you're getting all of these weird errors. All I can suggest is perhaps [filing a bug](https://github.com/pola-rs/polars/issues/) if you get no further responses here. – jqurious Mar 30 '23 at 16:09
  • 1
    Note: to use `connectorx` you'd need to pass a string URI, not a python/pyodbc connection (as it will use Rust drivers to establish the connection itself), eg: something like `mssql://user:password@myserver:port/mydatabase?trusted_connection=true&encrypt=true`. – alexander-beedie Mar 30 '23 at 20:43
  • "but I received the "ModuleNotFoundError" instead."---- you need to be more specific. What line triggered that? what was the full error message? – Dean MacGregor Mar 31 '23 at 11:23

0 Answers0