
I am trying to run a fairly complex SQL query from Python (pandas) and am running into a question I see asked regularly. My SQL script is about 150 lines, and it performs multiple tasks: it accesses two different databases, creates six temp tables, runs updates against those tables, and so on.

I don't see a reasonable way to alter my code to make this SQL script 'streamlined', and I have tried adding `SET NOCOUNT ON`, to no avail. I still get the NoneType error.
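For context, `read_sql_query` expects the batch to hand back a single result set; a multi-statement script that emits row-count messages or intermediate results first is a common cause of the NoneType error described. One workaround is to run the setup statements separately and pass only the final `SELECT` to pandas. A minimal self-contained sketch, using an in-memory sqlite3 database as a stand-in (with SQL Server you would use a pyodbc connection and also prefix the batch with `SET NOCOUNT ON;`):

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")

# Run the setup portion of the script (temp tables, updates, etc.)
# on its own, so pandas never has to parse a multi-statement batch.
setup_sql = """
CREATE TABLE staging (id INTEGER, amount REAL);
INSERT INTO staging VALUES (1, 10.0), (2, 25.5);
UPDATE staging SET amount = amount * 2 WHERE id = 2;
"""
conn.executescript(setup_sql)

# Only the final SELECT goes through read_sql_query,
# so exactly one result set comes back.
df = pd.read_sql_query("SELECT id, amount FROM staging ORDER BY id", conn)
print(df)
```

The table name and values here are made up for the demo; the point is the split between the setup batch and the single result-producing query.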

All the examples I see use very small bits of code. Has anyone solved this for more complex queries? Aside from the length of the query, I only need to update three components each month, then run it and extract the results.

I am just really trying to make it a hands-off process at this point.

I've been attempting to run the script from a static file location, applying a few quick replace calls for the monthly values, and then calling `read_sql_query`.
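That replace-then-run step can be sketched as below; the template text and the placeholder tokens are hypothetical stand-ins for the three monthly components (in practice the template would come from something like `Path("report.sql").read_text()`):

```python
def build_monthly_sql(template: str, replacements: dict[str, str]) -> str:
    """Swap each placeholder token in the SQL template for this month's value."""
    for token, value in replacements.items():
        template = template.replace(token, value)
    return template


# Hypothetical template standing in for the 150-line script on disk.
template = "SELECT * FROM sales WHERE period = '{{PERIOD}}' AND region = '{{REGION}}'"
sql = build_monthly_sql(template, {"{{PERIOD}}": "2023-01", "{{REGION}}": "EMEA"})
print(sql)
```

Keeping the replacements in one dict means the monthly update touches a single place in the Python code.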

Instead I get the error, when I was expecting to be able to place the results into a pandas DataFrame and essentially copy-paste them to an external file.
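Once the query does return a single result set, landing it in a DataFrame and writing it out is one line each. A sketch with a stand-in DataFrame in place of the real query results (file name is hypothetical):

```python
import pandas as pd

# Stand-in for the DataFrame that read_sql_query would return.
df = pd.DataFrame({"id": [1, 2], "amount": [10.0, 51.0]})

# Write the extract to an external file; index=False keeps the
# pandas row index out of the output.
df.to_csv("monthly_extract.csv", index=False)
```

`to_excel` works the same way if the downstream consumer needs a spreadsheet rather than CSV.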

DeBur
    sounds like you need to de-couple your "scripts" into repeatable processes and call them via python or an ETL tool. If you're using SQL Server, why not create a stored proc and call it using `pyodbc`? – Umar.H Jan 27 '23 at 18:50
  • SQL Server was built to consume, manipulate and aggregate data. I can't imagine any gains moving to Python. Without seeing your script, I would suggest running each segment to determine the poor performing portion(s). Post those, and perhaps we can help. – John Cappelletti Jan 27 '23 at 18:58
  • Agree with John; I'm pretty sure your query CAN be optimised, but you lack the knowledge of how to do so. I'm voting to close this question as basically describing the [x,y problem](https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem); without the `SQL` code no one can really help you. – Umar.H Jan 27 '23 at 19:04
  • Why not create a procedure and move the code there and just call it. – siggemannen Jan 27 '23 at 19:52
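The stored-procedure route suggested in the comments would mean moving the 150-line script into a procedure and calling it through pyodbc's ODBC call syntax. A sketch of building that call; the procedure name and parameter count are hypothetical:

```python
def proc_call_sql(proc_name: str, param_count: int) -> str:
    """Build the ODBC {CALL ...} escape syntax pyodbc expects for a stored procedure."""
    if param_count == 0:
        return f"{{CALL {proc_name}}}"
    placeholders = ", ".join("?" * param_count)  # "?" * 3 -> "???", joined -> "?, ?, ?"
    return f"{{CALL {proc_name} ({placeholders})}}"


sql = proc_call_sql("dbo.RunMonthlyExtract", 2)
print(sql)

# With a live SQL Server connection it would then be roughly:
#   cursor = pyodbc.connect(conn_str).cursor()
#   cursor.execute(sql, period, region)
```

This keeps the heavy SQL inside the database engine, with Python reduced to one call and one extract.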

1 Answer


The key to optimizing anything is knowing what the parts cost. Look in SQL Profiler to understand how much each query costs. Moving data between servers can be quite expensive, and if you're bringing data from one database back into the pandas code and then pushing it to another database, that round trip can also be very expensive.

The root of many database performance problems is moving data from one location or arrangement to another. Recognizing when you're doing this is typically the key to speeding things up.

user3112728