0

I'm trying to pull a single value back from a MSSQL DB, the code looks like:

import pandas as pd
import pypyodbc
query = 'SELECT TOP(1) value FROM [dbo].[table]'
connection = pypyodbc.connect(...)
df = pd.read_sql_query(query, connection)

But this returns a dataframe object, is there a method to instead just return a single string value, possibly without Pandas. Key priority being speed. Is there a quicker way generally of connecting/querying?

Bonifacio2
  • 3,405
  • 6
  • 34
  • 54
Zach Cleary
  • 458
  • 1
  • 4
  • 17
  • 1
    A few things in your question are not clear to me. 1) There is no "WHERE" clause - don't you care about what single row to select? 2) Why speed is important? It looks like you want to use this approach in a loop - am i right? Can you explain bit more about what you want to achieve? – MaxU - stand with Ukraine Jul 26 '17 at 10:49
  • Sorry, no WHERE for brevity, I'm not concerned by the SQL syntax just connection. Yes, it will be used in a while loop to constantly acquire new values according to an algorithm. – Zach Cleary Jul 26 '17 at 10:59
  • In this case I would suggest you to put more details about your algorithm and post a small reproducible data set (in CSV/TEXT form) and your desired data set in the question. Most probably there is a way to implement your algorithm in __vectorized__ Pandas/Numpy/SciPy/SKLearn way. In this case you will not need to loop through your data row by row - you will read all data (or a chunk that fits in memory) and process the whole data set. – MaxU - stand with Ukraine Jul 26 '17 at 11:06

1 Answers1

1

Consider implementing your algorithm in a vectorized Pandas/NumPy/SciPy/SKLearn way if processing speed is important for you.

Using vectorized approach usually means using internal functions that are working with vectors and matrices instead of scalars and that are implemented with C, C-Python, etc. (optimized) instead of writing loops.

If your algorithm can't be vectorized you still can speed up your algorithm - read all the data you want to process at once instead of doing it in a loop:

query = 'SELECT value FROM [dbo].[table]'  # <-- NOTE: i have intentionally removed `TOP(1)`
connection = pypyodbc.connect(...)
df = pd.read_sql_query(query, connection)

# process your data (from the DataFrame) here...
MaxU - stand with Ukraine
  • 205,989
  • 36
  • 386
  • 419