0

I am trying to build a SQL that needs to be filtered by two parameters (2 columns), and the second column needs to match multiple values.

Given below is the SQL I have build thus far (thanks for the help of Martijn Pieters)

import psycopg2
import pandas as pd
import datetime

# Connecting to db

con = psycopg2.connect(db_details)
cur = con.cursor()
cur.execute("select * from sales limit 10")
rows = cur.fetchall()

params = {'earliest': datetime.datetime.today() - datetime.timedelta(days=7),
      'store_name': 'store_1', 'store_2'}

df = pd.read_sql("""
     select store_name,count(*) from sales 
     where created_at >= %(earliest)s
     and store_name = %(store_name)s""",
 params=params, con=con)

The above SQL has one date parameter which is used in the where clause and I added one more parameter namely the store_name, where rows match either one of two values.

Would like to know how could I add in this additional parameter to the existing query.

I tried to create a the parameter (similar to the date filter) and pass that to the existing query but get a syntax error when I give it two values:

    'store_name': 'store_1', 'store_2'}
                                      ^
SyntaxError: invalid syntax

pointing to the params field.

Martijn Pieters
  • 1,048,767
  • 296
  • 4,058
  • 3,343
Kevin Nash
  • 1,511
  • 3
  • 18
  • 37
  • You have a simple syntax error because you do need to *at least* get the basics of the language learned a bit better. You are trying to run when you first need to properly walk. It would also be helpful if you picked up more fundamentals about SQL too; you can't just pass in two values for a single parameter because SQL would use a different kind of operator to test for those values, instead of `store_name = ...` for a single value, you'd have to use `store_name in (, , ..., )`. I was hoping your question would really just focus on that latter problem. – Martijn Pieters Sep 13 '18 at 11:15
  • In future, you do need to **start** your question with the error output included, because with that attempt to define `store_name` at least people trying to answer your question will see this is a `columnname IN (value, value, value)` test, not a straight single-value `columnname = value` test. A SQL `IN` test with parameters requires extra work from the Python programmer because now you have to generate SQL text for those parameters. – Martijn Pieters Sep 13 '18 at 11:17
  • Had you just set `'store_name': 'store_1'` (so a single string value), the code would have worked as is. So the real problem here is not defining parameters, it is defining parameters with more than one value. – Martijn Pieters Sep 13 '18 at 11:22
  • Pretty close to being a dupe of [Python/psycopg2 WHERE IN statement](https://stackoverflow.com/questions/28117576/python-psycopg2-where-in-statement), at least as far as the goal of the whole excercise is concerned. – Ilja Everilä Sep 13 '18 at 11:33

2 Answers2

2

You have two problems:

  • You used invalid Python syntax; the comma in a dictionary separates key-value pairs, so the 'store_2' string would be another key-value pair but is missing the : value parts. If you want to define a value with more than one string, you'd have to use a tuple or a list there, were you explicitly use either (...) or [...] to separate that syntax from the key: value, key: value notation:

    params = {
        'earliest': datetime.datetime.today() - datetime.timedelta(days=7),
        'store_name': ('store_1', 'store_2'),  # tuple with two values
    }
    
  • Generally speaking, SQL parameters can only work with single values. The store_name parameter can only be given a single value, not a sequence of values. That's because SQL parameters are a bridge between the SQL query and the dynamic values to be used in that query, with parameters designed to act as placeholder for each individual dynamic value.

    That said, the psycopg2 library specifically supports tuples, this is an exception to most Python database libraries however.

    Next, if you want to filter rows on matching either 'store_1' or 'store_2', the correct SQL syntax would be to use two store_name = ... tests with OR between them and parentheses around (to keep that part separate from the date test connected with AND to the store name test), or by using store_name IN ('store_1', 'store_2'). An IN test compares a column name against multiple values listed in the (...) parentheses.

Given that you are using psycopg2 here, you can get away with the store_name key referencing a tuple value, but you do need to use IN for your query:

params = {
    'earliest': datetime.datetime.today() - datetime.timedelta(days=7),
    'store_name': ('store_1', 'store_2')
}

df = pd.read_sql("""
     SELECT store_name, count(*) FROM sales 
     WHERE created_at >= %(earliest)s
     AND store_name IN %(store_name)s""",
     params=params, con=con)

On a separate note: the pd.read_sql() function [explicitly states that only sqlite is supported when using a DBAPI connection](If a DBAPI2 object, only sqlite3 is supported):

If a DBAPI2 object, only sqlite3 is supported.

You are using such an object; most Python database adapters are DBAPI2 libraries; DBAPI2 is a Python standard for such libraries.

You should really use a SQLAlchemy connection string instead. Your code happens to work because you never attempt to write any data back to the database and the psycopg connection and cursor objects are largely compatible with the sqlite3 library versions, but you could run into problems down the road.

Martijn Pieters
  • 1,048,767
  • 296
  • 4,058
  • 3,343
  • In case of psycopg it is possible to pass multiple values with a single placeholder. Tuples and lists have special adaptations, of which tuples adaptation is useful with IN predicates and ROW constructors. – Ilja Everilä Sep 13 '18 at 13:50
  • @IljaEverilä: That's actually news to me! And [indeed, this is documented](http://initd.org/psycopg/docs/usage.html#tuples-adaptation). That does help here. – Martijn Pieters Sep 13 '18 at 14:10
  • Some of the MySQL drivers have similar features as well, though I think they were either vaguely documented or not at all. – Ilja Everilä Sep 13 '18 at 14:13
  • 1
    @IljaEverilä: this exposes a different issue: passing in a DBAPI2 connection *may* work, but only SQLite3 is actually supported, and there may well be issues with type conversions. – Martijn Pieters Sep 13 '18 at 14:36
-1

I don't see why this would not work :

params = {'earliest': datetime.datetime.today() - datetime.timedelta(days=7),
          'store_name': '<put what you want here>'}

df = pd.read_sql("""
         select store_name,count(*) from sales 
         where created_at >= %(earliest)s
         and store_name = %(store_name)s""",
     params=params, con=con)

Because you want two stores this is a bit more complex.

This should work :

params = {'earliest': datetime.datetime.today() - datetime.timedelta(days=7),
          'store_names': ','.join(('store_1', 'store_2'))}

df = pd.read_sql("""
         select store_name,count(*) from sales 
         where created_at >= %(earliest)s
         and store_name in (%(store_names)s)""",
     params=params, con=con)
Martijn Pieters
  • 1,048,767
  • 296
  • 4,058
  • 3,343
Corentin Limier
  • 4,946
  • 1
  • 13
  • 24
  • thanks I am having issue passing in more than store_name in the params. Could you please assist. – Kevin Nash Sep 13 '18 at 11:02
  • @KevinNash Could you please edit your question and show the code that you've tried and the error message ? – Corentin Limier Sep 13 '18 at 11:04
  • I have updated my initial message with the change that I have done along with the syntax error that it throws. It works fine if I pass just one store name – Kevin Nash Sep 13 '18 at 11:07
  • You can't use a single parameter for `IN`. You need to generate **separate parameters** for each of the values. That's because SQL parameters can only handle **one** value, and you don't want that value to include the commas. – Martijn Pieters Sep 13 '18 at 11:17
  • 1
    You are asking for rows where the `store_name` column has a value of `'store_1,store_2'`, not for rows where the `store_name` value is either `'store_1'` *or* `'store_2'`. – Martijn Pieters Sep 13 '18 at 11:18
  • As it turns out, this *does* work, because [psycopg2 explicitly supports this](http://initd.org/psycopg/docs/usage.html#tuples-adaptation). That is, **provided Pandas passes the query on directly**. – Martijn Pieters Sep 13 '18 at 14:11
  • Ah, but there's the rub, if `con` is a dbapi2 connection, then [only sqlite is supported](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_sql.html). The database connection is wrapped in a `SQLiteDatabase` class. – Martijn Pieters Sep 13 '18 at 14:34