0

I have 3 lists of user id's and time ranges (different for each user id) for which I would like to extract data. I am querying an AWS redshift database through Python. Normally, with one list, I'd do something like this:

sql_query = "select userid from some_table where userid in {}".format(list_of_users)

where list of users is the list of user id's I want - say (1,2,3...)

This works fine, but now I need to somehow pass it along a triplet of (userid, lower time bound, upper time bound). So for example ((1,'2018-01-01','2018-01-14'),(2,'2018-12-23','2018-12-25'),...

I tried various versions of this basic query

sql_query = "select userid from some_table where userid in {} and date between {} and {}".format(list_of_users, list_of_dates_lower_bound, list_of_dates_upper_bound)

but no matter how I structure the lists in format(), it doesn't work. I am not sure this is even possible this way or if I should just loop over my lists and call the query repeatedly for each triplet?

user3490622
  • 939
  • 2
  • 11
  • 30
  • does `between` even work with lists in sql? – SuperStew May 22 '19 at 16:57
  • What package are using to connect to redshift? If `psycopg2` check out this QA https://stackoverflow.com/questions/8671702/passing-list-of-parameters-to-sql-in-psycopg2 which details how to populate a SQL `IN` expression. However, you should note that required type is a tuple, not a list. However, as others have pointed out in the comments, passing multiple elements into either side of the `BETWEEN` expression (as list, tuple, or otherwise) is nonsensical. Consider executing multiple `SELECT` statements, possibly concatenated by some kind of `UNION`. – xibalba1 May 22 '19 at 17:20

2 Answers2

1

suppose the list of values are something like following:

list_of_users = [1,2], 
list_of_dates_lower_bound = ['2018-01-01', '2018-12-23']
list_of_dates_lower_bound = ['2018-01-14', '2018-12-25']

the formatted sql would be:

select userid from some_table where userid in [1,2] and date between ['2018-01-01', '2018-12-23'] and ['2018-01-14', '2018-12-25']

This result should not be what you thought as is, it's just an invalid sql, the operand of between should be scalar value.

I suggest loop over the lists, and pass a single value to the placeholder.

Steven
  • 76
  • 1
  • 4
0

You can select within a particular range by using

select col from table where col between range and range;

In your case it may be

select userid from some_table where date_from between yesterday and today;

or even

select userid from some_table where date_from >= yesterday and date_from <= today;
python_starter
  • 1,509
  • 1
  • 11
  • 18