31

I can't do:

>>> session.query(
        func.count(distinct(Hit.ip_address, Hit.user_agent)).first()
TypeError: distinct() takes exactly 1 argument (2 given)

I can do:

session.query(
        func.count(distinct(func.concat(Hit.ip_address, Hit.user_agent))).first()

Which is fine (count of unique users in a 'pageload' db table).

This isn't correct in the general case, e.g. will give a count of 1 instead of 2 for the following table:

 col_a | col_b
----------------
  xx   |  yy
  xxy  |  y

Is there any way to generate the following SQL (which is valid in postgresql at least)?

SELECT count(distinct (col_a, col_b)) FROM my_table;
EoghanM
  • 25,161
  • 23
  • 90
  • 123

4 Answers4

24

distinct() accepts more than one argument when appended to the query object:

session.query(Hit).distinct(Hit.ip_address, Hit.user_agent).count()

It should generate something like:

SELECT count(*) AS count_1
FROM (SELECT DISTINCT ON (hit.ip_address, hit.user_agent)
hit.ip_address AS hit_ip_address, hit.user_agent AS hit_user_agent
FROM hit) AS anon_1

which is even a bit closer to what you wanted.

RedNaxel
  • 249
  • 2
  • 3
  • 3
    This generates me a distinct select on all of the columns as well, not just the added ones as parameters. – MattSom Apr 27 '20 at 11:52
  • You don't need to do query(Hit). Instead you need to do query(Hit.ip_address, Hit.user_agent), then SQLA will treat it right. – varela Aug 12 '20 at 12:32
23

The exact query can be produced using the tuple_() construct:

session.query(
    func.count(distinct(tuple_(Hit.ip_address, Hit.user_agent)))).scalar()
Ilja Everilä
  • 50,538
  • 7
  • 126
  • 127
10

Looks like sqlalchemy distinct() accepts only one column or expression.

Another way around is to use group_by and count. This should be more efficient than using concat of two columns - with group by database would be able to use indexes if they do exist:

session.query(Hit.ip_address, Hit.user_agent).\
    group_by(Hit.ip_address, Hit.user_agent).count()

Generated query would still look different from what you asked about:

SELECT count(*) AS count_1 
FROM (SELECT hittable.user_agent AS hittableuser_agent, hittable.ip_address AS sometable_column2 
FROM hittable GROUP BY hittable.user_agent, hittable.ip_address) AS anon_1
vvladymyrov
  • 5,715
  • 2
  • 32
  • 50
  • Very good. Wouldn't have thought of this approach as it's a lot of typing when in SQL.. when in SQLA, it's very easy! – EoghanM May 28 '13 at 04:11
0

You can add some variables or characters in concat function in order to make it distinct. Taking your example as reference it should be:

session.query(
  func.count(distinct(func.concat(Hit.ip_address, "-", Hit.user_agent))).first()