0

I've been stuck on this for a few days now. An event can have multiple dates, and I want the query to only return the date closest to today (the next date). I have considered querying for Events and then adding a hybrid property to Event that returns the next Event Date but I believe this won't work out (such as if I want to query EventDates in a certain range).

I'm having a problem with distinct() not working as I would expect. Keep in mind I'm not a SQL expert. Also, I'm using postgres.

My query starts like this:

    distance_expression = func.ST_Distance(
        cast(EventLocation.geo, Geography(srid=4326)),
        cast("SRID=4326;POINT(%f %f)" % (lng, lat), Geography(srid=4326)),
    )

    query = (
        db.session.query(EventDate)
        .populate_existing()
        .options(
            with_expression(
                EventDate.distance,
                distance_expression,
            )
        )
        .join(Event, EventDate.event_id == Event.id)
        .join(EventLocation, EventDate.location_id == EventLocation.id)
    )

And then I have multiple filters (just showing a few for as an example)

query = query.filter(EventDate.start >= datetime.utcnow)
if kwargs.get("locality_id", None) is not None:
    query = query.filter(EventLocation.locality_id == kwargs.pop("locality_id"))
if kwargs.get("region_id", None) is not None:
    query = query.filter(EventLocation.region_id == kwargs.pop("region_id"))
if kwargs.get("country_id", None) is not None:
    query = query.filter(EventLocation.country_id == kwargs.pop("country_id"))

Then I want to order by date and distance (using my query expression)

        query = query.order_by(
            EventDate.start.asc(),
            distance_expression.asc(),
        )

And finally I want to get distinct rows, and only return the next EventDate of an event, according to the ordering in the code block above.

        query = query.distinct(Event.id)

The problem is that this doesn't work and I get a database error. This is what the generated SQL looks like:

SELECT DISTINCT ON (events.id) ST_Distance(CAST(event_locations.geo AS geography(GEOMETRY,4326)), CAST(ST_GeogFromText(%(param_1)s) AS geography(GEOMETRY,4326))) AS "ST_Distance_1", event_dates.id AS event_dates_id, event_dates.created_at AS event_dates_created_at, event_dates.event_id AS event_dates_event_id, event_dates.tz AS event_dates_tz, event_dates.start AS event_dates_start, event_dates."end" AS event_dates_end, event_dates.start_naive AS event_dates_start_naive, event_dates.end_naive AS event_dates_end_naive, event_dates.location_id AS event_dates_location_id, event_dates.description AS event_dates_description, event_dates.description_attribute AS event_dates_description_attribute, event_dates.url AS event_dates_url, event_dates.ticket_url AS event_dates_ticket_url, event_dates.cancelled AS event_dates_cancelled, event_dates.size AS event_dates_size 
FROM event_dates JOIN events ON event_dates.event_id = events.id JOIN event_locations ON event_dates.location_id = event_locations.id 
WHERE events.hidden = false AND event_dates.start >= %(start_1)s AND (event_locations.lat BETWEEN %(lat_1)s AND %(lat_2)s OR false) AND (event_locations.lng BETWEEN %(lng_1)s AND %(lng_2)s OR false) AND ST_DWithin(CAST(event_locations.geo AS geography(GEOMETRY,4326)), CAST(ST_GeogFromText(%(param_2)s) AS geography(GEOMETRY,4326)), %(ST_DWithin_1)s) ORDER BY event_dates.start ASC, ST_Distance(CAST(event_locations.geo AS geography(GEOMETRY,4326)), CAST(ST_GeogFromText(%(param_3)s) AS geography(GEOMETRY,4326))) ASC

I've tried a lot of different things and orderings but I can't work this out. I've also tried to create a subquery at the end using from_self() but it doesn't keep the ordering.

Any help would be much appreciated!

EDIT:

On further experimentation it seems that I can't use order_by will only work if it's ordering the same field that I'm using for distinct(). So

query = query.order_by(EventDate.event_id).distinct(EventDate.event_id)

will work, but

query.order_by(EventDate.start).distinct(EventDate.event_id)

will not :/

  • If you are trying to use the DISTINCT ON...ORDER BY greatest-n-per-group approach, order by event id, then start date (or what ever is your "greatest"). That gets you the row per event id you want. – Ilja Everilä Jan 03 '22 at 01:32
  • 1
    Have you tried initially using the ORDER BY that is needed to get the DISTINCT ON to work, then using from_self to impose the final ordering you want? You said from_self doesn't keep the ordering. It should be *imposing* the ordering it wants. – jjanes Jan 03 '22 at 01:50
  • @jjanes I did try that but I want to order the query first (with EventDate.start) and then get the distinct EventDate.event_id rows based on that ordering. This seems to be impossible. But I've found a workaround, see my answer below if you're curious. – user6549570 Jan 03 '22 at 07:57

1 Answers1

0

I solved this by using adding a row_number column and then filtering by the first row numbers like in this answer:

filter by row_number in sqlalchemy