1

There are 2 tables:

  • challenge
  • entry (many entries per each challenge)

To enter a challenge, one would enter their Email and Name (for the leaderboard) - there is no registration / authentication process, so email and leaderboard_name are both recorded for each entry right in the entry table:

CREATE TABLE public.entry
(
    id bigint NOT NULL DEFAULT nextval('entry_id_seq'::regclass),
    challenge_id bigint NOT NULL,
    date_created timestamp with time zone NOT NULL DEFAULT CURRENT_TIMESTAMP,
    email text COLLATE pg_catalog."default" NOT NULL,
    is_removed boolean NOT NULL DEFAULT false,
    total_points integer NOT NULL DEFAULT 0,
    CONSTRAINT entry_pkey PRIMARY KEY (id)
)

With this approach, the same email value can be used multiple times with a single challenge. In other words, multiple entries with the same email can be created.

Now, the query to get distinct emails with the highest score is quite simple:

SELECT DISTINCT ON (email)
  id as entry_id, total_points
FROM  entry
WHERE challenge_id = 1
  AND is_removed = false
  AND total_points > 0
ORDER BY email, total_points DESC, date_created;

Here, we don't want entries that:

  • a) were removed by admins (is_removed = false)
  • b) have zero score (total_points > 0)

The challenge_id = 1 is also essential as it limits the entries to a particular challenge by its ID.

Also, ordering by date_created picks the earliest entry for each email if there are many with the same score. This makes sure the entry selected per email is always the same.

The issue arises when I try to build an index that would be used by the query planner.

I tried creating indexes on:

  • 3 columns: email, total_points DESC, date_created
  • 5 columns: challenge_id, is_removed, email, total_points DESC, date_created

But EXPLAIN keeps returning Seq Scan for the entry table.


Bonus question - how to effectively select only Top N entries by total_points DESC, date_created ASC? The simplest way would be to wrap this query with another query, but are there any better ways?

Meglio
  • 1,646
  • 2
  • 17
  • 33
  • Please **[edit]** your question and add the [execution plan](https://www.postgresql.org/docs/current/static/using-explain.html) generated using **`explain (analyze, buffers, format text)`** (_not_ just a "simple" explain) as [formatted text](http://stackoverflow.com/help/formatting), [no screen shots](http://meta.stackoverflow.com/questions/285551/why-may-i-not-upload-images-of-code-on-so-when-asking-a-question/285557#285557) please. Or upload the plan to https://explain.depesz.com –  Sep 18 '19 at 06:56
  • An filtered index might help, e.g. `on ... ( email, total_points DESC, date_created) where is removed = false and total_points > 0)` –  Sep 18 '19 at 06:57
  • The index on `(challenge_id, is_removed, email, total_points DESC, date_created)` works for me. – jjanes Sep 18 '19 at 20:05

0 Answers0