Consider the following example table
x_id | name_id1 | name_id2 |
---|---|---|
x1 | John | Frank |
x2 | Frank | John |
x3 | Jack | John |
x4 | John | Jack |
x5 | Bob | Frank |
x6 | George | Bob |
x7 | Bob | Finn |
x8 | Mark | James |
x9 | James | Finn |
The goal is to extract the following result
name | frequency |
---|---|
John | 4 |
Bob | 3 |
James | 2 |
Conceptually this is the result of the following procedure:
- Count the frequency of all names and pick the one that occurs most often, that would be John, which occurs 4 times (row
x1
tox4
). - Now remove all rows that contain John, which leaves you with row
x5
tox9
. Again determine the name that occurs most frequently. This gives you Bob, which occurs 3 times (rowx5
tox7
). - Now also remove row
x5
tox7
, which leaves us with rowx8
tox9
. Again determine the name that occurs most frequently. This gives us James which occurs 2 times. - Now also remove row
x8
tox9
, which leaves us with nothing, so we are done.
This data is stored in SQLite in a junction table as follows (in the real world case there can be more than two names per x_id
)
id | x_id | name_id |
---|---|---|
1 | x1 | John |
2 | x1 | Frank |
3 | x2 | John |
4 | x2 | Frank |
5 | x3 | John |
6 | x3 | Jack |
7 | x4 | John |
8 | x4 | Jack |
9 | x5 | Bob |
10 | x5 | Frank |
11 | x6 | Bob |
12 | x6 | George |
13 | x7 | Bob |
14 | x7 | Finn |
13 | x8 | James |
14 | x8 | Mark |
13 | x9 | James |
14 | x9 | Finn |
What kind of procedure do we need to retrieve the described result? Considering that the above junction table is of variable length (just to make sure we do not come up with a fixed amount of join
s as a valid answer).
I did consider using the WITH RECURSIVE
approach however that will not allow us to
- Execute the aggregate function
COUNT
in the recursive select which seems to be required when we want to calculate the occurrences. - Remove all previous
x_id
s, only those that are currently in queue.