Efficiently aggregating with joined table and searching

Question

I have the following table schemas on Postgres 13.4:

TBL1: id, name
TBL2: fk(TBL1), name

and I want to make a query such that given query will be searched both on TBL1's id and name, also TBL2's name column (any of these 3 match), while joining TBL2 name columns in a single column grouped by TBL1 id, if they exist. If not I want to have TBL1's name.

TBL1:

id	name
1	john
2	jane
3	kane

TBL2:

fk	name
1	doe
1	foo
3	joe

should return the result with searching term '%j%'

id	display_name
1	doe, foo (because id matches '%j%')
2	jane (because id matches '%j%')
3	joe (because TBL2->name matches '%j%')

Current query:

SELECT
    tbl1.id as id,
    COALESCE(string_agg(tbl2.name, ','), tbl1.id) as display_name
FROM tbl1
LEFT JOIN tbl2 vm on tbl2.fk = tbl1.id
WHERE (tbl1.id ILIKE '%query%' OR tbl2.name LIKE '%query%')
GROUP BY id;

Plan:

 [
  {
    "Plan": {
      "Node Type": "Aggregate",
      "Strategy": "Sorted",
      "Partial Mode": "Simple",
      "Parallel Aware": false,
      "Startup Cost": 595.05,
      "Total Cost": 595.53,
      "Plan Rows": 24,
      "Plan Width": 129,
      "Group Key": ["tbl1.\"id\""],
      "Plans": [
        {
          "Node Type": "Sort",
          "Parent Relationship": "Outer",
          "Parallel Aware": false,
          "Startup Cost": 595.05,
          "Total Cost": 595.11,
          "Plan Rows": 24,
          "Plan Width": 113,
          "Sort Key": ["tbl1.\"id\""],
          "Plans": [
            {
              "Node Type": "Hash Join",
              "Parent Relationship": "Outer",
              "Parallel Aware": false,
              "Join Type": "Right",
              "Startup Cost": 327.79,
              "Total Cost": 594.50,
              "Plan Rows": 24,
              "Plan Width": 113,
              "Inner Unique": true,
              "Hash Cond": "((tbl2.\"fk\")::text = (tbl1.\"id\")::text)",
              "Filter": "((lower((tbl1.\"id\")::text) ~~ '%query%'::text) OR (lower((tbl2.name)::text) ~~ '%query%'::text))",
              "Plans": [
                {
                  "Node Type": "Seq Scan",
                  "Parent Relationship": "Outer",
                  "Parallel Aware": false,
                  "Relation Name": "TBL2",
                  "Alias": "tbl2",
                  "Startup Cost": 0.00,
                  "Total Cost": 247.21,
                  "Plan Rows": 7421,
                  "Plan Width": 67
                },
                {
                  "Node Type": "Hash",
                  "Parent Relationship": "Inner",
                  "Parallel Aware": false,
                  "Startup Cost": 232.35,
                  "Total Cost": 232.35,
                  "Plan Rows": 7635,
                  "Plan Width": 97,
                  "Plans": [
                    {
                      "Node Type": "Seq Scan",
                      "Parent Relationship": "Outer",
                      "Parallel Aware": false,
                      "Relation Name": "TBL1",
                      "Alias": "tbl1",
                      "Startup Cost": 0.00,
                      "Total Cost": 232.35,
                      "Plan Rows": 7635,
                      "Plan Width": 97
                    }
                  ]
                }
              ]
            }
          ]
        }
      ]
    }
  }
]

My questions:

I have gin indexes on TBL1->name and TBL2->name and they work for single ILIKE queries I make on these tables. I also have TBL2->fk and TBL1->id indexes. But as you see above, it makes sequential scans on both tables, and I think this is because of the join?
How could I possibly make this query more efficient? Or should I be looking another ways to get this data?

The JSON execution plan is not really meant for humans. It's better to generate the plan using `explain (analyze, buffers, format text)` — , Aug 10 '22 at 12:40

score 0 · Answer 1 · answered Aug 10 '22 at 14:39

0

But as you see above, it makes sequential scans on both tables, and I think this is because of the join

More specifically, it is because of the OR which cross the JOIN boundary.

You should write two queries, one with the LIKE condition on each table in turn, and then combine with a suitable UNION. I didn't understand your comment to the horse, so I don't know exactly what would be suitable for you in this case.

However, none of this matters in your example, because %j% contains no usable trigrams.

answered Aug 10 '22 at 14:39

jjanes

37,812
5
27
34

`%j%` was an arbitrary query here. This is because it has a single letter, right? For my case I need to have an query on both table's name, as long as it matches either TBL1 or TBL2's name column, I want to show the result. And when I show a result, I always want to aggregate names from TBL2 if they exist. – Alp Aug 10 '22 at 14:49

Efficiently aggregating with joined table and searching

1 Answers1