1

I have emails table that has sender and reporter columns. I want to search given parameter in those columns and return unique values.

Let me explain with sample. This is my table and records:

CREATE TABLE public.emails (
  id                bigint NOT NULL PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY
    (MAXVALUE 9223372036854775807),
  sender            jsonb NOT NULL,
  reporter      jsonb not null
);

insert into emails (sender, reporter) VALUES ('[{"email": "dennis1@example.com", "name": "dennis1"}]', '[]');
insert into emails (sender, reporter) VALUES ('[{"email": "dennis2@example.com", "name": "dennis1"}]', '[{"email": "john@example.com", "name": "john"}, {"email": "dennis1@example.com", "name": "dennis1"}, {"email": "dennis2@example.com", "name": "dennis2"}]');
insert into emails (sender, reporter) VALUES ('[{"email": "dennis1@example.com", "name": "dennis1"}]', '[]');
insert into emails (sender, reporter) VALUES ('[{"email": "dennis1@example.com", "name": "dennis1"}]', '[]');

I want to get email addresses and names. I also want to avoid dupes. Only one email and one name. I also don't want to get it as array, instead, one email and name per one row.

Searching john

SELECT
* /* i don't know what to put here pr merge with reporters */
FROM "emails" AS "e"
WHERE (EXISTS (SELECT
    *
  FROM JSONB_ARRAY_ELEMENTS_TEXT("e"."sender") AS "e" ("email")
  WHERE ("e"."email" ~* 'john' or "e"."name" ~* 'john'))
);

Expected Result for john:

email                 name
john@example.com      john

Searching for `` (empty):

SELECT
* /* i don't know what to put here pr merge with reporters */
FROM "emails" AS "e"
WHERE (EXISTS (SELECT
    *
  FROM JSONB_ARRAY_ELEMENTS_TEXT("e"."sender") AS "e" ("email")
  WHERE ("e"."email" ~* '' or "e"."name" ~* ''))
);

Expected Result for `` (empty):

email                 name
john@example.com      john
dennis1@example.com   dennis1
dennis2@example.com   dennis2

dennis2 is in both sender and reporter, therefore, only one of them is required. No dupes.

In fact, in here there is one catch. If sender or reporter columns has at least one json object (not json array) then this query also fails.

Error: cannot extract elements from an object

Which is an another story, tho.

How can i achieve my goal in this case?

Demo: https://dbfiddle.uk/?rdbms=postgres_12&fiddle=1bf9c5f83f5104e2392c31984cb4e939

Dennis
  • 1,805
  • 3
  • 22
  • 41
  • https://dbfiddle.uk/?rdbms=postgres_12&fiddle=6173b070fc4b377a67fc1db017f11e63 It is not clear how to deal with pairs `dennis1 - dennis2@example.com`, `dennis2 - dennis2@example.com` and `dennis1 - dennis1@example.com`: pair `dennis1 - dennis2@example.com` duplicates what? – Abelisto May 10 '20 at 05:30
  • Ah, name and email can be anything. This is why im searching on both fields using `OR`. We need to check dupes based on `email`. If email present more than 1 time, regardless of its name, only first or last or randomly selected. So one only. I forgot to say that. So Sorry. Can we pick last or first match? – Dennis May 10 '20 at 05:41
  • Is is what you mean https://dbfiddle.uk/?rdbms=postgres_12&fiddle=43b1cd633d014bb180dedf4bf016b2ab ? – Abelisto May 10 '20 at 05:49
  • Exactly! You used `||` operator in function to merge columns. I didn't know that. Wow. Can you post it as an answer? – Dennis May 10 '20 at 05:56
  • P.S.: If `senders` or `reporters` is `NULL` it fails :( See: https://dbfiddle.uk/?rdbms=postgres_12&fiddle=aaaf68cb11a7b931634be1f57b371530 – Dennis May 10 '20 at 06:09
  • Yes, it fails because of `not null` constraint. What is your question? – Abelisto May 10 '20 at 06:11
  • @Abelisto Sorry man! I'm sleepless at the moment i didn't get attention to that. Sorry :/ – Dennis May 10 '20 at 06:14

1 Answers1

2

Normalize your data before searching, then remove duplicates using distinct on () clause:

with cte as (select x ->> 'name' as name, x ->> 'email' as email
from emails as e, jsonb_array_elements(e.sender || e.reporter) as x)
select distinct on (email) * from cte where 
  name ~* '' or email ~* ''
  --name ~* 'john' or email ~* 'john'
order by email;

Demo

Note that it will always scan the whole table, no indexes applicable in this case. Think about schema normalization.

Abelisto
  • 14,826
  • 2
  • 33
  • 41
  • Thanks for the quick help! I already did that (normalization) the question contains complete draft queries and table schema. I will take care of it. – Dennis May 10 '20 at 06:16