1

Suppose a PostgreSQL table, articles, contains two nullable String columns of name and alt_name. Now, I want to find records (rows) in the table that have

  • a combination of String name and alt_name matches another combination of the same type in the same table:
    • i.e., [a.name, a.alt_name] is equal to either [b.name, b.alt_name] or [b.alt_name, b.name]
  • where name or alt_name may be NULL or an empty String, and in any circumstances NULL and an empty String should be treated as identical;
    • e.g., when [a.name, a.alt_name] == ["abc", NULL], a record of [b.name, b.alt_name] == ["", "abc"] should match, because one of them is "abc" and the other is NULL or empty String.

Is there any neat query to achieve this?

I thought if there is a way to concatenate both columns with a UTF-8 replacement character (U+FFFD) in between, where NULL is converted into an empty String, that would solve the problem. Say, if the function were magic_fn(), the following would do a job, providing there is a unique column id:

SELECT * FROM articles a INNER JOIN places b ON a.id <> b.id
  WHERE
        magic_fn(a.name, a.alt_name) =  magic_fn(b.name, b.alt_name)
     OR magic_fn(a.name, a.alt_name) =  magic_fn(b.alt_name, b.name);


-- [EDIT] corrected from the original post, which was simply wrong.

However, concatnation is not a built-in function in PostgreSQL and I don't know how to do this.
[EDIT] As commented by @Serg and in answers, a string-concatnation function is now available in PostgreSQL from Ver.9.1 (CONCAT or ||); n.b., it actually accepts non-String input as long as one of them is a String-type as of Ver.15.

Or, maybe there is simply a better way?

Masa Sakano
  • 1,921
  • 20
  • 32
  • 2
    PosgreSql string concat oper is [||](https://www.postgresql.org/docs/current/functions-string.html) – Serg Nov 11 '22 at 13:43
  • 1
    [Concatenation **is** a built-in function](https://www.postgresql.org/docs/current/functions-string.html#id-1.5.8.10.5.2.2.1.1.1.1) - as specified in the SQL standard –  Nov 11 '22 at 14:49
  • @a_horse_with_no_name Yeah, you're right. I realised it when I saw people's answers. I have now edited the hypothetical code snippet so people will not be confused. – Masa Sakano Nov 11 '22 at 17:42

5 Answers5

1

try this

SELECT  *   FROM articles a
cross join articles b    
where  
(ARRAY[COALESCE(a.name,''),COALESCE(a.alt_name,'')] @>  ARRAY[COALESCE(b.name,''),COALESCE(b.alt_name,'')])  
and (ARRAY[COALESCE(a.name,''),COALESCE(a.alt_name,'')] <@  ARRAY[COALESCE(b.name,''),COALESCE(b.alt_name,'')]) 
and a.id<>b.id
and a.id<b.id  --optional (to avoid reverse matching) 

db<>fiddle

Mitko Keckaroski
  • 954
  • 1
  • 8
  • 12
  • That's neat! +1 for the option to avoid reverse matching (I didn't know the necessity, but I can guess). – Masa Sakano Nov 11 '22 at 15:08
  • Oops, I find your example fails when one of them is a combination of `''` (or NULL if coalesced). Basically, `(ARRAY['a', ''] @> ARRAY['', ''])` returns true. [db<>fiddle](https://dbfiddle.uk/r-3p4zbU) (based on your fiddle, thanks!) – Masa Sakano Nov 11 '22 at 16:22
  • 1
    @Masa Sakano, I did an update on the query. Hope this will work for you. – Mitko Keckaroski Nov 12 '22 at 16:36
  • Fantastic! Confirmed at [db<>fiddle](https://dbfiddle.uk/Y8fB_oRc) So, basically the reverse direction `@<` is required –– I couldn't work out why the original one didn't work. Thank you! – Masa Sakano Nov 12 '22 at 17:40
  • Greatly thanks to your answer, I have come up with another solution (see [my newly posted answer](https://stackoverflow.com/a/74415532/3577922), in which I acknowledge you), where [COALESCE()](https://www.postgresql.org/docs/current/functions-string.html) is nicely got rid of. – Masa Sakano Nov 12 '22 at 18:34
1

You can create a function which takes in the name and alt_name, then returns an aggregated string with nulls converted to empty strings and the results sorted:

create function magic_fn(a text, b text) returns text
  return (select json_agg(t.v) from (
    select t1.* from (
      select coalesce(a, '') v
      union all
      select coalesce(b, '') v) t1 
    order by t1.v) t);
create table articles (id int, name text, alt_name text);
insert into articles values (1, 'abc', null), (2, 'abc', ''), (3, null, 'abc'), (4, 'aaa', 'a'), (5, 'aaa', 'a'), (6, 'a', 'aaa')

Usage:

select * from articles a join articles b 
on a.id <> b.id and magic_fn(a.name, a.alt_name) = magic_fn(b.name, b.alt_name)

See fiddle

Ajax1234
  • 69,937
  • 8
  • 61
  • 102
  • Confirmed it works, even in edge cases with both columns being null/empty-string; see [db<>fiddle](https://dbfiddle.uk/x_zRIoZc), where I added two more rows on top of yours. Thanks! – Masa Sakano Nov 11 '22 at 18:04
  • 2
    If an above answer r0esolved you issue please accept it. This helps future questioners with same/similar issue and removes the question from the unanswered queue. Please do not leave successfully answered questions as unanswered. – Belayer Nov 12 '22 at 00:46
1

Having reviewed a few answers (special thanks to @MitkoKeckaroski), I have come up with this short solution. COALESCE() is not necessary!

The condition is that the UTF replacement character (\U+FFFD) should never appear in the data record, which you can safely assume according to the Unicode specification.

SELECT * FROM articles a JOIN articles b 
ON a.id <> b.id AND
  ARRAY[CONCAT(a.name, U&'\FFFD', a.alt_name), 
        CONCAT(a.alt_name, U&'\FFFD', a.name)] @>
  ARRAY[CONCAT(b.name, U&'\FFFD', b.alt_name)];

See db<>fiddle (where I extended the data prepared by @Ajax1234 – thank you!)

Masa Sakano
  • 1,921
  • 20
  • 32
0

you can try to use

  • coalesce for convert null to empty
  • || for concatenate string

and then compare string like this sql:

(coalesce(a.name,'') || coalesce(a.altname,'')) =  (coalesce(b.name,'') || coalesce(b.altname,'')) 
 or 
 (coalesce(a.name,'') || coalesce(a.altname,'')) =  (coalesce(b.altname,'') || coalesce(b.name,'')) 
MMAARR
  • 93
  • 4
  • Now I understand `CONCAT()` is long [introduced to Postgres](https://www.postgresqltutorial.com/postgresql-string-functions/postgresql-concat-function/) since Ver.9.1, thanks. I am afraid your answer wrongly works when the 2-column combination is like `('ab-cd', '-ef')` and `('ab-', 'cd-ef')`. – Masa Sakano Nov 11 '22 at 15:13
0

You can create an array from both names, remove null and empty values, then check if the arrays overlap (have elements in common)

select *
from articles
where array_remove(array[nullif(name,''), nullif(alt_name,'')], null) && array['abc']

This can be made easier by creating a function that generates such an array:

create or replace function combine_names(p_names variadic text[]) 
  returns text[]
as
$$
  select array_agg(name)
  from unnest(p_names) as x(name)
  where nullif(trim(name),'') is not null;
$$ 
language sql
immutable
called on null input;  

By making the parameter variadic it's possible to provide a different number of arguments (in theory even more than two)

select *
from articles
where combine_names(name, alt_name) && combine_names('abc')


select *
from articles
where combine_names(name, alt_name) && combine_names('abc', null)


select *
from articles
where combine_names(name, alt_name) && combine_names('abc', 'def')
  • A caveat is it seems not to work in edge cases where both the columns are null/empty-string. Other than that, it works, thanks! See [db<>fiddle](https://dbfiddle.uk/4bv8Nqyr), where I implemented your algorithm to my question case, including the edge cases. – Masa Sakano Nov 11 '22 at 18:06