CountMatches() equivalent Postgres query

Question

We know the stringutils function CountMatches() - CountMatches - counts the number of occurrences of one String in another. I need an equivalent query in postgresql to count the number of occurence of a string in another.

Any idea would appreciated

So, which of the string matching functions listed in the extensive official manuals did you consider and why didn't they meet your needs. — Richard Huxton, Aug 30 '14 at 06:39

score 1 · Answer 1 · answered Aug 30 '14 at 12:09

1

You can use regexp_matches with greedy switch:

select count ( * )
from regexp_matches ( 'abc abc', 'ab', 'g' ); -- result is 2

answered Aug 30 '14 at 12:09

Tomasz Siorek

701
5
10

score 1 · Answer 2 · edited May 23 '17 at 12:21

`regexp_matches()`

The solution with regexp_matches() that @Igor suggested and @Tomasz implemented is short and elegant, but has two downsides:

Regular expressions are powerful, but generally much slower than simple functions.
The aggregate step makes it non-trivial to integrate into bigger queries.

To use it in a query with multiple source rows:

SELECT t.*, count(match) AS ct
FROM   tbl t
LEFT   JOIN LATERAL regexp_matches(t.string, 'ab', 'g') match ON TRUE
GROUP  BY t.tbl_id
ORDER  BY t.tbl_id;

Or:

SELECT t.*, m.ct
FROM   tbl t
LEFT   JOIN LATERAL (
   SELECT count(*) AS ct
   FROM   regexp_matches(t.string, 'ab', 'g')
   ) m ON TRUE
ORDER  BY t.tbl_id;

`replace()` / `length()`

The expression is not as elegant, but should be faster and can be used on sets more easily:

SELECT (length(col) - length(replace(col, 'match', ''))) / length('match') AS ct;

You can wrap this in a simple SQL function:

CREATE OR REPLACE FUNCTION f_count_matches(_string text, _match text)
  RETURNS int LANGUAGE sql IMMUTABLE STRICT AS
$$
SELECT (length(_string) - length(replace(_string, _match, ''))) / length(_match)
$$;

Then:

SELECT f_count_matches('abc cab,xabx abab', 'ab');

SQL Fiddle demonstrating all.

I've tested your function on a large text (8630640 characters long 'Lorem ipsum [...]', counting 'Lorem'). It usually took about 200 ms. Version using regular expressions function took 45 sec. (There was no difference in performance on small input). Upvoting. Definitely. — Tomasz Siorek, Aug 30 '14 at 21:13
@TomaszSiorek: Thanks for the benchmark. Always good to see actual results. — Erwin Brandstetter, Aug 30 '14 at 21:24

score 0 · Answer 3 · answered Aug 30 '14 at 08:11

PostgreSQL does not have that function, but you can work around it this way:

SELECT array_length(regexp_split_to_array('axbxcxdxexfxg','b'),1)-1;  -- returns 1
SELECT array_length(regexp_split_to_array('axbxcxdxexfxg','x'),1)-1;  -- returns 6
SELECT array_length(regexp_split_to_array('axbxcxdxexfxg','z'),1)-1;  -- returns 0

There may be edge cases that do not work correctly.

regexp_split_to_array creates an array with the non-matching parts. The number of elements in the array is one more than the number of matches (at least for normal cases). So, array_length and -1 produce the count of matches.

CREATE FUNCTION num_matches(string text,pattern text)
RETURNS int AS $$
    SELECT array_length(regexp_split_to_array(string,pattern),1)-1;
$$ LANGUAGE SQL STABLE;

It may be easier to count the number of rows, outputted by `regexp_matches`. — Ihor Romanchenko, Aug 30 '14 at 11:08

CountMatches() equivalent Postgres query

3 Answers3

regexp_matches()

replace() / length()

`regexp_matches()`

`replace()` / `length()`