I have a String containing a comma-separated list of terms, like this:

term1, term2* ,term3*,term 4 , *term5, *term 6 , term 7*,* term 8

Each term may have leading or trailing whitespace, which should be ignored, and may also contain whitespace inside it. I want to find all terms that neither start nor end with an asterisk. In the list above, those are "term1" and "term 4".

My attempts so far match every term (they merely drop the asterisks instead of skipping the term), as in this example: https://regex101.com/r/9QjjJ5/1.

I've also tried lookahead expressions and word boundaries, but I must be using them incorrectly: the matched term is then either truncated or missing the spaces inside it.

Wiktor Stribiżew
Wollo
  • Without any more details, it is not possible to help you in the best way. Try [something like this](https://regex101.com/r/t9AFou/1). – Wiktor Stribiżew Nov 27 '17 at 11:16
  • Thanks Wiktor. Unfortunately, that does not work, since it includes the separating character in the expression. Therefore, it fails if two consecutive terms match, e.g. "term1, term2, term3" will match only term1 and term3. What information is missing from my question? – Wollo Nov 27 '17 at 11:20
  • What is the software you are working with? – Wiktor Stribiżew Nov 27 '17 at 11:29
  • This is for an oracle query - I'm using the regexp_count() function. Thus, the regexp would have to be POSIX-ERE (as tagged). – Wollo Nov 27 '17 at 11:31
  • I am sure there is a way to 1) split with `,` 2) trim all chunks, 3) filter out all those that start/end with `*`. I added the `oracle` tag to the post. – Wiktor Stribiżew Nov 27 '17 at 11:34
  • I don't think you can pull this off in regex alone without lookaheads. – tripleee Nov 27 '17 at 11:38
  • I do not know about the way to split and strip in the specific language, but the following regex should find the matching chunks. `^[^*].*[^*]$` .. – Uvar Nov 27 '17 at 12:13
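The split, trim, and filter steps suggested in the comments can be sketched outside Oracle to check the logic. This is only an illustration in Python, not Oracle code; the function name `plain_terms` is made up for the example. It filters with the alternation `^\*|\*$` rather than `^[^*].*[^*]$`, since the latter needs at least two characters and would reject a one-character term.

```python
import re

# Sample input copied from the question.
TERMS = "term1, term2* ,term3*,term 4 , *term5, *term 6 , term 7*,* term 8"


def plain_terms(csv):
    """Return the trimmed terms that neither start nor end with '*'.

    Hypothetical helper for illustration only.
    """
    # 1) split on the separator, 2) trim each chunk
    chunks = [chunk.strip() for chunk in csv.split(",")]
    # 3) drop empty chunks and chunks with a leading or trailing asterisk
    return [c for c in chunks if c and not re.search(r"^\*|\*$", c)]


print(plain_terms(TERMS))  # ['term1', 'term 4']
```

Because each chunk is tested on its own, consecutive matching terms such as "term1, term2, term3" are all kept; the separator is never part of the pattern.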

1 Answer

You can use the conventional method in Oracle to split the string and then use REGEXP_LIKE to filter.

WITH tab ( terms ) AS (
    SELECT
        'term1, term2* ,term3*,term 4 , *term5, *term 6 , term 7*,* term 8'
    FROM
        dual
)
SELECT *
FROM (
    -- Split on commas (one row per term) and trim surrounding whitespace.
    SELECT DISTINCT
        TRIM(regexp_substr(terms, '[^,]+', 1, level)) term
    FROM
        tab
    CONNECT BY
        regexp_substr(terms, '[^,]+', 1, level) IS NOT NULL
)
-- Keep only the terms that neither start nor end with an asterisk.
WHERE NOT REGEXP_LIKE(term, '^\*|\*$');
Kaushik Nayak
  • You are right. After further investigation, it really looks like there is no way to do it with just a regex. You need lookahead capability for that, and Oracle just doesn't have that... – Wollo Nov 27 '17 at 14:38
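For comparison, here is a rough sketch of what a single-pattern solution could look like in an engine that does support lookahead. It uses Python's `re` module purely as an illustration; the pattern and the name `PLAIN_TERM` are assumptions made for this example, and none of it applies to Oracle's regular expressions.

```python
import re

# Sample input copied from the question.
TERMS = "term1, term2* ,term3*,term 4 , *term5, *term 6 , term 7*,* term 8"

# Hypothetical pattern; relies on lookahead, so it is not usable in Oracle.
PLAIN_TERM = re.compile(
    r"(?:^|,)"           # start of the string or a separator
    r"(?!\s*\*)"         # reject a term whose first non-space character is '*'
    r"\s*"               # skip leading whitespace
    r"([^,]*?[^\s,*])"   # the term; its last character is not space, ',' or '*'
    r"\s*"               # skip trailing whitespace
    r"(?=,|$)"           # next separator or end of string, left unconsumed
)

matches = PLAIN_TERM.findall(TERMS)
print(matches)       # ['term1', 'term 4']
print(len(matches))  # 2
```

The trailing `(?=,|$)` leaves the separator in place, so the next match can start at it. That is what lets consecutive plain terms such as "term1, term2, term3" all match, which was the problem with patterns that consume the comma.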