1

Here is the example I have:

JELLY2some text

some text{JELLY2}some textsome textsome text
Sample text for testing:
some textJELLY2 {some text JELLY2 lsdkfjsd}にsome text

I want to get all JELLY2 except those in parentheses like:

{JELLY2}

and

{some text JELLY2 lsdkfjsd}

http://regexr.com/3dhsl

I need to get data by select statement, something like:

SELECT `id` FROM `table` WHERE `body` REGEXP 'JELLY2'

Or maybe, if it's possible with RLIKE or some other way?

SELECT `id` FROM `table` WHERE `body` RLIKE 'JELLY2'
whitesiroi
  • 2,725
  • 4
  • 30
  • 64
  • 1
    What are you going to do with `JELLY2` outside braces? Replace? Remove? What is the language? JS? – Wiktor Stribiżew Jun 02 '16 at 08:29
  • @Wiktor Stribiżew mysql regexp – whitesiroi Jun 02 '16 at 08:30
  • 1
    I don't think this is possible because you'd need to use assertions, which aren't supported in MySQL as it uses POSIX regular expressions. Can you select all the results containing `JELLY2` and then use another language to filter out the ones where it does not appear outside the braces? Even then it's not trivial because you could have multiple embedded braces where the keyword appears inside one but not the other {like {this} JELLY2}. – Matt Raines Jun 02 '16 at 08:39
  • @Matt Raines thank you for your comment. It shouldn't go as deep as one level of {}. But, you probably right, maybe, it's not possible by MySQL only. Thank you. – whitesiroi Jun 02 '16 at 08:46

2 Answers2

2

Use a negated character class and groups that can skip over bracketed input, to require that the target is not with brackets, plus start/end anchors:

SELECT id
FROM table
WHERE body RLIKE '^([^{]*(\{[^}]*\})?)*JELLY2([^{]*(\{[^}]*\})?)*$'

See live demo.

Bohemian
  • 412,405
  • 93
  • 575
  • 722
  • Thank you very much, is it possible to get this one too ('some textJELLY2 {some text JELLY2 lsdkfjsd}にsome text'); If JELLY2 is in {} and out in a record get its id also. is it possible? – whitesiroi Jun 02 '16 at 10:00
  • This doesn't work for any string which contains braces where `JELLY2` is outside the braces. For example, `{example} JELLY2`. – Matt Raines Jun 02 '16 at 10:04
  • @whitesiroi Yes. The edited answer matches your example. – Bohemian Jun 02 '16 at 10:28
  • Yes, that's better. Now it incorrectly matches `{{example} JELLY2 {example}}`, but the OP has said this situation is unlikely to arise. – Matt Raines Jun 02 '16 at 10:30
  • @matt regex can't handle recursive structures. For that you need a full language parser with AST etc. Regex is limited to simple pattern matching, but as you say such edge cases can be defined as out of scope. – Bohemian Jun 02 '16 at 10:35
1

Although it's unlikely you could do this with a MySQL regular expression, and I think it would be better approached using another language, you could write a MySQL function to do this.

There's probably any number of approaches. This one loops over all the occurrences of needle in haystack, checking whether the number of { characters preceding the occurrence is less than or the same as the number of } characters preceding it. Which means it's not particularly robust (}}}{JELLY2} would confuse it) but you could develop the idea further if this was an issue.

DELIMITER //
CREATE FUNCTION contains_text_outside_braces(needle VARCHAR(255), haystack VARCHAR(255))
RETURNS INT DETERMINISTIC
BEGIN
    DECLARE pos INT;
    DECLARE previous INT DEFAULT 1;
    DECLARE length INT DEFAULT LENGTH(needle);
    DECLARE prefix VARCHAR(255);
    LOOP
        SET pos = LOCATE(needle, haystack, previous);
        IF pos = 0 THEN
            RETURN 0;
        END IF;
        SET prefix = LEFT(haystack, pos - 1);
        IF LENGTH(REPLACE(prefix, '{', '')) >= LENGTH(REPLACE(prefix, '}', '')) THEN
            RETURN 1;
        END IF;
        SET previous = pos + length;
    END LOOP;
END//

DELIMITER ;

SELECT * FROM example_table WHERE contains_text_outside_braces('JELLY2', content);
Matt Raines
  • 4,149
  • 8
  • 31
  • 34