
From a block of HTML, I want to extract a specific tag that contains a specific word.

<textarea>asdasdasdasd as</textarea>
<textarea>asdacccda 
sdas</textarea>
<textarea>asdasdasdasd as</textarea>

The pattern below returns everything between the first <textarea> and the last </textarea> tag, but the desired result is the element in the middle.

\<textarea\>(.*)[ccc](.*)\<\/textarea\>/s

wrong result

Expected result:

<textarea>asdacccda 
sdas</textarea>

I've tried a couple more things, but I couldn't make it work across multiple lines. How can I achieve that?

siniradam

2 Answers

You have different possibilities here.

The regex version

<textarea>                  # match <textarea>
(?:(?!</textarea>)[\s\S])*? # match anything but stop before </textarea>
ccc                         # the word you want
(?:(?!</textarea>)[\s\S])*? # same construct as above
</textarea>                 # match </textarea>

This uses a technique called the tempered greedy token; see a demo on regex101.com. As written, with whitespace and # comments inside the pattern, it needs the x (extended) flag.
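For anyone who wants to try the pattern outside regex101, here is a minimal sketch in Python (my choice of language, the question does not name one). The `html` string is just the sample input from the question, and `re.VERBOSE` plays the role of the x flag so the layout and comments can stay.

```python
import re

# Sample input from the question: three textareas, the middle one spans two lines.
html = (
    "<textarea>asdasdasdasd as</textarea>\n"
    "<textarea>asdacccda \nsdas</textarea>\n"
    "<textarea>asdasdasdasd as</textarea>"
)

# re.VERBOSE ignores whitespace in the pattern and allows # comments.
pattern = re.compile(
    r"""
    <textarea>                    # match <textarea>
    (?:(?!</textarea>)[\s\S])*?   # any character, but never step over </textarea>
    ccc                           # the word you want
    (?:(?!</textarea>)[\s\S])*?   # same construct as above
    </textarea>                   # match </textarea>
    """,
    re.VERBOSE,
)

# No capturing groups, so findall returns the full matches.
matches = pattern.findall(html)
print(matches)
```

Because `[\s\S]` matches newlines by itself, no s/DOTALL flag is needed for this version.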


XPath queries

Another option is to use an XPath query, e.g.:

//textarea[contains(., 'ccc')]

Afterwards, do whatever you want with the matched elements (e.g. remove them from the DOM).
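A rough sketch of the same idea using only Python's standard library. Two assumptions here: the fragment is well-formed enough to parse as XML (real-world HTML often is not), and it is wrapped in a hypothetical `<root>` element so there is a single root. ElementTree's XPath subset has no `contains()`, so the text test is done in plain Python instead; with a full XPath engine such as lxml, the expression above could be passed to `xpath()` directly.

```python
import xml.etree.ElementTree as ET

fragment = (
    "<textarea>asdasdasdasd as</textarea>\n"
    "<textarea>asdacccda \nsdas</textarea>\n"
    "<textarea>asdasdasdasd as</textarea>"
)

# Assumption: wrap the fragment in a made-up <root> so it parses as one document.
root = ET.fromstring("<root>" + fragment + "</root>")

# Equivalent of //textarea[contains(., 'ccc')], with the contains() part in Python.
hits = [el for el in root.iter("textarea") if "ccc" in "".join(el.itertext())]
hit_texts = [el.text for el in hits]
print(hit_texts)

# Example follow-up: remove the matched elements from the tree.
# root.remove() works here because the textareas are direct children of root.
for el in hits:
    root.remove(el)

remaining = len(root.findall("textarea"))
print(remaining)
```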


Hints

Your original query with [ccc] will certainly not do what you expect it to - it is a character class, so it matches a single c, and repeating the c inside the brackets is redundant ([c] or plain c will do the same).
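A quick way to convince yourself of this, sketched in Python (any regex flavour with character classes should behave the same way for this case):

```python
import re

# [ccc] is a set containing only "c", so it matches exactly one character.
class_hits = re.findall(r"[ccc]", "abc")
single_hits = re.findall(r"[c]", "abc")
print(class_hits, single_hits)

# One character class cannot match the three-character string "ccc" as a whole.
whole = re.fullmatch(r"[ccc]", "ccc")
print(whole)

# The literal sequence needs no brackets at all.
literal = re.search(r"ccc", "asdacccda")
print(literal.span())
```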

Jan

Here is a working regex:

<textarea>((?:(?!<\/textarea>).)*?)ccc(.*?)<\/textarea>

Yes, this does seem unnecessarily complicated, but that goes back to why using regex for HTML content is not the best idea. Here is the breakdown:

<textarea>((?:(?!<\/textarea>).)*?)ccc(.*?)<\/textarea>
<textarea>  -- literal match of the opening tag
          (                       )  -- your original capturing group
           (?:(?!<\/textarea>).)  -- any single character, as long as it does not start </textarea>, so the group cannot run past the closing tag
                                *?  -- repeat that token, non-greedily
                                   ccc  -- literal match of three c's; don't use square brackets, those mean "one of the characters in these brackets"
                                      (.*?)<\/textarea>  -- this can stay the same

If you want to see it on regex101, see here

R Nar
  • Does not return the expected result; you can test it at https://regex101.com. Thank you, though. – siniradam Jun 24 '16 at 13:32
  • @siniradam, are you sure? the link that I posted seems to be working fine with your input and output. Unless you didn't actually want to capture the text before and after the `ccc`, in which case the capturing groups can be removed. – R Nar Jun 24 '16 at 13:35
  • Ah, you need the `s` flag to ensure the `.` token also matches new lines. – R Nar Jun 24 '16 at 14:52
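The point about the `s` flag in the last comment can be checked with a small Python sketch, where `re.DOTALL` is the equivalent of `s`. The `html` string is the sample input from the question; the pattern is the one from this answer, unchanged.

```python
import re

html = (
    "<textarea>asdasdasdasd as</textarea>\n"
    "<textarea>asdacccda \nsdas</textarea>\n"
    "<textarea>asdasdasdasd as</textarea>"
)

pattern = r"<textarea>((?:(?!<\/textarea>).)*?)ccc(.*?)<\/textarea>"

# Without DOTALL, "." stops at the newline inside the middle textarea: no match.
without_flag = re.search(pattern, html)
print(without_flag)

# With DOTALL, "." also matches newlines and the middle element is found.
with_flag = re.search(pattern, html, re.DOTALL)
print(with_flag.group(0))
print(with_flag.groups())
```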