
I have a scenario in my PHP code where I have the following string:

This is an outside Example <p href="https://example.com"> This is a para Example</p><markup class="m"> this is a markup example</markup>

I want to do a case-insensitive search for the word example in this string, but:

  • I want my regex to ignore occurrences of example inside a tag attribute (which I am able to achieve)
  • I want the search to ignore everything inside <markup ..> any content </markup> entirely

What I have so far is:

/(example)(?![^<]*>)/i

This works fine and ignores the example within the href of the p tag. I then modified it to also handle <markup>:

/(example)(?!([^<]*>)|(\<markup[^>]*>[^<]*<\/markup\>))/i

but this isn't working. You can see my attempt at https://regex101.com/r/e2XujN/1

What I want to achieve with this

I will be replacing each matched occurrence of example in the following way:

  • If I find eXamPle, it will be replaced by <markup>eXamPle</markup>
  • Example will be replaced by <markup>Example</markup>

and so on.

Note: the replacement keeps the same case as the matched text.

  • Can you give me an example of how this can be achieved using DOM in PHP? – Karthik Thayyil Nov 05 '17 at 07:02
  • Are you trying to replace the matched substring with something, or is it just a matter of counting the occurrences? – revo Nov 05 '17 at 07:06
  • I want to do a case-insensitive search for the word `example` and I'll be replacing the matched `eXample` with `<markup>eXample</markup>`. Note: the case in the replacement string is the same as in the matched string. – Karthik Thayyil Nov 05 '17 at 07:12

3 Answers


You can use the (*SKIP)(*F) backtracking control verbs in PCRE to match and then skip certain substrings enclosed by a pattern or string (here, markup), like this:

(markup).*\1(*SKIP)(*F)|(example)(?![^<]*>)

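Explanation:

Excluded substring (first alternative):
(markup) captures the literal text markup into group 1 (case-insensitive when the i flag is set)
.* greedily matches any characters except line terminators
\1 matches the same text as group 1 again, so the match runs up to the last markup on the line
(*SKIP) marks the current position: if the match fails later, the engine does not retry from any position before it
(*F) is shorthand for (*FAIL) and forces a failure, so the text consumed so far is skipped and never reported as a match

Second alternative:
(example)(?![^<]*>) captures example into group 2, but only when it is not followed by a > with no < in between, i.e. when it is not inside a tag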
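As a rough illustration of how the pattern above might be used for the replacement described in the question, here is a minimal PHP sketch. The function name wrapExample, the ~ delimiters and the i flag are assumptions added for the example; only the pattern itself comes from the answer. Because the word is captured by the second group, the replacement refers to $2.

```php
<?php
// Hypothetical helper (not from the answer): wraps each match in <markup>.
function wrapExample(string $html): string
{
    // Pattern from the answer, delimited with ~ and made case-insensitive.
    $pattern = '~(markup).*\1(*SKIP)(*F)|(example)(?![^<]*>)~i';
    // $2 holds the matched word with its original casing.
    return preg_replace($pattern, '<markup>$2</markup>', $html);
}

$input = 'This is an outside Example <p href="https://example.com"> This is a para Example</p>'
       . '<markup class="m"> this is a markup example</markup>';

echo wrapExample($input), PHP_EOL;
// This is an outside <markup>Example</markup> <p href="https://example.com"> This is a para <markup>Example</markup></p><markup class="m"> this is a markup example</markup>
```

Note that the greedy .* runs to the last occurrence of markup on the line, so any example sitting between two separate markup elements on the same line would be skipped as well.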

wp78de
  • Thanks for the answer. Got to learn something new. Really saved my day. – Karthik Thayyil Nov 05 '17 at 16:49
  • Welcome. In hindsight, I would use a more closed and lazy pattern like this: [`(*SKIP)(*F)|(example)(?![^<]*>)`](https://regex101.com/r/y1j5t6/2) It may be just an edge case, but it would matter if you have a long text sitting on a single line. – wp78de Nov 05 '17 at 19:45

You can solve it the same way you solved the first problem: add a second negative lookahead that checks that the match is not followed by a closing </markup> tag before any other tag starts.

Regex:

(example)(?![^<]*>)(?![^<]*<\/markup\>)
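A minimal PHP sketch of how this two-lookahead pattern might be applied is shown below. The function name wrapWithLookaheads, the / delimiters and the i flag are assumptions for illustration; the pattern is the one given above.

```php
<?php
// Hypothetical helper (not from the answer): wraps each match in <markup>.
function wrapWithLookaheads(string $html): string
{
    // 1st lookahead: not inside a tag; 2nd: not directly before </markup>.
    $pattern = '/(example)(?![^<]*>)(?![^<]*<\/markup\>)/i';
    return preg_replace($pattern, '<markup>$1</markup>', $html);
}

$input = 'This is an outside Example <p href="https://example.com"> This is a para Example</p>'
       . '<markup class="m"> this is a markup example</markup>';

echo wrapWithLookaheads($input), PHP_EOL;
// This is an outside <markup>Example</markup> <p href="https://example.com"> This is a para <markup>Example</markup></p><markup class="m"> this is a markup example</markup>
```

One caveat with this approach: the second lookahead only looks as far as the next <, so an example that is followed by another tag inside the same markup element, as in <markup>example <b>x</b></markup>, would still be matched.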


Nandee

The answer is to use DOM; however, it is a little tricky to work with text nodes and to insert HTML content into them.


$content = <<< 'HTML'
This is an outside Example <p href="https://example.com"> This is a para Example</p>
test <markup class="m"> this is a markup example</markup> another example <p>example</p>
HTML;

// Initialize a DOM object
$dom = new DOMDocument();
// Use an HTML element tag as our HTML container
// @hakre [https://stackoverflow.com/a/29499718/1020526]
@$dom->loadHTML("<div>$content</div>");

$wrapper = $dom->getElementsByTagName('div')->item(0);
// Remove wrapper
$wrapper = $wrapper->parentNode->removeChild($wrapper);
// Remove all nodes of $dom object
while ($dom->firstChild) {
    $dom->removeChild($dom->firstChild);
}
// Append all $wrapper object nodes to $dom
while ($wrapper->firstChild) {
    $dom->appendChild($wrapper->firstChild);
}

$dox = new DOMXPath($dom);
// Query all elements in addition to text nodes
$query = $dox->query('/* | /text()');

// Iterate through all nodes
foreach ($query as $node) {
    // If it's not an HTML element
    if ($node->nodeType != XML_ELEMENT_NODE) {
        // Replace desired word / content
        $newContent = preg_replace('~(example)~i',
            '<markup>$1</markup>',
            $node->wholeText);
        // We can't insert HTML directly into our node
        // so we need to create a document fragment
        $newNode = $dom->createDocumentFragment();
        $newNode->appendXML($newContent);
        // Replace the old node with the new content
        $node->parentNode->replaceChild($newNode, $node);
    }
}

// Save modifications
echo $dom->saveHTML($dom);
revo
  • Really appreciate your effort, but this also skips `example` inside the `<p>` tag. It should only skip the example inside the `<markup>` tag. So the expected output for your example should be `This is an outside <markup>Example</markup> <p href="https://example.com"> This is a para <markup>Example</markup></p> test <markup class="m"> this is a markup example</markup> another <markup>example</markup> <p><markup>example</markup></p>` – Karthik Thayyil Nov 05 '17 at 09:49
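One possible way to address the point raised in this comment is to select text nodes at any depth and exclude only those that have a markup ancestor. The sketch below is not the answerer's code: the function name wrapInTextNodes, the wrapper id wrap and the XPath expression are assumptions for illustration, and it assumes ASCII input, since loadHTML makes its own encoding assumptions. It builds the new nodes with createElement and createTextNode rather than appendXML, so the text never has to be valid XML.

```php
<?php
// Hypothetical helper: wraps "example" in every text node outside <markup>.
function wrapInTextNodes(string $html): string
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);   // <markup> is not a known HTML tag
    $dom->loadHTML('<div id="wrap">' . $html . '</div>');
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $wrap = $xpath->query('//div[@id="wrap"]')->item(0);
    // Text nodes at any depth, except those inside a <markup> element.
    $textNodes = $xpath->query('.//text()[not(ancestor::markup)]', $wrap);

    foreach (iterator_to_array($textNodes) as $node) {
        // With DELIM_CAPTURE, odd-indexed parts are the captured matches.
        $parts = preg_split('~(example)~i', $node->nodeValue, -1, PREG_SPLIT_DELIM_CAPTURE);
        if (count($parts) === 1) {
            continue;                   // no match in this text node
        }
        $fragment = $dom->createDocumentFragment();
        foreach ($parts as $i => $part) {
            if ($i % 2 === 1) {
                $el = $dom->createElement('markup');
                $el->appendChild($dom->createTextNode($part));
                $fragment->appendChild($el);
            } elseif ($part !== '') {
                $fragment->appendChild($dom->createTextNode($part));
            }
        }
        $node->parentNode->replaceChild($fragment, $node);
    }

    // Serialize only the children of the wrapper, not the wrapper itself.
    $out = '';
    foreach ($wrap->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

$content = 'This is an outside Example <p href="https://example.com"> This is a para Example</p>' . "\n"
         . 'test <markup class="m"> this is a markup example</markup> another example <p>example</p>';

echo wrapInTextNodes($content), PHP_EOL;
```

Attribute values such as the href are never touched here, because attributes are not text nodes; the serializer may, however, normalize whitespace or entities slightly compared with the input.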