I am working on a custom CMS parser application in C# and need to match tags, and the content between those tags, in content snippets that the client submits as string values. Both the tags and the content are dynamic. The project requirement is that the solution must use only native C# and cannot rely on third-party libraries such as HTML Agility Pack.
I have been working with this as an example: https://regex101.com/r/e7twfZ/1
(?=(<picture>))(\w|\W)*(?<=<\/picture>)
...searching the string...
<!DOCTYPE html>
<html lang="en">
<head>
<title>Title</title>
</head>
<body>
<picture>
<source srcset="mobile.png" ></source>
<source srcset="tablet.png" ></source>
<source srcset="desktop.png" ></source>
<img srcset="default.png">
</picture>
</body>
</html>
However, I need to match pretty much any alphanumeric tag name between an opening and a closing angle bracket. When I change the RegEx to:
(?=(<picture>))(\w|\W)*(?<=<\/picture>)
I lose my match.
My goal is to end up with:
new Regex(@"(?=(<picture>))(\w|\W)*(?<=<\/picture>)").Match(@"<!DOCTYPE html>
<html lang='en'>
<head>
<title>Title</title>
</head>
<body>
<picture>
<source srcset='mobile.png' ></source>
<source srcset='tablet.png' ></source>
<source srcset='desktop.png' ></source>
<img srcset='default.png'>
</picture>
</body>
</html>");
However, I am still not entirely sure how to build a proper MatchCollection in C#.
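To make the MatchCollection part concrete, here is a minimal sketch of what I think the call could look like, using only System.Text.RegularExpressions. The names TagBlocks, Find and the "content" group are my own placeholders, not part of any existing API, and the pattern assumes that tags with the same name are never nested inside each other.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TagBlocks
{
    // Hypothetical helper: returns every <tagName ...>...</tagName> block in the input.
    // Assumption: same-name tags are not nested; a regex cannot balance them reliably.
    public static MatchCollection Find(string input, string tagName)
    {
        // Escape the tag name because it comes from dynamic data.
        string name = Regex.Escape(tagName);

        // Opening tag with optional attributes, lazy content, then the closing tag.
        string pattern = "<" + name + @"\b[^>]*>(?<content>.*?)</" + name + @"\s*>";

        // Singleline lets '.' match newlines, so a block may span several lines.
        return Regex.Matches(input, pattern,
            RegexOptions.Singleline | RegexOptions.IgnoreCase);
    }
}

public static class Program
{
    public static void Main()
    {
        string html = @"<body>
<picture>
<source srcset='mobile.png' ></source>
<img srcset='default.png'>
</picture>
</body>";

        MatchCollection matches = TagBlocks.Find(html, "picture");

        foreach (Match match in matches)
        {
            // match.Value is the whole block; the named group holds only the inner content.
            Console.WriteLine(match.Groups["content"].Value.Trim());
        }
    }
}
```

Regex.Matches returns non-overlapping matches in document order, so each picture block would come back as its own Match, and the tag name can be swapped per call since the tags are dynamic.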
Also, this is my first time posting on StackOverflow.com. I have researched this fairly thoroughly but decided to ask a question, since each answer I found was a little different from what I am trying to accomplish. Thank you for your help. Feel free to offer any suggestions!