4

I would like to parse out any HTML data that is returned wrapped in CDATA.

As an example <![CDATA[<table><tr><td>Approved</td></tr></table>]]>

Thanks!

  • Can you be more specific? You've got an XML document, containing a CDATA section, and you want to get a string containing the contents of that CDATA section? – Tim Robinson May 01 '09 at 17:18
  • I am getting this returned in a DataTable as one of the columns in the result set as a string exactly as per the example I wrote above, so I just want to do a regex to get the contents and return to browser just the html string via an AJAX call. – Little Larry Sellers May 01 '09 at 17:21

6 Answers

8

The expression to handle your example would be

\<\!\[CDATA\[(?<text>[^\]]*)\]\]\>

Where the group "text" will contain your HTML.

The C# code you need is:

using System.Text.RegularExpressions;

RegexOptions options = RegexOptions.None;
Regex regex = new Regex(@"\<\!\[CDATA\[(?<text>[^\]]*)\]\]\>", options);
string input = @"<![CDATA[<table><tr><td>Approved</td></tr></table>]]>";

// Run the match once and read the named group "text" if it succeeded
Match match = regex.Match(input);
if (match.Success)
{
    string htmlText = match.Groups["text"].Value;
}

The "input" variable is there only to hold the sample input you provided.

Ron Harlev
  • It's probably more suitable to use .* instead of [^\]]* for the text group; otherwise any HTML containing "]" will prevent the match. – TheXenocide Aug 05 '11 at 15:54
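To illustrate the point about "]" in the HTML, here is a minimal sketch that swaps the character class for a lazy .*? and adds RegexOptions.Singleline so the content may also span several lines. The class name CDataExtractor and the method Extract are made up for this example and are not part of the original answer.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class CDataExtractor
{
    // Lazy .*? stops at the first "]]>"; Singleline lets '.' match newlines too.
    private static readonly Regex CDataPattern = new Regex(
        @"<!\[CDATA\[(?<text>.*?)\]\]>",
        RegexOptions.Singleline);

    // Returns the CDATA contents, or null when the input has no CDATA section.
    public static string Extract(string input)
    {
        Match match = CDataPattern.Match(input);
        return match.Success ? match.Groups["text"].Value : null;
    }
}
```

With this pattern, CDataExtractor.Extract("<![CDATA[<td>a[0]</td>]]>") yields <td>a[0]</td>, whereas the [^\]]* version finds no match for that input.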
4

I know this might seem incredibly simple, but have you tried string.Replace()?

string x = "<![CDATA[<table><tr><td>Approved</td></tr></table>]]>";
string y = x.Replace("<![CDATA[", string.Empty).Replace("]]>", string.Empty);

There are probably more efficient ways to handle this, but it might be that you want something that easy...

Luke Woodward
Scott Arrington
2

Not much detail, but a very simple regex should match it if there isn't complexity that you didn't describe:

/<!\[CDATA\[(.*?)\]\]>/
Chad Birch
1

The regex to find CDATA sections would be:

(?:<!\[CDATA\[)(.*?)(?:\]\]>)
Tomalak
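If a string can contain more than one CDATA section, the pattern above can be fed to Regex.Matches. This is only a sketch: the class CDataSections and the method ExtractAll are hypothetical names, and RegexOptions.Singleline is an added assumption so that content with line breaks still matches.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class CDataSections
{
    // Collects the contents of every CDATA section found in the input.
    public static List<string> ExtractAll(string input)
    {
        var results = new List<string>();
        MatchCollection matches = Regex.Matches(
            input,
            @"(?:<!\[CDATA\[)(.*?)(?:\]\]>)",
            RegexOptions.Singleline);

        foreach (Match m in matches)
        {
            // Group 1 is the only capturing group: the text between the markers.
            results.Add(m.Groups[1].Value);
        }
        return results;
    }
}
```

Because the quantifier is lazy, each match ends at the nearest "]]>", so two adjacent sections come back as two separate entries rather than one merged string.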
0

Why do you want to use Regex for such a simple task? Try this one:

str = str.Trim().Substring(9);
str = str.Substring(0, str.Length-3);
Adren
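The Substring approach assumes the string really does start with "<![CDATA[" and end with "]]>". A slightly more defensive sketch checks for both markers first and returns the input unchanged otherwise; the class CDataTrimmer and the method Strip are names invented for this example.

```csharp
using System;

// Hypothetical helper, for illustration only.
public static class CDataTrimmer
{
    private const string Prefix = "<![CDATA[";
    private const string Suffix = "]]>";

    // Removes the CDATA wrapper if present; otherwise returns the input as-is.
    public static string Strip(string value)
    {
        string trimmed = value.Trim();
        bool wrapped = trimmed.Length >= Prefix.Length + Suffix.Length
            && trimmed.StartsWith(Prefix, StringComparison.Ordinal)
            && trimmed.EndsWith(Suffix, StringComparison.Ordinal);

        if (!wrapped)
        {
            return value;
        }

        int innerLength = trimmed.Length - Prefix.Length - Suffix.Length;
        return trimmed.Substring(Prefix.Length, innerLength);
    }
}
```

This avoids the ArgumentOutOfRangeException that Substring(9) would throw on a value shorter than nine characters, such as an empty column.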
0
Regex r = new Regex(@"(?<=<!\[CDATA\[).*?(?=\]\])");
patjbs