
I am displaying an incoming email in a WebBrowser control. If the email is HTML, links are clickable and users can quickly open the URL in their default browser. If the email is plain text, however, I simply set the WebBrowser's InnerText to the text of the email.

This leaves URLs without anchor tags, so users have to copy and paste them into their browsers.

My first instinct was to set the InnerHTML to the email text, use a regex to find any URLs, and wrap each match in an anchor tag.

Setting InnerHTML collapsed all of the line breaks, so I also replaced each newline with a <br /> tag.

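// Requires: using System; using System.Text.RegularExpressions;
public static string CheckPlainTextLinks(string html)
{
  Regex regx = new Regex(@"((https?|ftp|gopher|telnet|file|notes|ms-help):((//)|(\\\\))+[\w\d:#@%/;$()~_?\+-=\\\.&]*)", RegexOptions.IgnoreCase);
  MatchCollection matches = regx.Matches(html);

  // Wrap each matched URL in an anchor tag.
  // String.Replace returns a new string with every occurrence replaced.
  foreach (Match match in matches)
  {
    html = html.Replace(match.Value, "<a href='" + match.Value + "'>" + match.Value + "</a>");
  }

  // Preserve plain-text line breaks, which HTML would otherwise collapse.
  html = html.Replace(Environment.NewLine, "<br />");

  return html;
}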

That is the entire function for scanning the text and adding links. I then set the InnerHTML of my WebBrowser control to its return value. Unfortunately, the program started throwing OutOfMemoryException, and the exceptions stopped when I removed the call to this function.

I also looked into using mshtml to create the links instead of editing the HTML directly, with help from this post: http://social.msdn.microsoft.com/Forums/da-DK/csharpgeneral/thread/1d050260-3625-42cc-94ec-59bba0651a1c. However, I'm not sure how to create an IHTMLTxtRange for each regex match.

Is there a better way to create these links, or a fix for the OutOfMemoryException?

Alex
  • How big is the plain text, particularly when you get the out of memory exception? – dash Jan 30 '13 at 22:41
  • It seems he took the full page as a string. – Ken Kin Jan 30 '13 at 22:46
  • The emails do include all previous responses, so it can get rather large. I'd say a pretty typical chain ends at around 10,000 characters. – Alex Jan 30 '13 at 22:46
  • I suspected they were long strings; for each match and replace, a copy of the string is created; see http://blogs.msdn.com/b/ravi_kumar/archive/2008/03/22/out-of-memory-issues-to-watch-out-for-when-using-regular-expressions.aspx for more info. Additionally, you might have more luck simply scanning through the text, looking for URLs manually. See @JeffAtwood's post here (and the first comment) http://www.codinghorror.com/blog/2008/10/the-problem-with-urls.html – dash Jan 30 '13 at 22:52
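The copying cost can be made concrete. One likely contributor to the OutOfMemoryException (an editorial observation, not something established in the thread): when the same URL occurs more than once, every later `String.Replace` in the question's loop also rewrites the copies already placed inside earlier `<a>` tags, so the number of copies doubles with each repeat. The sketch below reuses the question's regex; `ReplaceLoopDemo`, `RepeatedUrl`, `LoopLength`, and `SinglePassLength` are hypothetical names, and the input (one URL repeated n times) is made up for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical demo class comparing the question's loop with a single Regex.Replace pass.
public static class ReplaceLoopDemo
{
    // Same pattern as in the question.
    private static readonly Regex Url = new Regex(
        @"((https?|ftp|gopher|telnet|file|notes|ms-help):((//)|(\\\\))+[\w\d:#@%/;$()~_?\+-=\\\.&]*)",
        RegexOptions.IgnoreCase);

    // Hypothetical input: the same URL repeated n times, separated by spaces.
    public static string RepeatedUrl(int n) =>
        string.Concat(Enumerable.Repeat("http://a.com ", n));

    // The question's approach: collect matches, then call String.Replace once per match.
    // Each call replaces every occurrence, including those inside anchors added earlier.
    public static int LoopLength(string html)
    {
        foreach (Match match in Url.Matches(html))
        {
            html = html.Replace(match.Value, "<a href='" + match.Value + "'>" + match.Value + "</a>");
        }
        return html.Length;
    }

    // The answer's approach: one Regex.Replace pass with a MatchEvaluator lambda.
    public static int SinglePassLength(string html) =>
        Url.Replace(html, x => "<a href='" + x.Value + "'>" + x.Value + "</a>").Length;
}
```

For `RepeatedUrl(10)` (130 characters), `LoopLength` returns 276,340 while `SinglePassLength` returns 400. For this input the loop's output has 13n + 27n(2^n - 1) characters, so at n = 20 it would exceed 566 million characters.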

1 Answer


Would this work better for you?

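public static string CheckPlainTextLinks(string html) {
    var regx = new Regex(@"((https?|ftp|gopher|telnet|file|notes|ms-help):((//)|(\\\\))+[\w\d:#@%/;$()~_?\+-=\\\.&]*)", RegexOptions.IgnoreCase);
    // Regex.Replace scans the input once and calls the lambda (a MatchEvaluator) for each match.
    string linked = regx.Replace(html, x => "<a href='" + x.Value + "'>" + x.Value + "</a>");
    // Preserve plain-text line breaks.
    return linked.Replace(Environment.NewLine, "<br />");
}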

Instead of calling Replace repeatedly in a foreach loop, this uses a MatchEvaluator, so the regex makes a single pass over the text.
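Both versions above still put the raw email text into InnerHTML, so a `<` or `&` in a plain-text message would be parsed as markup. A minimal sketch, not part of the original answer, that keeps the same regex and single left-to-right pass but HTML-encodes everything with `System.Net.WebUtility.HtmlEncode` and builds the result in a `StringBuilder`. `PlainTextLinker` and `ToHtml` are hypothetical names; the switch to double-quoted `href` attributes and the handling of both `\r\n` and `\n` are assumptions of this sketch.

```csharp
using System;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper: converts plain-text email into HTML with clickable links.
public static class PlainTextLinker
{
    // Same pattern as the question, compiled once and reused across calls.
    private static readonly Regex UrlPattern = new Regex(
        @"((https?|ftp|gopher|telnet|file|notes|ms-help):((//)|(\\\\))+[\w\d:#@%/;$()~_?\+-=\\\.&]*)",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    public static string ToHtml(string text)
    {
        // Initial capacity is only a guess; StringBuilder grows as needed.
        var sb = new StringBuilder(text.Length + text.Length / 4);
        int last = 0;

        // One pass: copy the text before each match, then append the match as a link.
        for (Match m = UrlPattern.Match(text); m.Success; m = m.NextMatch())
        {
            AppendEncoded(sb, text.Substring(last, m.Index - last));
            // Encode the URL too, since the pattern can match '&' and '<'.
            string url = WebUtility.HtmlEncode(m.Value);
            sb.Append("<a href=\"").Append(url).Append("\">").Append(url).Append("</a>");
            last = m.Index + m.Length;
        }
        AppendEncoded(sb, text.Substring(last));
        return sb.ToString();
    }

    // HTML-encode a plain-text fragment and turn \r\n or \n line breaks into <br />.
    private static void AppendEncoded(StringBuilder sb, string fragment)
    {
        string encoded = WebUtility.HtmlEncode(fragment);
        sb.Append(encoded.Replace("\r\n", "<br />").Replace("\n", "<br />"));
    }
}
```

For example, `PlainTextLinker.ToHtml("a < b\r\nc")` returns `a &lt; b<br />c`, and a URL such as `http://example.com/a?b=1&c=2` becomes a link whose `href` contains `&amp;c=2`.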

Ken Kin
  • I put this in and have been testing to make sure I didn't get any out of memory exceptions for the last couple of weeks. Works perfectly, thank you! – Alex Feb 15 '13 at 15:49