I'm having trouble understanding regex pattern syntax. I'm using Outlook interop to go through the HTMLBody of an email (.msg).

I want to remove all images that reference the internet, so I'm using Regex.Replace to find all image tags and replace them with text.

This is what I have:

string altText = " <i>*Reference to picture on the internet removed*</i> ";
string b = Regex.Replace(a, @"(<img([^>]+)>)", altText);

This works, but I only want to match the tags whose src points to the internet. I found this in a Google search:

string matchString = Regex.Match(a, "<img.+?src=[\"'](.+?)[\"'].*?>", RegexOptions.IgnoreCase).Groups[1].Value;

But it does not help, since it looks like all images have a src attribute. My goal is to write a regex pattern, if possible, that checks whether the source (src) starts with http, https or www.

Is there anyone who can help me with this?

1 Answer

I would suggest using an HTML parser to find your image tags rather than applying a regex to the raw HTML. You can then use a regex to check the src attribute if required.

In the meantime, I believe the following regex will produce the results you are expecting:

<img.+?src=[\"']((?:https?|www).*?)[\"'].*?>


Edit: Note that links can also be protocol-relative and start with just //. The following regex should handle that case too:

<img.+?src=[\"']((?:https?|www|//).*?)[\"'].*?>

For a more extensive regex for matching URLs, please see "What is a good regular expression to match a URL?"
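
To tie this back to the Regex.Replace call in the question, here is a minimal sketch of how such a pattern might be used to replace only remote images. The class name RemoteImageStripper and the method Strip are hypothetical names chosen for illustration, and the pattern is a slightly stricter variation on the one above (it stays within a single tag and requires the closing quote to match the opening one), not the only way to write it.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of Outlook interop or any library.
public static class RemoteImageStripper
{
    // Matches one <img ...> tag whose src starts with http:, https:, www. or //.
    // [^>]* keeps the match inside a single tag; \1 requires the same quote
    // character that opened the src value.
    private static readonly Regex RemoteImg = new Regex(
        @"<img\b[^>]*?\ssrc\s*=\s*([""'])(?:https?:|www\.|//)[^""'>]*\1[^>]*>",
        RegexOptions.IgnoreCase);

    public static string Strip(string html, string altText)
    {
        // A MatchEvaluator inserts altText literally, so characters such as '$'
        // in altText are not interpreted as substitution patterns.
        return RemoteImg.Replace(html, m => altText);
    }
}
```

With the variables from the question, the call would be string b = RemoteImageStripper.Strip(a, altText);. Images whose src does not start with one of the listed prefixes, such as embedded cid: attachments, are left untouched. A regex like this can still be fooled by unusual markup (for example an unquoted src value), which is why an HTML parser remains the more robust option.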
