5

I have a string containing HTML images, for example:

string str = "There is some nice <img alt='img1' src='img/img1.png' /> images in this <img alt='img2' src='img/img2.png' /> string. I would like to ask you <img alt='img3' src='img/img3.png' /> how Can I can I get the Lenght of the string?";

I would like to get the length of the string without the images, and the number of images. So the result should be:

int strLength = 111;
int imagesCount = 3;

Can you show me the most efficient way, please?

Thanks

Alex K.
Lubos Marek
  • You can do this with a regular expression. Please let me know if you need a solution based on it – K D Apr 29 '16 at 11:11
  • Take a look at this answer on removing HTML tags: http://stackoverflow.com/a/18154046/5119765 Then you'll be able to get the string length. – ADreNaLiNe-DJ Apr 29 '16 at 11:14
  • 1
    Your best option would be to use an HTML parser like [Html Agility Pack](https://htmlagilitypack.codeplex.com/) so you can properly count the character length of the content and the number of image tags. – juharr Apr 29 '16 at 11:18

5 Answers

4

I'd suggest using a real HTML parser, for example HtmlAgilityPack. Then it's simple:

string html = "There is some nice <img alt='img1' src='img/img1.png' /> images in this <img alt='img2' src='img/img2.png' /> string. I would like to ask you <img alt='img3' src='img/img3.png' /> how Can I can I get the Lenght of the string?";

// Count() below is the LINQ extension method, so the file needs: using System.Linq;
var doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html);
int length = doc.DocumentNode.InnerText.Length;               // 114
int imageCount = doc.DocumentNode.Descendants("img").Count(); // 3

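The length is 114 rather than your 111 because each removed image leaves two adjacent spaces behind, which your count skipped. This is what DocumentNode.InnerText returns for your sample: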

There is some nice  images in this  string. I would like to ask you  how Can I can I get the Lenght of the string?
Tim Schmelter
2

I had a similar problem and created this method. You can use it to strip the HTML tags and then measure the remaining string:

public static string StripHtmlTags(string source)
{
  if (string.IsNullOrEmpty(source))
  {
    return string.Empty;
  }

  var array = new char[source.Length];
  int arrayIndex = 0;
  bool inside = false;
  for (int i = 0; i < source.Length; i++)
  {
    char let = source[i];
    if (let == '<')
    {
      inside = true;
      continue;
    }

    if (let == '>')
    {
      inside = false;
      continue;
    }

    if (!inside)
    {
      array[arrayIndex] = let;
      arrayIndex++;
    }
  }

  return new string(array, 0, arrayIndex);
}

Your counting would then look like:

int strLength = StripHtmlTags(str).Length;
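
The method above only gives the text length. As a minimal sketch, the same scanning loop can also count the `<img` tags so that both numbers come from one pass. The `HtmlStats` class and the `Measure` and `IsImgTagAt` names are hypothetical, not part of the original method. It assumes well-formed tags with no `<` or `>` inside attribute values, does not decode HTML entities, and needs C# 7 or later for the tuple return type.

```csharp
using System;

public static class HtmlStats
{
    // Returns the number of characters outside tags and the number of <img> tags.
    // Assumption: tags are well formed and attribute values contain no '<' or '>'.
    public static (int TextLength, int ImageCount) Measure(string source)
    {
        if (string.IsNullOrEmpty(source))
        {
            return (0, 0);
        }

        int textLength = 0;
        int imageCount = 0;
        bool inside = false;

        for (int i = 0; i < source.Length; i++)
        {
            char c = source[i];
            if (c == '<')
            {
                inside = true;
                if (IsImgTagAt(source, i + 1))
                {
                    imageCount++;
                }
            }
            else if (c == '>')
            {
                inside = false;
            }
            else if (!inside)
            {
                textLength++;
            }
        }

        return (textLength, imageCount);
    }

    // True when the tag name starting at 'start' is exactly "img" (any casing),
    // followed by whitespace, '/' or '>'.
    private static bool IsImgTagAt(string s, int start)
    {
        if (start + 3 >= s.Length)
        {
            return false; // not enough room for "img" plus a terminating character
        }

        if (string.Compare(s, start, "img", 0, 3, StringComparison.OrdinalIgnoreCase) != 0)
        {
            return false;
        }

        char next = s[start + 3];
        return char.IsWhiteSpace(next) || next == '/' || next == '>';
    }
}
```

Calling `HtmlStats.Measure(str)` on the string from the question should give a text length of 114 and an image count of 3, the same figures the HtmlAgilityPack answer reports.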
Fred Smith
  • You know you could just do `foreach(char let in source)` instead since `string` implements `IEnumerable`. – juharr Apr 29 '16 at 11:30
2

Add a (COM) reference to MSHTML (Microsoft HTML object lib) and you can:

var doc = (IHTMLDocument2)new HTMLDocument();
doc.write(str);

Console.WriteLine("Length: {0}", doc.body.innerText.Length);
Console.WriteLine("Images: {0}", doc.images.length);
Alex K.
1

If you would like to do it with a regular expression, as I mentioned in my comment above, please use the following code. Note that this pattern only matches self-closing tags of the form <img ... />.

var regex = new System.Text.RegularExpressions.Regex("<img[^>]*/>");
var plainString = regex.Replace(str, "");

// plainString.Length is the string length without the images
var cnt = regex.Matches(str).Count; // cnt is the number of images
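
If the input may also contain `<img ...>` tags without the closing slash, or tags in upper case, a slightly looser pattern could be used. The following is only a sketch under that assumption: the `ImgRegexDemo` class and its method names are hypothetical, and a regular expression still cannot handle every HTML edge case (for example a `>` inside an attribute value).

```csharp
using System.Text.RegularExpressions;

public static class ImgRegexDemo
{
    // Matches <img ...> and <img ... /> regardless of casing.
    // \b keeps the pattern from matching longer tag names such as <imgx>.
    private static readonly Regex ImgTag =
        new Regex(@"<img\b[^>]*>", RegexOptions.IgnoreCase);

    // Number of image tags found in the input.
    public static int CountImages(string html)
    {
        return ImgTag.Matches(html).Count;
    }

    // Input with every image tag removed; surrounding text is left untouched.
    public static string RemoveImages(string html)
    {
        return ImgTag.Replace(html, string.Empty);
    }
}
```

For example, `ImgRegexDemo.CountImages("a <img src='x.png'> b <IMG src='y.png' /> c")` gives 2, and `RemoveImages` on the same input leaves a string of length 7.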
K D
0

I liked Fred Smith's solution; however, I had to add Trim() at the end to match the MS Word result.

Use this:

return new string(array, 0, arrayIndex).Trim();
Ali Bdeir
Andre RB