• @Amar Palsapure, I have to remove these HTML tags from my HTML file; I do not need this information. I need a regex for removing them. – Waseem Fastian Feb 06 '12 at 10:52
  • I have done it myself; this is the regex that I used: @"]*)?>". Thanks to my mighty Allah. Thanks also to those who gave me feedback. – Waseem Fastian Feb 06 '12 at 11:11
  • 1 Answer


    You don't need a regex just to strip HTML tags. The following method iterates over the HTML string and builds a new string that contains no tags.
    This approach is typically faster than a regex, too.

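    public static string StripHTMLTags(string str)
    {
        // The output can never be longer than the input.
        char[] array = new char[str.Length];
        int arrayIndex = 0;
        // True while the loop is between a '<' and the next '>'.
        bool inside = false;

        for (int i = 0; i < str.Length; i++)
        {
            char c = str[i];
            if (c == '<')
            {
                inside = true;
                continue;
            }
            if (c == '>')
            {
                inside = false;
                continue;
            }
            // Copy only characters that lie outside a tag.
            if (!inside)
            {
                array[arrayIndex] = c;
                arrayIndex++;
            }
        }
        // Build the result from the filled part of the buffer.
        return new string(array, 0, arrayIndex);
    }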
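
    The loop above treats every `>` as the end of a tag, so a `>` inside an HTML comment ends the "inside" state too early and the rest of the comment leaks into the output. A minimal sketch of one way to handle that case is below, assuming that comments always end at the first `-->`. The class name `HtmlTagStripper` and the method name `StripHtmlTagsSkippingComments` are hypothetical and not part of the original answer, and the sketch still removes all tags rather than only specific ones.

    ```csharp
    using System;
    using System.Text;

    public static class HtmlTagStripper
    {
        // Hypothetical helper: strips tags and skips whole HTML comments.
        public static string StripHtmlTagsSkippingComments(string html)
        {
            var result = new StringBuilder(html.Length);
            int i = 0;
            while (i < html.Length)
            {
                // A comment runs to the first "-->", so any '>' inside it is skipped as well.
                if (string.CompareOrdinal(html, i, "<!--", 0, 4) == 0)
                {
                    int commentEnd = html.IndexOf("-->", i + 4, StringComparison.Ordinal);
                    if (commentEnd < 0) break; // unterminated comment: drop the rest
                    i = commentEnd + 3;
                    continue;
                }
                // An ordinary tag runs to the next '>'.
                if (html[i] == '<')
                {
                    int tagEnd = html.IndexOf('>', i + 1);
                    if (tagEnd < 0) break; // unterminated tag: drop the rest
                    i = tagEnd + 1;
                    continue;
                }
                // Everything else is text outside a tag.
                result.Append(html[i]);
                i++;
            }
            return result.ToString();
        }
    }
    ```

    For example, `HtmlTagStripper.StripHtmlTagsSkippingComments("<p>a<!-- x > y -->b</p>")` returns `"ab"`. Unlike the loop above, this sketch keeps a stray `>` that appears outside any tag.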
    
    QQping
    • This will fail if you have `>` inside an HTML comment. And the question is not about removing *all* tags, only some specific ones. – svick Feb 06 '12 at 11:45
    • @QQping, @Svick, I have done it myself; this is the regex that I used: @"]*)?>". Thanks to my mighty Allah. Thanks also to those who gave me feedback. – Waseem Fastian Feb 06 '12 at 13:00