
I'm making a class for parsing HTML, and one thing I need to make sure of is that I don't parse anything inside single or double quotation marks. What I've done is make an enumeration

enum quotation {OUTSIDE, INSIDE_SINGLE, INSIDE_DOUBLE};

inside the class, and then use the following sort of pattern:

void HtmlParser::toNextBracket()
{
/*
    Advances _curIter to the next angle bracket in the range
    (_curIter, _offend]. If there is no angle bracket in that range,
    _curIter will equal _offend when the function exits. Brackets
    inside single- or double-quoted text are skipped.
*/
    while (_curIter != _offend)
    {
        ++_curIter;
        const char thisChar = getCurChar();
        if (thisChar == '<' || thisChar == '>')
        {
            // Only a bracket outside any quotes counts.
            if (_quoteStatus == OUTSIDE) break;
        }
        else if (thisChar == '\'')
        {
            // A ' inside "..." is an ordinary character, not a quote.
            if (_quoteStatus == INSIDE_SINGLE) _quoteStatus = OUTSIDE;
            else if (_quoteStatus == OUTSIDE) _quoteStatus = INSIDE_SINGLE;
        }
        else if (thisChar == '"')
        {
            // A " inside '...' is an ordinary character, not a quote.
            if (_quoteStatus == INSIDE_DOUBLE) _quoteStatus = OUTSIDE;
            else if (_quoteStatus == OUTSIDE) _quoteStatus = INSIDE_DOUBLE;
        }
    }
}
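Since the rest of HtmlParser (getCurChar, _curIter, _offend) isn't shown, here is a self-contained sketch of the same scan over a std::string, so it can be tested in isolation. The free function nextBracket, its index-based signature, and the enum class Quote are hypothetical stand-ins for the members above. A quote of one kind inside quotes of the other kind is treated as an ordinary character.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for the class's quotation enum.
enum class Quote { Outside, InsideSingle, InsideDouble };

// Returns the index of the next '<' or '>' after `pos` that is not inside
// quotes, or text.size() if there is none. `state` carries the quote status
// between calls, the way _quoteStatus does in the class.
std::size_t nextBracket(const std::string& text, std::size_t pos, Quote& state)
{
    for (std::size_t i = pos + 1; i < text.size(); ++i)
    {
        const char c = text[i];
        if ((c == '<' || c == '>') && state == Quote::Outside)
            return i;
        if (c == '\'' && state != Quote::InsideDouble)
            state = (state == Quote::InsideSingle) ? Quote::Outside : Quote::InsideSingle;
        else if (c == '"' && state != Quote::InsideSingle)
            state = (state == Quote::InsideDouble) ? Quote::Outside : Quote::InsideDouble;
    }
    return text.size();
}
```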

But I feel like there must be a better way of doing this. Which C++ tools should I be using for a more elegant procedure?

2 Answers


I suspect that you need to reevaluate your initial requirements for parsing HTML. Quotes and apostrophes in a document's text play no role in recognizing HTML tags.

Try opening the following file in your browser:

<html>
<body>
<p>Quotation "<i>mark</i>."</p>
</body>
</html> 

Your intent is, apparently, to skip the HTML tags inside the quotes; however, as this example shows, the HTML tags inside quotes are as valid as they are outside.

To find HTML tags, you do not need to concern yourself with quotes or apostrophes in the text.
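As a minimal sketch of this point (the function extractTags is hypothetical, and the markup is assumed well formed, with a literal < in text written as &lt;): locating tags needs no quote tracking in the text content at all. Quotes only matter between < and >, where a quoted attribute value such as title="a>b" may legitimately contain a bracket, so this sketch tracks quote state only inside a tag.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns every tag (including its angle brackets)
// found in `html`. Quotes in text content are ignored; quote state is
// tracked only between '<' and '>'.
std::vector<std::string> extractTags(const std::string& html)
{
    std::vector<std::string> tags;
    std::size_t pos = 0;
    while ((pos = html.find('<', pos)) != std::string::npos)
    {
        char quote = '\0';               // '\0' means "not inside a quoted value"
        std::size_t end = pos + 1;
        for (; end < html.size(); ++end)
        {
            const char c = html[end];
            if (quote != '\0')
            {
                if (c == quote) quote = '\0';   // closing quote
            }
            else if (c == '"' || c == '\'')
            {
                quote = c;                      // opening quote inside the tag
            }
            else if (c == '>')
            {
                break;                          // end of this tag
            }
        }
        if (end == html.size()) break;          // unterminated tag: stop
        tags.push_back(html.substr(pos, end - pos + 1));
        pos = end + 1;
    }
    return tags;
}
```

On the example file's paragraph, <p>Quotation "<i>mark</i>."</p>, this finds <p>, <i>, </i> and </p>: the double quotes around <i>mark</i> are simply part of the text.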

Answered by Sam Varshavchik (edited by dyp)

In one word: yes.

In two words: STL containers (http://www.cplusplus.com/reference/stl/).

Your situation calls for a map: a container that can use chars (in your case) as keys, much like an int index into an array, and a function (again, for this particular case) as the value. Each entry is therefore a key-value pair.

Here is a good code sample that shows what you need: Using a STL map of function pointers

The upside: once the map is built, each lookup is fast (on average O(1) with std::unordered_map; std::map lookup is O(log n)). An if..else chain, by contrast, can be less efficient, for example when the current char only matches the last if statement in the block.

Just make sure to store the map object in a static context (or anywhere else, as long as you build it only once).
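A minimal sketch of this suggestion applied to the question's quote tracking, assuming C++11 or later. The names Quote, onSingle, onDouble and step are hypothetical. The table maps each quote character to a function pointer that computes the next state, and it lives in a function-local static, so it is built once on the first call (that initialization is thread-safe since C++11).

```cpp
#include <unordered_map>

enum class Quote { Outside, InsideSingle, InsideDouble };

// Transition for a single quote: toggles single-quote mode, but a ' inside
// "..." is just an ordinary character.
Quote onSingle(Quote s)
{
    if (s == Quote::InsideSingle) return Quote::Outside;
    if (s == Quote::Outside) return Quote::InsideSingle;
    return s;
}

// Transition for a double quote: toggles double-quote mode, but a " inside
// '...' is just an ordinary character.
Quote onDouble(Quote s)
{
    if (s == Quote::InsideDouble) return Quote::Outside;
    if (s == Quote::Outside) return Quote::InsideDouble;
    return s;
}

// Looks up the handler for `c` and applies it; characters without a handler
// leave the state unchanged.
Quote step(Quote s, char c)
{
    using Handler = Quote (*)(Quote);
    static const std::unordered_map<char, Handler> handlers = {
        {'\'', onSingle},
        {'"',  onDouble},
    };
    const auto it = handlers.find(c);
    return it == handlers.end() ? s : it->second(s);
}
```

Adding another special character then means adding one entry to the table rather than another else-if branch.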

Answered by ymz