If I understood your question right, it seems like you are dealing with tags out of order, and then you want to order them correctly. Doing so will not be easy at all...
Let me try to construct a system which is doing precisely that and then you will see all the complexity that is hiding behind.
Each tag would have to be represented by a key. Tags are either opening (e.g. <html>
), closing (e.g. </html>
), or empty (e.g. <html />
). The first two forms must come in pairs, so it doesn't seem to make sense to remember both of them - missing closing tag is an error anyway and you may decide to auto-correct such errors anyway.
So you may describe a tag with a structure (Tag, RequiresClose) and then construct a hashtable or dictionary which maps string -> bool
, where Boolean value indicates whether the tag requires a corresponding closing tag:
<html>|<title>|</head>|</html>|</title>|<head>
These tags would populate the dictionary like:
html -> true
title -> true
head -> true
And then you need to construct an interpreter for this dictionary, which would construct the output sequence of tags. Below is the pseudocode of the interpreter.
Interpret()
InterpretHtml()
InterpretHtml()
if html not present in dictionary then quit
if dictionary['html'] = false then print <html /> and quit
print <html>
InterpretHead()
InterpretBody()
print </html>
remove html from dictionary
InterpretHead()
if head not present in dictionary then quit
if dictionary['head'] = false then print <head /> and quit
print <head>
InterpretTitle()
print </head>
remove head from dictionary
InterpretTitle()
if title not present in dictionary then quit
if dictionary['title'] = false then print <title /> and quit
print <title>
// somehow find the title and print it
print </title>
remove title from dictionary
This approach is tedious and it doesn't produce predictable output.
Therefore, I would recommend you to thoroughly reconsider your task and do it in a completely different manner. Since HTML is producing hierarchical documents, you might want to consider using a stack and then force correct order of tags on the input.