I am trying to process HTML data held in a QString. The data contains HTML-encoded characters, e.g. "&lt;", "&gt;" and "&quot;". I want to convert these to the corresponding symbols.

I have tried a number of approaches but none seems to work, which suggests I am missing something really simple.

Here is the code (amended to fix typos reported by earlier comments):

QString theData = "&lt;!DOCTYPE HTML PUBLIC &quot;-//W3C//DTD HTML 4.0//EN&quot; &quot;http://www.w3.org/TR/REC-html40/strict.dtd&quot;&gt;
&lt;html&gt;&lt;head&gt;&lt;meta name=&quot;qrichtext&quot; content=&quot;1&quot; /&gt;&lt;style type=&quot;text/css&quot;&gt;
p, li { white-space: pre-wrap; }
&lt;/style&gt;&lt;/head&gt;&lt;body style=&quot; font-family:'Arial'; font-size:20pt; font-weight:400; font-style:normal;&quot;&gt;
&lt;table border=&quot;0&quot; style=&quot;-qt-table-type: root; margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px;&quot;&gt;
&lt;tr&gt;
&lt;td style=&quot;border: none;&quot;&gt;
&lt;p style=&quot; margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px;&quot;&gt;&lt;span style=&quot; font-size:14pt; color:#4cb8ff;&quot;&gt;This is text on the second page. This page contains a embedded image,&lt;/span&gt;&lt;/p&gt;
&lt;p style=&quot; margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px;&quot;&gt;&lt;span style=&quot; font-size:14pt; color:#4cb8ff;&quot;&gt;and audio.&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&lt;/body&gt;&lt;/html&gt;";

QString t2 = theData.replace("&amp;", "&").replace("&lt;", "<").replace("&gt;", ">").replace("&quot;", "'");

The value of t2 however is the same as theData after the replaces.

TenG

1 Answer

There is no definition of t1 in your code; I suppose you mean theData (and no double dot). The QString::replace functions modify the string in place and return a reference to it (*this), so the chained calls all operate on the same object.

QString s = "abc";
s.replace("a", "z").replace("b", "z");
// s is now "zzc"

// if you don't want to alter the original, work on a copy
QString u = "abc";
QString t = u;
t.replace("a", "z").replace("b", "z");
// t is "zzc", u is still "abc"

But there is a better way to escape/unescape HTML strings:

// html -> plain text
QTextDocument doc;
doc.setHtml(theData);
QString t2 = doc.toPlainText();

// plain text -> html
QString plainText = "#include <QtCore>";
QString htmlText = plainText.toHtmlEscaped();
// htmlText == "#include &lt;QtCore&gt;"

If you only want to convert html entities, I use the following function, complementary to QString::toHtmlEscaped():

QString fromHtmlEscaped(QString html) {
  html.replace("&quot;", "\"", Qt::CaseInsensitive);
  html.replace("&gt;", ">", Qt::CaseInsensitive);
  html.replace("&lt;", "<", Qt::CaseInsensitive); 
  html.replace("&amp;", "&", Qt::CaseInsensitive);
  return html;
}

In all cases, it should hold that str == fromHtmlEscaped(str.toHtmlEscaped()).
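To see why that round-trip property depends on the replacement order, here is a minimal sketch that uses std::string instead of QString so it compiles without Qt. The helpers replaceAll, escapeHtml, unescapeAmpLast, unescapeAmpFirst and selfCheck are hypothetical names made up for this illustration; escapeHtml only mimics what QString::toHtmlEscaped() does for the four characters <, >, & and ".

```cpp
#include <cassert>
#include <string>

// Replace every occurrence of `from` in `s` with `to`, scanning left to right
// and skipping over text that was just inserted.
static void replaceAll(std::string &s, const std::string &from, const std::string &to) {
    for (std::size_t pos = s.find(from); pos != std::string::npos;
         pos = s.find(from, pos + to.size())) {
        s.replace(pos, from.size(), to);
    }
}

// Hypothetical stand-in for QString::toHtmlEscaped(): '&' is escaped first so
// the ampersands introduced by the other entities are not escaped again.
std::string escapeHtml(std::string text) {
    replaceAll(text, "&", "&amp;");
    replaceAll(text, "<", "&lt;");
    replaceAll(text, ">", "&gt;");
    replaceAll(text, "\"", "&quot;");
    return text;
}

// Same order as fromHtmlEscaped() above: "&amp;" is decoded last.
std::string unescapeAmpLast(std::string html) {
    replaceAll(html, "&quot;", "\"");
    replaceAll(html, "&gt;", ">");
    replaceAll(html, "&lt;", "<");
    replaceAll(html, "&amp;", "&");
    return html;
}

// Order used in the question: "&amp;" is decoded first, which decodes twice.
std::string unescapeAmpFirst(std::string html) {
    replaceAll(html, "&amp;", "&");
    replaceAll(html, "&lt;", "<");
    replaceAll(html, "&gt;", ">");
    replaceAll(html, "&quot;", "\"");
    return html;
}

// Plain text that itself looks like an entity exposes the difference.
void selfCheck() {
    const std::string plain = "&lt;";              // literal text, not a tag
    const std::string escaped = escapeHtml(plain); // "&amp;lt;"
    assert(unescapeAmpLast(escaped) == plain);     // round trip holds
    assert(unescapeAmpFirst(escaped) == "<");      // decoded one level too far
}
```

Calling selfCheck() passes both assertions: with "&amp;" decoded last the original text comes back unchanged, while decoding it first turns the literal text "&lt;" into "<".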

Kuba hasn't forgotten Monica
Bertrand
  • The order of replacements matters: the ampersand must be substituted last. I've edited the function into the reply, feel free to remove the comment. Generally speaking, comments that indicate a deficiency in a question/answer should be addressed by fixing or amending the question/answer, respectively. – Kuba hasn't forgotten Monica Apr 04 '16 at 20:25