7

How can I parse the following HTML?

<body>
<span style="font-size:11px">12345</span>
<a>Hello</a>
</body>

I would like to retrieve the data "12345" from the "span" with style="font-size:11px" on www.testtest.com, but I want only that value and nothing else.

How can I accomplish this?

László Papp
NPLS

2 Answers

8

I think QXmlQuery is what you want. The code would look something like this:

QXmlQuery query;

// Use the markup as the focus (context) document, then select the span's text
query.setFocus(html);
query.setQuery("/body/span[@style='font-size:11px']/text()");

QString r;
query.evaluateTo(&r);

You can also give the URL to the query directly as its focus:

query.setFocus(QUrl("http://www.testtest.com"));
query.setQuery("/body/span[@style='font-size:11px']/text()");
Romário
Lol4t0
  • I think qtxmlpatterns (and hence this recommendation) would be a bit too much for this simple task. QtWebKit is fine with dealing with html, or if someone wishes to avoid even that, then QtCore's xml parser. However, if one deals with html, it is likely that other functionality would be needed from webkit as well. – László Papp Sep 07 '13 at 20:35
  • 3
    @LaszloPapp, QtWebkit is _a lot_ heavier than xmlpatterns. QtWebkit is the largest Qt part actually. – Lol4t0 Sep 07 '13 at 21:05
  • Lol4t0: your answer still does not make sense to me. If he has to use webkit already, that is exactly zero additional dependency. Otherwise he would need to use QtCore. Your answer is wrong either way. I will give a -1 tomorrow because I reached the limit for today. :) Basically you are suggesting an unmaintained extra dependency. That is bad. – László Papp Sep 07 '13 at 21:07
  • Please re-read it. I wrote *if*. See also my reply, and the comment before. – László Papp Sep 07 '13 at 22:20
  • 1
    Well, as of Qt 5.6, `QtWebKit` is no longer there, so this is now the more correct answer. – Romário Jul 01 '16 at 16:29
3

EDIT: From the Qt 5.6 release blog post:

With 5.6, Qt WebKit and Qt Quick 1 will no longer be supported and are dropped from the release. The source code for these modules will still be available.

So, as of Qt 5.6, QtWebKit is no longer available unless you are willing to compile it from source. If you are using a Qt release older than 5.6, or are willing to compile QtWebKit yourself, this answer might still be helpful; otherwise it is no longer valid.


It is hard to tell you exactly what needs to be done as your explanation is incomplete about the use case. However, there are two ways of proceeding.

QtWebKit

If you already need any other functionality from that module, this is not going to introduce any further dependencies, and it will be the most convenient for you to use.

You need to get a QWebElement: https://doc.qt.io/archives/qt-5.5/qwebelement.html

You obtain one by finding the first "span" element in your HTML:

https://doc.qt.io/archives/qt-5.5/qwebframe.html#findFirstElement

Then, you can simply get the text for that element with the corresponding QWebElement method(s). For instance, you can use this one to get an attribute value:

https://doc.qt.io/archives/qt-5.5/qwebelement.html#attribute

... but you can also request the attribute names as you can see in the documentation, etc.

This is how you will get the 12345 value:

https://doc.qt.io/archives/qt-5.5/qwebelement.html#toPlainText

XML parser in QtCore

If you do not need WebKit in your software, and the HTML data does not come directly from the web (the case in which you would need QtWebKit), then you are better off using the XML parser available in QtCore. Even if you need nothing else from QtWebKit, the additional dependency may still be harmless in your use case; it is hard to tell from your description. The QtCore route is certainly less convenient than the WebKit-based solution, which is designed for HTML, but not by much.

What you need to avoid is QtXmlPatterns. It is unmaintained as of now, and it would introduce an additional dependency for your code either way.

Romário
László Papp
  • `QtWebKit` was removed from Qt, so this answer is outdated. Is there an alternative to it aside from `QtXmlPatterns`? – Romário Jul 01 '16 at 16:18
  • 1
    Maybe casual regular expression is more suitable than whole browser? He just needs some values. – ilotXXI Jul 01 '16 at 18:47
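The regular-expression idea from the last comment could look roughly like the sketch below, written with the standard library's std::regex rather than any Qt class. It assumes the attribute is spelled exactly style="font-size:11px" and that the span contains plain text only; the function name extractSpanText is hypothetical. Regular expressions are brittle against general HTML, so this is only reasonable when the page layout is known and stable.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the text of the first
// <span style="font-size:11px">...</span>, or an empty string if the
// markup contains no such span.
std::string extractSpanText(const std::string &markup)
{
    // Assumption: the style attribute is written exactly like this and
    // the span has no nested tags.
    static const std::regex pattern(
        R"re(<span\s+style="font-size:11px"\s*>([^<]*)</span>)re");

    std::smatch match;
    if (std::regex_search(markup, match, pattern)) {
        return match[1].str();  // first capture group: the span's text
    }
    return std::string();
}
```

Because the pattern looks only at the span itself, it does not care whether the rest of the page is well-formed.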