
I am trying to find a way to get an image URL from a web page's source. I can get the page source into a string and parse it line by line to find the line with the URL. However, I haven't been able to figure out a good way to pull just the URL out of that line. I think this can be done with QRegExp, but I have been unable to figure out how to use it.

Line I am trying to parse

<img width="980" height="1515" id="mainImg" src="//test/123.jpg" alt="test">

Final Working Code

void MainWindow::on_btnDownload_clicked()
{
    QString url = "http://test.foo.com";

    // Fetch the page and block in a local event loop until the reply finishes.
    QNetworkAccessManager manager;
    QNetworkReply *response = manager.get(QNetworkRequest(QUrl(url)));
    QEventLoop event;
    connect(response, SIGNAL(finished()), &event, SLOT(quit()));
    event.exec();

    // Split the page source into lines.
    QString html = response->readAll();
    QStringList str = html.split("\n");

    for (int i = 0; i < str.size(); ++i) {
        // Only the line that holds the main image is of interest.
        if (str.at(i).contains("id=\"mainImg\"", Qt::CaseInsensitive)) {
            QString pic = str.at(i);
            // Strip everything up to and including the opening quote of src.
            pic = pic.remove(QRegExp("<img[^>]*src=['|\"]", Qt::CaseInsensitive));
            // Strip the leading "//" and the leftover markup characters.
            pic = pic.remove("//");
            pic = pic.remove('"');
            pic = pic.remove("'");
            pic = pic.remove('<');
            pic = pic.remove('>');
            pic = pic.remove(';');
            // Drop the last remaining character.
            pic = pic.left(pic.length() - 1);
            qDebug() << pic;
        }
    }

    qDebug() << "Lines: " << str.size();
}
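
For comparison, a capture group can return the src value directly, so the chain of remove calls is not needed. The following is only a minimal sketch that uses std::regex (mentioned in the comments) rather than QRegExp. The helper names extractImgSrc and stripLeadingSlashes are hypothetical and not part of the original code. It assumes the whole img tag sits on one line and that the src value is quoted. It needs a C++11 compiler and the <regex> header.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not from the original post): finds the first <img> tag
// on the line and copies the value of its src attribute into `src`.
// Returns false when no quoted src attribute is found.
bool extractImgSrc(const std::string& line, std::string& src)
{
    // <img, then any other attributes (lazily), then src="..." or src='...'.
    // Capture group 1 holds the text between the quotes.
    static const std::regex pattern(
        R"rx(<img\b[^>]*?\ssrc\s*=\s*["']([^"']*)["'])rx",
        std::regex::icase);

    std::smatch match;
    if (!std::regex_search(line, match, pattern))
        return false;

    src = match[1].str();
    return true;
}

// Hypothetical helper: drops a leading "//" (protocol-relative prefix),
// mirroring the remove("//") step in the code above.
std::string stripLeadingSlashes(const std::string& url)
{
    if (url.compare(0, 2, "//") == 0)
        return url.substr(2);
    return url;
}
```

Called on the sample line from the question, extractImgSrc yields //test/123.jpg, and stripLeadingSlashes reduces that to test/123.jpg. Because the capture stops at the closing quote, later attributes such as alt do not end up in the result. This is still pattern matching on one line of text, not HTML parsing, so unusual markup may not match.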
Alan Moore
Talon06
  • Have you considered QtWebKit? It has the ability to access individual DOM elements and their attributes. – MrEricSir Oct 15 '14 at 18:16
  • Try: `<img[^>]*src=['|\"](.*?)['|\"].*?>` http://coliru.stacked-crooked.com/a/8d593ba673d914d9 – Brandon Oct 15 '14 at 18:34
  • @Brandon That did not work – Talon06 Oct 15 '14 at 18:51
  • Can you include the specific line you are trying to parse in the question? – Nicolas Holthaus Oct 15 '14 at 18:55
  • Added it to the question. – Talon06 Oct 15 '14 at 18:58
  • @Talon06 Brandon's solution worked for me with the line you provided. Can you post some more code? – Nicolas Holthaus Oct 15 '14 at 19:03
  • Posted the code I currently have. I also tried copying the code from the link into Qt Creator, but got errors that regex was not part of std. – Talon06 Oct 15 '14 at 19:09
  • Print the actual line you found, in case it's not quite what you showed; an extra space, for example, will make the shown pattern fail. – JDługosz Oct 15 '14 at 19:14
  • I was able to get this working by shortening the regex so that it matches only the first part of the line, then removing the rest with multiple remove calls. It is not the most elegant solution, but it's working. – Talon06 Oct 15 '14 at 19:46
  • :S It works for `std::regex`: http://coliru.stacked-crooked.com/a/17d7d13be833741e and is clearer with C++11's raw strings: http://coliru.stacked-crooked.com/a/0cbddcf69eadd14e I don't know whether it works for Qt or not. If it doesn't, I have no idea why, because it should. Wish I was more familiar with Qt. – Brandon Oct 15 '14 at 20:17
  • Now might be a good time to stop and read up on automata theory. HTML is a context-free language, and thus you will never be able to parse it completely with a regular expression. – MrEricSir Oct 15 '14 at 20:19
  • Comments are not provided for answering questions! – Silicomancer Oct 16 '14 at 06:12

0 Answers