I have this piece of code:

<a href="http://www.fnac.pt/Memorial-do-Convento-Texto-em-Analise-Varios/a242166" class="fontbigger">Memorial do Convento - Texto em Análise</a>

...and I want to get this part:

Memorial do Convento - Texto em Análise

How do I do it? I have tried this:

<a href="[^<]+" class=".+">(.+)</a>

...but the first [^<]+ doesn't work, because it only matches this:

http://www.fnac.pt/Memorial-do-Convento-Texto-em-Analise-Varios/
Alan Moore
  • Use an HTML Parser – Thomas Ayoub Jun 27 '16 at 12:56
  • "Lazy" answer: >.*< and then you just have to remove the < and >. Keeping the '-' between "Convento" and "Texto" is the hard part. In your HTML you could mark that specific dash with a different character (e.g. '~'), then use Java string methods to remove every dash except that one, and finally change the ~ back to -. – Michał M Jun 27 '16 at 13:11
  • You might be better off using the `split` function using a `/` as delimiter. Then you could select the 2nd element in the resulting array (0th element is the 1st element). – Matt Cremeens Jun 27 '16 at 13:18

2 Answers

You can use captured() with this regex:

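// Group 1 lazily captures everything between the opening <a ...> tag and the closing </a>.
// MultilineOption only changes how ^ and $ match, so it has no effect on this pattern.
QRegularExpression regex("<a.*?>(.*?)<\\/\\s*a\\s*>", QRegularExpression::MultilineOption);
QRegularExpressionMatch match = regex.match(resultHTML);
// captured(1) returns the text of group 1 (a null QString if nothing matched).
QString output = match.captured(1);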
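
The same capture-group idea can also return the link itself. Below is a minimal sketch, not the answer's original code: it assumes C++11 `std::regex` instead of `QRegularExpression` so that it compiles without Qt, and the `Anchor` struct and `extractAnchor` helper are hypothetical names introduced for illustration. It also assumes `href` is the first attribute and that the link text contains no nested tags, as in the question's snippet.

```cpp
#include <regex>
#include <string>

// Hypothetical result type: found is false when no <a> element matched.
struct Anchor {
    bool found;
    std::string href;
    std::string text;
};

// Hypothetical helper: extracts the href (group 1) and the link text (group 2)
// of the first <a href="..."> element in html.
Anchor extractAnchor(const std::string& html) {
    // [^"]* stops at the closing quote of href, [^>]* skips any other attributes,
    // and [^<]* takes the text up to the closing tag.
    static const std::regex anchor(R"re(<a\s+href="([^"]*)"[^>]*>([^<]*)</\s*a\s*>)re");
    std::smatch m;
    if (!std::regex_search(html, m, anchor)) {
        return Anchor{false, "", ""};
    }
    return Anchor{true, m[1].str(), m[2].str()};
}
```

Calling `extractAnchor` on the snippet from the question should give the full URL in `href` and the title in `text`. With Qt, the equivalent would be reading `match.captured(1)` and `match.captured(2)` from a pattern with two groups.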
Thomas Ayoub

I tried this regex and it worked:

>([^<>]+)<

However, parsing HTML with a regex isn't the best option.
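
The pattern above matches the surrounding > and < as well, so the text has to be read from capture group 1 rather than from the whole match. A minimal sketch of that step follows, assuming C++11 `std::regex` (chosen only to keep the example self-contained); `textBetweenTags` is a hypothetical helper name.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the text captured by group 1 of >([^<>]+)<,
// or an empty string when the pattern does not match.
std::string textBetweenTags(const std::string& html) {
    static const std::regex between(">([^<>]+)<");
    std::smatch m;
    if (!std::regex_search(html, m, between)) {
        return "";
    }
    // m[0] is the whole match, including the '>' and '<' delimiters;
    // m[1] is only the text inside the parentheses.
    return m[1].str();
}
```

Reading `m[0]` instead of `m[1]` would return the text with the delimiters still attached, which can look as if the regex did not work.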

arhr
  • It worked? It didn't... Yes, I understand that it's not the best option, but it is what the teacher wants... – user6236820 Jun 27 '16 at 13:56
  • https://regex101.com/r/iH5wG0/1 Don't forget that the () is a capturing group, which you need to access afterwards – arhr Jun 27 '16 at 14:05
  • Ah okay, and how would you get the link? `http://www.fnac.pt/Memorial-do-Convento-Texto-em-Analise-Varios/a242166` – user6236820 Jun 27 '16 at 15:00