
I have this HTML block:

<div class="singolo-contenuto link_azure">
<p>I'm a TEXTXXXXXXXXXXXXXXXX<p>
<a href="http://example.com">Name of URL</a></p></p>
<ul class="list_attachments"><li><a    
href="DON'T TOUCH"><img src='/img/fileicons/file.png' alt='file'/> TITLE</a></li></ul> 
</div>
<div class="clear"></div>

Currently I'm extracting the text with:

 document.select(".singolo-contenuto").text();

That returns "I'm a TEXTXXXXXXXXXXXXXXXX Name of URL". Is it possible to get "I'm a TEXTXXXXXXXXXXXXXXXX http://example.com Name of URL" instead?

are not always the same on all the pages. I'm only sure that the text and the href will be inside the "singolo-contenuto link_azure" class.

helloimyourmind
  • [This post](http://stackoverflow.com/questions/15439853/get-local-href-value-from-anchor-a-tag) will be helpful to you in getting the anchor tag's `href` information. Though I'm not sure that you can impose that directly within your `div.text()` call. You could make separate calls and manipulate the Strings afterwards. – CubeJockey May 26 '15 at 16:34
  • Technically, href is not text, but part of the markup. – Alexander Pogrebnyak May 26 '15 at 16:36
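The two comments above point the same way: `text()` only sees text nodes, while the URL lives in the `href` attribute and has to be read with `attr("href")`. A minimal sketch of the "separate calls, then combine the strings" idea, assuming jsoup is on the classpath; the class name `HrefVersusText` and the helper `describeFirstLink` are made up for illustration:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

class HrefVersusText {
    // Hypothetical helper: returns "href label" for the first element matching
    // cssQuery, or an empty string when nothing matches.
    static String describeFirstLink(String html, String cssQuery) {
        Element link = Jsoup.parse(html).select(cssQuery).first();
        if (link == null) {
            return "";
        }
        // attr("href") reads the attribute; text() reads only the visible label.
        return link.attr("href") + " " + link.text();
    }

    public static void main(String[] args) {
        String html = "<div class=\"singolo-contenuto link_azure\">"
                + "<p><a href=\"http://example.com\">Name of URL</a></p></div>";
        System.out.println(describeFirstLink(html, ".singolo-contenuto a"));
    }
}
```

This only handles one link in isolation; it does not put the URL back into the surrounding paragraph text.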

1 Answer


You can replace each link with the text you want, then call .text().

Pseudocode:

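import org.jsoup.nodes.Element;
import org.jsoup.nodes.TextNode;

// Assumes jsoup 1.11 or newer; older releases used new TextNode(String, String).
for (Element elem : document.select(".singolo-contenuto a")) {
    // Skip links inside the attachments list so they are left untouched.
    if (elem.parents().hasClass("list_attachments")) continue;
    String href = elem.attr("href");
    String text = elem.text();
    elem.replaceWith(new TextNode(href + " " + text)); // <a> becomes "href label"
}
String result = document.select(".singolo-contenuto").text();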
Jens Piegsa
zborek
  • Thanks for your suggestions, but this way I also get the links in the inner class "list_attachments". Is there a way to select (".singolo-contenuto a") while excluding ".list_attachments"? – helloimyourmind May 26 '15 at 20:22
  • This can be done with selectors, depending on your document structure, e.g. `.singolo-contenuto a:not(.list_attachments)` or `.singolo-contenuto :not(.list_attachments) a`. [http://jsoup.org/apidocs/org/jsoup/select/Selector.html](http://jsoup.org/apidocs/org/jsoup/select/Selector.html) – zborek May 26 '15 at 20:38
  • Thanks for your patience. I edited my first post with the full structure, but your suggestions don't work. – helloimyourmind May 26 '15 at 20:56
  • Read about HTML selectors. Combine the two above into one: `.singolo-contenuto a:not(.list_attachments), .singolo-contenuto :not(.list_attachments) a` – zborek May 26 '15 at 21:00
  • It still doesn't work :( I'm reading up on selectors. Thanks for your suggestions. – helloimyourmind May 26 '15 at 21:15
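One possible reason the selectors in this thread fail: `a:not(.list_attachments)` tests the class of the `<a>` itself, and `:not(.list_attachments) a` still matches because the attachment link also has other ancestors (the `<li>`, the outer `<div>`) that lack that class. Below is a sketch of one alternative, assuming jsoup 1.11 or newer and assuming its `:not(...)` accepts a descendant selector as its argument (worth checking against the Selector docs for your version). The class name `LinkTextExtractor` and the method `extract` are made up for illustration, and the sample HTML is a tidied version of the block from the question:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.TextNode;

class LinkTextExtractor {
    // Hypothetical helper: returns the text of .singolo-contenuto with each
    // link replaced by "href label", skipping links under .list_attachments.
    static String extract(String html) {
        Document document = Jsoup.parse(html);
        // Assumption: :not(.list_attachments a) excludes any <a> that has a
        // .list_attachments ancestor.
        for (Element link : document.select(".singolo-contenuto a:not(.list_attachments a)")) {
            link.replaceWith(new TextNode(link.attr("href") + " " + link.text()));
        }
        return document.select(".singolo-contenuto").text();
    }

    public static void main(String[] args) {
        String html = "<div class=\"singolo-contenuto link_azure\">"
                + "<p>I'm a TEXT</p>"
                + "<p><a href=\"http://example.com\">Name of URL</a></p>"
                + "<ul class=\"list_attachments\"><li>"
                + "<a href=\"DON'T TOUCH\">TITLE</a></li></ul>"
                + "</div>";
        System.out.println(extract(html));
    }
}
```

If the selector does not behave this way in your jsoup version, the `elem.parents().hasClass("list_attachments")` check from the answer does the same filtering in Java.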