
I have the following HTML snippet:

       <span class='ocr_line' id='line_11' title="bbox 0 482 377 539">
<span class='ocrx_word' id='word_34' title="bbox 0 484 51 539"><em>WORD1</em></span> 
<span class='ocrx_word' id='word_35' title="bbox 56 482 119 528">WORD2</span> 
<span class='ocrx_word' id='word_35' title="bbox 56 482 119 528"><em></em></span> 
<span class='ocrx_word' id='word_36' title="bbox 137 483 171 528"><strong><em>WORD3</em></strong></span> 
<span class='ocrx_word' id='word_37' title="bbox 176 482 244 528"><h1>WORD4</h1></span> 
</span> 

I would like an XPath query that extracts the bbox string and the node content for words 1-4. I'm having trouble because the words may be nested inside <em> and <strong> elements, and some may be empty. Thanks.

JLRishe
M.R.

1 Answer


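Perhaps this: //@title | //text()

Note that it selects every title attribute and every text node in the document, so the result also includes the title of the enclosing ocr_line span and the whitespace-only text nodes between the word spans.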
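If you need each bbox paired with its word, it may be easier to select the word spans first and read the attribute and text from each one. Here is a minimal sketch using Python's standard xml.etree.ElementTree, assuming the snippet is well-formed enough to parse as XML (the one in the question is). The function name extract_words is made up for illustration, and skipping empty words is my assumption about what you want.

```python
import xml.etree.ElementTree as ET

# The snippet from the question, unchanged apart from whitespace.
SNIPPET = """
<span class='ocr_line' id='line_11' title="bbox 0 482 377 539">
<span class='ocrx_word' id='word_34' title="bbox 0 484 51 539"><em>WORD1</em></span>
<span class='ocrx_word' id='word_35' title="bbox 56 482 119 528">WORD2</span>
<span class='ocrx_word' id='word_35' title="bbox 56 482 119 528"><em></em></span>
<span class='ocrx_word' id='word_36' title="bbox 137 483 171 528"><strong><em>WORD3</em></strong></span>
<span class='ocrx_word' id='word_37' title="bbox 176 482 244 528"><h1>WORD4</h1></span>
</span>
"""


def extract_words(markup):
    """Return (title, text) pairs for every non-empty ocrx_word span."""
    root = ET.fromstring(markup.strip())
    words = []
    # ElementTree supports only a subset of XPath, but an attribute
    # predicate on descendant elements is within that subset.
    for span in root.iterfind(".//span[@class='ocrx_word']"):
        # itertext() walks all nested elements (<em>, <strong>, <h1>, ...),
        # so the nesting depth does not matter.
        text = "".join(span.itertext()).strip()
        if text:  # assumption: words with no text should be skipped
            words.append((span.get("title"), text))
    return words


words = extract_words(SNIPPET)
for bbox, text in words:
    print(bbox, text)
```

With a full XPath 1.0 engine, something like //span[@class='ocrx_word'][normalize-space()]/@title should give just the titles of the non-empty words, but you would still need a second step to pair each title with its text.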

Istao