I generate html documents that contain a menu and a content part. Then I want to extract the content of these document to feed it to a lucene index. However, I would like to exclude the menu from the content extraction and thus only index the content.
<div class="menu">my menu goes here</div>
<div class="content">my content goes here</div>
what is the simplest way to achieve this with apache tika?