I'm building an application with eXist-db that works with TEI files and transforms them into HTML.
For the search function, I configured Lucene to ignore some of the tags:
<collection xmlns="http://exist-db.org/collection-config/1.0" xmlns:teins="http://www.tei-c.org/ns/1.0">
    <index xmlns:xs="http://www.w3.org/2001/XMLSchema">
        <fulltext default="none" attributes="false"/>
        <lucene>
            <analyzer class="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
            <analyzer id="ws" class="org.apache.lucene.analysis.WhitespaceAnalyzer"/>
            <text match="//teins:TEI">
                <inline qname="p"/>
                <inline qname="text"/>
                <ignore qname="teins:del"/>
                <ignore qname="teins:sic"/>
                <ignore qname="teins:index"/>
                <ignore qname="teins:term"/>
                <ignore qname="teins:note"/>
            </text>
        </lucene>
    </index>
</collection>
That mostly works: the ignored elements are not matched by the search itself, but their text still appears in the snippets before and after the matched text, which are returned by the kwic module. Is there a way to remove them from the snippets, or to apply an XSL transformation before indexing?
Example TEI:
...daß er sie zu entwerten sucht. Wie
<index>
    <term>Liebe</term>
    <index>
        <term>und Hass</term>
    </index>
</index>
Liebe Ausströmung inneren Wertes ist,...
When I search for "Ausströmung", the query returns
....sucht. Wie Liebe und Hass Liebe Ausströmung inneren Wertes ist,...
but it should return
....sucht. Wie Liebe Ausströmung inneren Wertes ist,...
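To make the expected snippet concrete, here is a rough sketch in plain Python (standard library only; this is not eXist-db or kwic code, just an illustration of the behaviour I am after). It assumes the sample fragment is wrapped in a hypothetical <p> element so that it parses, leaves out the TEI namespace for brevity, and treats the same element names as ignored that my index configuration lists.

```python
import xml.etree.ElementTree as ET

# Sample fragment from the example above, wrapped in a hypothetical <p>
# element so it is well-formed; the TEI namespace is omitted for brevity.
SAMPLE = (
    "<p>...daß er sie zu entwerten sucht. Wie\n"
    "<index><term>Liebe</term><index><term>und Hass</term></index></index>\n"
    "Liebe Ausströmung inneren Wertes ist,...</p>"
)

# Local names of the elements marked <ignore> in the index configuration.
IGNORED = {"del", "sic", "index", "term", "note"}


def visible_text(elem):
    """Collect text content, skipping ignored elements but keeping the
    text that follows them (their tail)."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag not in IGNORED:
            parts.append(visible_text(child))
        parts.append(child.tail or "")
    return "".join(parts)


def snippet_text(xml_string):
    """Return the whitespace-normalised text with ignored elements removed."""
    root = ET.fromstring(xml_string)
    return " ".join(visible_text(root).split())


print(snippet_text(SAMPLE))
```

The printed text contains neither "Liebe" from the first term nor "und Hass", which is what I would like the kwic snippets to look like.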
When I search for "Hass", this text snippet does not show up in the results, as expected.
For the search functions, I'm sticking closely to the Shakespeare example in the documentation.