I have an eXist-db database with a couple of (large) TEI XML files which I want to index and search. For indexing, I have an xmlpipe2 command calling a sphinx-out.xql URL served by eXist-db. Along with the actual text snippets (paragraphs, headings, notes etc.), this provides a couple of attributes that I later want to use when presenting search results. One of them is a crumbtrail field that contains HTML (more precisely, a series of <a> hyperlinks).
As I want to be able to offer sentence and paragraph operators in searching, I have set index_sp = 1, and since this in turn requires HTML stripping, I also have html_strip = 1. But this seems to strip the HTML from my attributes as well, which I want to retain...
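A minimal sketch of the kind of sphinx.conf setup described above. The source name, the curl command with its host and port, and the index path are placeholders assumed for illustration, not the actual values; only the index name, the attribute name and the two index options come from this post.

```
source salamanca_src
{
    type                = xmlpipe2
    # assumed command and URL; the real one fetches sphinx-out.xql from eXist-db
    xmlpipe_command     = curl -s http://localhost:8080/exist/apps/salamanca/sphinx-out.xql
    # the crumbtrail is declared as a string attribute (other attributes omitted)
    xmlpipe_attr_string = sphinx_crumbtrail
}

index salamanca_base
{
    source     = salamanca_src
    # assumed location of the index files
    path       = /var/lib/sphinx/salamanca_base
    # needed for the SENTENCE and PARAGRAPH operators
    html_strip = 1
    index_sp   = 1
}
```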
Here is what sphinx.out.xql and then the xmlpipe2 command give:
<sphinx:docset>
  <sphinx:document id="77">
    <sphinx_docid>77</sphinx_docid>
    <sphinx_work>W0013</sphinx_work>
    <sphinx_author>Vitoria, Francisco de</sphinx_author>
    <sphinx_title>Relectiones</sphinx_title>
    <sphinx_year>1557</sphinx_year>
    <sphinx_crumbtrail>
      <span class="crumbtrail">
        <a href="/exist/apps/salamanca/work.html?wid=W0013#Vol02">Vol. 2</a>
        <span class="tokenizer"> > </span>
        <a href="/exist/apps/salamanca/work.html?wid=W0013#Vol02Lect01">De augmento charitatis</a>
      </span>
    </sphinx_crumbtrail>
    <sphinx_description>
      <p xmlns="http://www.tei-c.org/ns/1.0" xml:id="p_l3w_pml_y4">
        [SNIP]
      </p>
    </sphinx_description>
  </sphinx:document>
  .
  .
  .
</sphinx:docset>
And here is what a query to Sphinx through the mysql client gives:
mysql> select sphinx_docid, sphinx_work, sphinx_crumbtrail from salamanca_base;
+------+--------+--------------+-------------+---------------------------------+
| id   | weight | sphinx_docid | sphinx_work | sphinx_crumbtrail               |
+------+--------+--------------+-------------+---------------------------------+
.
.
.
| 77   | 1      | 77           | W0013       | Vol. 2 > De augmento charitatis |
+------+--------+--------------+-------------+---------------------------------+
20 rows in set (0.00 sec)
Now I wonder: is there any way to disable HTML stripping for attributes only?
Can anyone at least confirm that it is possible to store HTML in Sphinx attributes?
Thanks for any insight.