I am using logstash with the JDBC driver to bulk import a bunch of data from SQL Server to Elasticsearch. (The end goal is to have this data be searchable from a web front-end.)
One of the table columns contains HTML tags (`<span id='blah'>`, `<p class='foo'>`, etc.). I want the content to be searchable, but the tags to be ignored. That is, if someone searches for the word "foo", the document that contains `<p class='foo'>` should NOT come up. On the other hand, I DO want the full content, including markup, to be stored in Elasticsearch.
Is there something I can do in my logstash `.config` file to make Elasticsearch "aware" that this is HTML content?
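For illustration, one possible way to get "search the text, ignore the tags" is to handle it on the Elasticsearch side with a custom analyzer that uses the built-in `html_strip` character filter. The sketch below only builds the request body you would send when creating the index. The analyzer name `html_text`, the field name `body_html`, and the typeless mapping layout (Elasticsearch 7 or later) are assumptions, not details from the setup described above. The original markup would still be kept in `_source`, since analysis only affects what gets indexed.

```python
import json


def build_index_body(html_field="body_html"):
    """Build an index-creation body whose HTML field is analyzed without tags.

    `html_field` and the analyzer name `html_text` are hypothetical names
    chosen for this example.
    """
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    "html_text": {
                        "type": "custom",
                        # Built-in char filter: removes HTML elements
                        # before the text reaches the tokenizer.
                        "char_filter": ["html_strip"],
                        "tokenizer": "standard",
                        "filter": ["lowercase"],
                    }
                }
            }
        },
        "mappings": {
            # Typeless mapping layout; assumes Elasticsearch 7 or later.
            "properties": {
                html_field: {"type": "text", "analyzer": "html_text"}
            }
        },
    }


if __name__ == "__main__":
    # Print the JSON to use as the body of the index-creation request.
    print(json.dumps(build_index_body(), indent=2))
```

With a mapping along these lines, a search for "foo" should match only text outside the tags, while the stored document still contains the full markup.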
`) are in just one column, with the real content in another column. With the jdbc plugin, each column from the table is a field in a document, so that's why I said to make that field not indexed.
– baudsp May 07 '17 at 15:05