Introduction
I'm currently working on a project for a company, and we are in production. Our QA recently found some odd behavior with ElasticSearch. We use ElasticSearch alongside MongoDB, and ElasticSearch is populated via a river, specifically the MongoDB River Plugin for ElasticSearch.
Background
Our service aggregates, filters, and sorts upwards of 2 million job posts. To search this data quickly and effectively we use ElasticSearch, with MongoDB as our main data store. One of the main search functions is searching by region, state, and city. We do this with state abbreviations, e.g. Madison, WI. With this functionality we can search an entire region (e.g. midwest) and get results from across that whole region; we can do the same for a state and get results for all the cities in that state.
The Problem
We have an odd problem: searches in the state of Oregon either turn up no hits, or the hits include only statewide jobs (not specific to any city) rather than jobs in cities within Oregon.
The Cause
The most likely cause seems to be that Apache Lucene reserves the word OR as a boolean "or" operator in its query syntax, and OR is also the abbreviation for Oregon. I believe this is the problem because the odd behavior shows up only for searches in the state of Oregon.
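To make the suspected collision concrete, here is a minimal sketch of two request bodies, assuming the search goes through a query_string query against the "states" field. The helper name `state_query` is hypothetical and not part of our code. In the unquoted form the bare OR is, as far as I understand, read by the Lucene query parser as the boolean operator rather than as a value; wrapping the abbreviation in double quotes makes the parser treat it as literal text, although the value is still run through the field's analyzer.

```python
import json


def state_query(abbrev, quoted):
    """Build a query_string body for the "states" field (hypothetical helper)."""
    # Double quotes tell the query parser to treat the value as literal text,
    # so OR is no longer interpreted as a boolean operator.
    value = '"{}"'.format(abbrev) if quoted else abbrev
    return {"query": {"query_string": {"query": "states:" + value}}}


if __name__ == "__main__":
    # The first body contains a bare OR; the second one quotes it.
    print(json.dumps(state_query("OR", quoted=False)))
    print(json.dumps(state_query("OR", quoted=True)))
```

Quoting only addresses the parser; it does not change how the field was indexed.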
The Solution?
My proposed solution is to change the "states" field to not_analyzed to prevent this from happening, and to change my search query accordingly.
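Roughly, this is what I have in mind, written as Python dictionaries that would be serialized to JSON. It is only a sketch under assumptions: the type name "jobpost" is made up for illustration, and the mapping uses the string type with "index": "not_analyzed" as in the ElasticSearch versions that support rivers. The query side switches to a term query, which is not parsed as query syntax and matches the stored value exactly.

```python
import json

# Index settings with an explicit mapping for the "states" field.
# "jobpost" is a hypothetical type name; "states" is the field from this post.
INDEX_BODY = {
    "mappings": {
        "jobpost": {
            "properties": {
                # not_analyzed stores the value as a single exact token, e.g. "OR".
                "states": {"type": "string", "index": "not_analyzed"}
            }
        }
    }
}


def state_term_query(abbrev):
    """Build an exact-match term query for a state abbreviation."""
    # A term query is not run through the query parser, so OR stays a plain value.
    return {"query": {"term": {"states": abbrev}}}


if __name__ == "__main__":
    print(json.dumps(INDEX_BODY, indent=2))
    print(json.dumps(state_term_query("OR")))
```

With a not_analyzed field the match is case sensitive, so the abbreviation would have to be sent exactly as it is stored.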
Why I cannot get this to work
MongoDB River is relatively turnkey: I can point it at a database and even narrow that down to a collection, and it will form its own mapping for my collection(s). The problem is that I cannot find any documentation or mention of how to define my own mapping for data that is stored in MongoDB and indexed into ES using River.
Conclusion
Does anyone know of a way to change a field in a predefined mapping? Otherwise, does anyone know how I could define my own mapping for MongoDB River? Documentation or examples would be great. It's a somewhat confusing issue, so if you need more details, feel free to ask.