
I am new to Elasticsearch and have spent several hours trying to solve this, so thanks in advance for any help.

:) (Not too) short explanation (what I have so far and what I am trying to achieve):

I created a CouchDB database (spain_locales) that contains more than 8000 documents with Spanish cities and provinces. I also have an HTML form with a jQuery Autocomplete that shows results as I type. I connect to Elasticsearch from a PHP Laravel Service Provider that I created, which returns the results for the jQuery Autocomplete. I suppose this could be done by connecting from the client directly to Elasticsearch, but for security reasons I prefer it like this for now.

:( The Problem:

The results I get from Elasticsearch are not exactly what I expect, and I don't know how to fix what I have or whether this is the correct way to do it. I don't know if the bool query is what I need or if I should use another type of query.

  1. I only get results if I type the words exactly like they are in the Database:

    If I type Álava I obtain results, but NOT for Alava (the Á accent makes the difference)

  2. I don't obtain results until I type the complete word:

    If I type Albacete I obtain results but NOT for Albacet

I used the CouchDB River Plugin for Elasticsearch (https://github.com/elasticsearch/elasticsearch-river-couchdb) to synchronise CouchDB with Elasticsearch, and I set it up with the following command through the terminal:

curl -XPUT 'localhost:9200/_river/spain_locales/_meta' -d '{
    "type" : "couchdb",
    "couchdb" : {
        "host" : "localhost",
        "port" : 5984,
        "db" : "spain_locales",
        "filter" : null
    },
    "index" : {
        "index" : "spain_locales",
        "type" : "spain_locales",
        "bulk_size" : "100",
        "bulk_timeout" : "10ms"
    }
}'

I also tried with:

curl -XPUT 'localhost:9200/_river/spain_locales/_meta' -d '{
    "type" : "couchdb",
    "couchdb" : {
        "host" : "localhost",
        "port" : 5984,
        "db" : "spain_locales",
        "filter" : null
    },
    "index" : {
        "number_of_shards" : 2,
        "refresh_interval" : "1s",
        "analysis": {
          "analyzer": {
            "folding": {
              "tokenizer": "standard",
              "filter":  [ "lowercase", "asciifolding" ]
            }
          }
        },
        "index" : "spain_locales",
        "type" : "spain_locales",
        "bulk_size" : "100",
        "bulk_timeout" : "10ms"
    }
}'

Neither of the above returns an error, and both successfully create the _river synchronisation, but the accent and whole-word issues remain.

I also tried to apply the needed filters with the following command through the terminal:

curl -XPUT 'localhost:9200/spain_locales/' -d '
{
  "settings": {
    "analysis": {
      "analyzer": {
        "folding": {
          "tokenizer": "standard",
          "filter":  [ "lowercase", "asciifolding" ]
        }
      }
    }
  },
  "uuid":"KwKrBc3uQoG5Ld1nOdc5rQ"
}'

But I get the following error:

{"error":"IndexAlreadyExistsException[[spain_locales] already exists]","status":400}

CouchDB document examples:

{
   "_id": "1",
   "_rev": "1-087ddbe8593f68f1d7d37a9c3f6de787",
   "Provincia": "Álava",
   "Poblacion": "Alegría-Dulantzi",
   "helper": ""
}

{
   "_id": "10",
   "_rev": "1-ce38dcdabeb3b34d34d2296c6e2fdf24",
   "Provincia": "Álava",
   "Poblacion": "Ayala/Aiara",
   "helper": ""
}

{
   "_id": "100",
   "_rev": "1-72e66601e378ee48519aa93601dc0717",
   "Provincia": "Albacete",
   "Poblacion": "Herrera (La)",
   "helper": "La Herrera"
}

PHP Service Provider / Controller:

public function searchzones(){

    $q = (Input::has('term')) ? Input::get('term') : 'null';

    $params['index'] = 'spain_locales';
    $params['type']  = 'spain_locales';

    $params['body']['query']['bool']['should'] = array(
        array('match' => array('Poblacion' =>  $q)),
        array('match' => array('Provincia' =>  $q))
    );

    $query = $this->elasticsearch->search($params);

    if ($query['hits']['total'] >= 1){

        $results = $query['hits']['hits'];

        foreach ($results as $zone) {
            
            $databag[] = array( "value"     => $zone['_source']['Poblacion'].', '.$zone['_source']['Provincia'],
                                "state"     => $zone['_source']['Provincia'],
                                "city"      => $zone['_source']['Poblacion'],
            );

        }

    } else {

        // No hits: return an empty list so the autocomplete shows nothing
        $databag = array();

    }

    return $databag;

} // End Search Zones

jQuery (JavaScript):

// Suggest locations when the user types in zones
$(document).ready(function() {
    $('#zones').autocomplete({
            
            source : applink + 'ajax/searchzones',
            select : function(event, ui){
                console.log(ui);
            }
                
    }); // End autocomplete
}); // End Document ready

HTML Form part (Twitter Bootstrap):

<div class="form-group">
<div class="input-group input-append dropdown">
<input type="text" class="form-control typeahead" placeholder="City name" name="zones" id="zones">
<div class="input-group-btn" >
<button type="button" class="btn btn-default dropdown-toggle" data-toggle="dropdown"><span class="caret"></span></button>
<ul class="dropdown-menu dropdown-menu-right" id="dropZonesAjax">                           
</ul>
</div>
</div>
<div id="zonesAjax"></div>   
</div>

I found the following resource: http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/asciifolding-token-filter.html but I don't know how to implement it.

Thanks a lot for your time and for trying to help! Sorry for my English!

Catalin Cardei
  • Can you post the mapping you are using for this index? Also, have you checked out the documentation on the completion suggester (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-suggesters-completion.html)? – hubbardr Dec 11 '14 at 21:44
  • Hi! Thanks for the answer! I don't know what you mean by mapping. I didn't do anything more with the elasticsearc ... Just the stuff I explained. – Catalin Cardei Dec 11 '14 at 21:56

1 Answer


Try to create your mapping before indexing. Then you can define the analyzer you have mentioned (folding) and assign it to your fields:

{
  "settings": {
    "analysis": {
      "analyzer": {
        "folding": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        }
      }
    }
  },
  "mappings": {
    "spain_locales": {
      "properties": {
        "Provincia": {
          "type": "string",
          "analyzer": "folding"
        },
        "Poblacion": {
          "type": "string",
          "analyzer": "folding"
        },
        "helper": {
          "type": "string"
        }
      }
    }
  }
}
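
One possible way to apply this, as a rough sketch rather than a definitive procedure: an index that already exists cannot be created a second time (hence the IndexAlreadyExistsException in the question), so the functions below remove the river and the index, create the index again with the analyzer and mapping, and then ask the _analyze API what the analyzer produces. `ES_URL`, `INDEX_BODY` and the function names are made up for illustration, the mapping type is assumed to be `spain_locales` so that it matches the `type` in the river configuration, and the `_analyze?analyzer=` form assumes an Elasticsearch 1.x node like the one the river plugin runs on. Note that deleting the index drops everything in it; the river is expected to fill it again from CouchDB.

```shell
# Illustrative sketch: ES_URL, INDEX_BODY and both function names are assumptions.
ES_URL="${ES_URL:-http://localhost:9200}"

# Analyzer plus mapping. The type name "spain_locales" is assumed here so that
# it matches the "type" the river writes documents to.
INDEX_BODY='{
  "settings": {
    "analysis": {
      "analyzer": {
        "folding": {
          "tokenizer": "standard",
          "filter": [ "lowercase", "asciifolding" ]
        }
      }
    }
  },
  "mappings": {
    "spain_locales": {
      "properties": {
        "Provincia": { "type": "string", "analyzer": "folding" },
        "Poblacion": { "type": "string", "analyzer": "folding" },
        "helper":    { "type": "string" }
      }
    }
  }
}'

recreate_index() {
  # Remove the river first so it does not re-create the index by itself.
  curl -XDELETE "$ES_URL/_river/spain_locales"
  # Drop the old index (this deletes its documents).
  curl -XDELETE "$ES_URL/spain_locales"
  # Create the index again, this time with the analyzer and the mapping.
  curl -XPUT "$ES_URL/spain_locales" -d "$INDEX_BODY"
}

check_folding() {
  # Ask Elasticsearch which tokens the "folding" analyzer emits for "Álava".
  curl -XGET "$ES_URL/spain_locales/_analyze?analyzer=folding" -d 'Álava'
}
```

After calling `recreate_index`, re-run the `_river/spain_locales/_meta` PUT from the question so that the documents are indexed again with the new mapping. If the analyzer is in place, `check_folding` should report the token `alava`, which is why a search for Alava can then match Álava.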
sven.kwiotek