
I'm very new to Elasticsearch and am stuck on forming search queries, or rather on building them with the Elasticsearch PHP client.

What I'm trying to do is the following: we have 500,000 documents, each with various properties. We need to be able to look up documents by any one of the fields, i.e. search by tsin, manufacturer, tags, or url. This is what I currently do:

I found that for exact matching I first need to initialize the index with the following mapping:

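// Adding a new type ("dupes") to an existing index
$myMappings = array(
    '_source' => array(
        'enabled' => true
    ),
    'properties' => array(
        // Store each URL as a single un-tokenized term so it can be matched exactly
        'urls' => array(
            'index'    => 'not_analyzed',
            'type'     => 'string',
            'analyzer' => 'keyword',
        )
    )
);
// The type name must be a quoted string; a bare dupes is read as an undefined constant
$params['body']['dupes'] = $myMappings;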

After loading all the data, I tried to run the following query with elasticsearch-php:

$doc['index'] = 'lookup';
$doc['type']  = 'dupes';

$doc['body']['query']['bool']['should'][]['terms']['tsin'] = (array)$data['tsin'];
$doc['body']['query']['bool']['should'][]['terms']['urls'] = $urls_array;
$doc['body']['query']['bool']['should'][]['terms']['title'] = (array)$data['title'];
$doc['body']['query']['bool']['minimum_should_match'] = 1;

$retDoc = $client->search($doc);

I need to pull all documents that match on any of the fields. Note that a URL lookup may have multiple values; currently I submit an array of the URLs I want to search by.
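
As an illustration of the requirement just described, here is a minimal sketch of the same bool/should body built by a small helper that skips fields with no lookup values. `buildAnyFieldQuery` is a hypothetical name, not part of elasticsearch-php, and the sketch assumes every field passed in is mapped as not_analyzed (only `urls` is in the mapping above), since a `terms` clause compares against indexed terms as-is.

```php
<?php
// Hypothetical helper (not part of elasticsearch-php): builds a bool/should
// body with one "terms" clause per field that has at least one lookup value.
// Assumption: each field is mapped not_analyzed, so terms match exactly.
function buildAnyFieldQuery(array $lookups)
{
    $should = array();
    foreach ($lookups as $field => $values) {
        // Normalise scalars to arrays and drop null / empty-string values.
        $values = array_values(array_filter((array) $values, function ($v) {
            return $v !== null && $v !== '';
        }));
        if (count($values) > 0) {
            $should[] = array('terms' => array($field => $values));
        }
    }

    return array(
        'query' => array(
            'bool' => array(
                'should'               => $should,
                'minimum_should_match' => 1,
            ),
        ),
    );
}

// Example input taken from the sample document in this question.
$body = buildAnyFieldQuery(array(
    'tsin'  => 'tdfaa',
    'urls'  => array(
        'http://www.wsdeal.com/cu0386y/join.ws',
        'http://www.madlime.com/4-g-love-you-hand-gesture-design-usb-flash-drive-golden',
    ),
    'title' => null, // no value supplied, so no clause is generated
));

echo json_encode($body), "\n";
```

The resulting array would be assigned to `$doc['body']` before calling `$client->search($doc)`. If every field is empty the `should` list is empty too, so the caller may want to skip the search in that case.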

What is the proper way to structure a query that pulls all results where each entry matches exactly, including on multi-valued fields ("urls" in my example)?

hits: {
total: 578363
max_score: 1.9548206
hits: [
    {
        _index: dedupe_lookup
        _type: product
        _id: tseym752t1
        _score: 1.9548206
        _source: {
            tsin: tdfaa
            slug: usb-metallic-i-love-you-hand-gesture-necklace-flash-drive
            title: usb metallic i love you hand gesture necklace flash drive
            manufacturer:
            tags: null
            approved: 0
            urls: [
                http://www.aliexpress.com/item/free-shipping-new-gift-gold-metal-hand-shape-pendant-usb-flash-drive-the-hand-shape-on/1064760529.html
                http://hert.en.ec21.com/unique_hand_gesture_necklace_usb--5017329_6432159.html#
                http://www.wsdeal.com/cu0386y/join.ws
                http://www.fat.com/8-g-love-you-hand-gesture-design-usb-flash-drive-golden-p-59257.html
                http://www.landofthebest.com/8-g-love-you-hand-gesture-design-usb-flash-drive-golden--inf-15888.html
                http://www.madlime.com/4-g-love-you-hand-gesture-design-usb-flash-drive-golden
                http://lady.brando.com/usb-metallic-i-love-you-hand-gesture-necklace-flash-drive_p00594c0005d003.html
            ]
            create_date: 2014-02-13T21:47:07+00:00
            modified_date: 2014-02-14T00:56:08+00:00
        }
    }, { .... more documents like that }
Alex Smirnov
  • look at multi_match query (with phrase clause): http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-multi-match-query.html AFAIK you don't have to do anything special for multi-valued field in that case. – Ashalynd Feb 15 '14 at 18:20
  • I'm using the HEAD plugin, and when I type in this query: { "multi_match": { "query": "http://www.wsdeal.com/cu0386y/join.ws", "fields": [ "urls" ] } } it does not return any results. Am I doing it right? – Alex Smirnov Feb 16 '14 at 01:24
  • I see a semicolon after URL. is that a typo? Also, try to add "type":"phrase" for that query. – Ashalynd Feb 16 '14 at 09:43
  • Adding type:phrase fixed it, thank you so much! Is there a way to express an OR condition in the query field? For example, if I need to find a document that contains any one of the search URLs? – Alex Smirnov Feb 16 '14 at 16:50
  • You might try the query_string query then: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html (but maybe you'll need to tinker with the parameters there to enable exact matches) – Ashalynd Feb 16 '14 at 20:31
  • The default operator for multi_match is OR, is this what you're asking? – Michael at qbox.io Feb 18 '14 at 16:43
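
The comments above report that phrase matching on `urls` worked for a single URL and ask how to OR several URLs together. One way this could be sketched, as an assumption rather than a confirmed answer from the thread, is a bool/should with one `match_phrase` clause per URL. `buildUrlPhraseQuery` is a hypothetical helper name, not part of elasticsearch-php.

```php
<?php
// Hypothetical helper: emits one match_phrase clause per URL and ORs them
// together with bool/should, so a document matching any single URL is returned.
function buildUrlPhraseQuery(array $urls, $field = 'urls')
{
    $should = array();
    foreach ($urls as $url) {
        $should[] = array('match_phrase' => array($field => $url));
    }

    return array(
        'query' => array(
            'bool' => array(
                'should'               => $should,
                'minimum_should_match' => 1,
            ),
        ),
    );
}

// URLs taken from the sample document in the question.
$phraseBody = buildUrlPhraseQuery(array(
    'http://www.wsdeal.com/cu0386y/join.ws',
    'http://www.madlime.com/4-g-love-you-hand-gesture-design-usb-flash-drive-golden',
));

echo json_encode($phraseBody), "\n";
```

The JSON printed here can be pasted into the HEAD plugin for a quick check, or the array can be passed as `$doc['body']` to `$client->search($doc)`.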

0 Answers