
I'm currently parsing text from internal résumés at my company. The goal is to index everything in Elasticsearch so that we can search them.

For the moment I have the following JSON document, with no mapping defined. Each coworker has a list of projects, each with the client name:

{
"name": "Jean Wisser",
"position": "Junior Developer",
"projects": [
        {
            "client": "SutrixMedia",
            "missions": [
                "Responsible for the quality on time and within budget",
                "Writing specs, testing,..."
            ],
            "technologies": "JIRA/Mantis/Adobe CQ5 (AEM)"
        },
        {
            "client": "Société Générale",
            "missions": [
                " Writing test cases and scenarios",
                " UAT"
            ],
            "technologies": "HP QTP/QC"
        }
    ]
}

The two main questions we would like to answer are:

  1. Which coworkers have already worked for a given company?
  2. Which clients use a given technology?

The first question is easy to answer. For example, Projects.client="SutrixMedia" returns the right résumé.

But how can I answer the second one?

I would like to run a query like Projects.technologies="HP QTP/QC" and get back only the client name ("Société Générale" in this case), NOT the entire document.

Is it possible to get this answer by defining a mapping with the nested type? Or should I go for a parent/child mapping?

Brian Tompsett - 汤莱恩

1 Answer


Yes, that's possible with ES 1.5.* if you map projects as a nested type and then retrieve the nested inner_hits.
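To see why the nested type matters: with the default object mapping, Elasticsearch flattens an array of objects into one list of values per field, so the link between a project's client and its technologies is lost. Nested objects are indexed separately, which is also what makes inner_hits possible. The Python sketch below only imitates these two matching behaviours on plain dictionaries. It does not talk to Elasticsearch, and `flatten_objects`, `matches_flattened` and `matches_nested` are hypothetical helper names used for illustration.

```python
# Toy model of how an array of objects is matched with and without "nested".
# This does not call Elasticsearch; it only imitates the matching semantics.

projects = [
    {"client": "SutrixMedia", "technologies": "JIRA/Mantis/Adobe CQ5 (AEM)"},
    {"client": "Société Générale", "technologies": "HP QTP/QC"},
]


def flatten_objects(objects):
    """Imitate the default object mapping: one list of values per field."""
    flat = {}
    for obj in objects:
        for field, value in obj.items():
            flat.setdefault(field, []).append(value)
    return flat


def matches_flattened(objects, client, technologies):
    """Each condition is checked against the whole document, not per project."""
    flat = flatten_objects(objects)
    return client in flat["client"] and technologies in flat["technologies"]


def matches_nested(objects, client, technologies):
    """Both conditions must hold inside the same project object."""
    return any(
        obj["client"] == client and obj["technologies"] == technologies
        for obj in objects
    )


# SutrixMedia never used HP QTP/QC, yet the flattened model reports a match.
print(matches_flattened(projects, "SutrixMedia", "HP QTP/QC"))  # True
print(matches_nested(projects, "SutrixMedia", "HP QTP/QC"))     # False
```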

Here is the mapping for your sample document above:

curl -XPUT localhost:9200/resumes -d '
{
  "mappings": {
    "resume": {
      "properties": {
        "name": {
          "type": "string"
        },
        "position": {
          "type": "string"
        },
        "projects": {
          "type": "nested",
          "properties": {
            "client": {
              "type": "string",
              "fields": {
                "raw": {
                  "type": "string",
                  "index": "not_analyzed"
                }
              }
            },
            "missions": {
              "type": "string"
            },
            "technologies": {
              "type": "string",
              "fields": {
                "raw": {
                  "type": "string",
                  "index": "not_analyzed"
                }
              }
            }
          }
        }
      }
    }
  }
}'
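The `raw` sub-fields are there so that an exact-match `term` query can work. A plain `string` field is analyzed, so "HP QTP/QC" is broken into lowercase tokens, while a `term` query compares its input verbatim against the indexed terms. The Python sketch below illustrates the idea with a deliberately rough stand-in for the analyzer (a lowercase word split). `rough_standard_analyze` and `term_matches` are hypothetical names, and the real analyzer's tokenization rules are more involved.

```python
import re


def rough_standard_analyze(text):
    """Very rough stand-in for an analyzed string field: lowercase word tokens."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def term_matches(indexed_terms, query_term):
    """A term query compares the query string to the indexed terms verbatim."""
    return query_term in indexed_terms


value = "HP QTP/QC"
analyzed_terms = rough_standard_analyze(value)  # roughly what "technologies" holds
raw_terms = [value]                             # what "technologies.raw" holds

print(analyzed_terms)                             # ['hp', 'qtp', 'qc']
print(term_matches(analyzed_terms, "HP QTP/QC"))  # False
print(term_matches(raw_terms, "HP QTP/QC"))       # True
```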

Then, you can index your sample document from above:

curl -XPUT localhost:9200/resumes/resume/1 -d '{...}'

Finally, the following query returns only the nested inner_hits, that is, only the nested object that matches Projects.technologies="HP QTP/QC", restricted to its client field:

curl -XPOST localhost:9200/resumes/resume/_search -d '
{
  "_source": false,
  "query": {
    "nested": {
      "path": "projects",
      "query": {
        "term": {
          "projects.technologies.raw": "HP QTP/QC"
        }
      },
      "inner_hits": {
        "_source": "client"
      }
    }
  }
}'

This yields only the client name instead of the whole matching document:

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.4054651,
    "hits" : [ {
      "_index" : "resumes",
      "_type" : "resume",
      "_id" : "1",
      "_score" : 1.4054651,
      "inner_hits" : {
        "projects" : {
          "hits" : {
            "total" : 1,
            "max_score" : 1.4054651,
            "hits" : [ {
              "_index" : "resumes",
              "_type" : "resume",
              "_id" : "1",
              "_nested" : {
                "field" : "projects",
                "offset" : 1
              },
              "_score" : 1.4054651,
              "_source":{"client":"Société Générale"}  <--- here is the client name
            } ]
          }
        }
      }
    } ]
  }
}
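On the client side, the names can then be pulled out of the `inner_hits` section. The Python sketch below works on a trimmed copy of the response above, embedded as a string so that it runs without a cluster. `client_names` is a hypothetical helper, and the traversal assumes the response shape shown in this answer.

```python
import json

# Trimmed copy of the response shown above (annotation removed).
response_text = """
{
  "hits": {
    "total": 1,
    "hits": [ {
      "_index": "resumes", "_type": "resume", "_id": "1",
      "inner_hits": {
        "projects": {
          "hits": {
            "total": 1,
            "hits": [ {
              "_nested": {"field": "projects", "offset": 1},
              "_source": {"client": "Société Générale"}
            } ]
          }
        }
      }
    } ]
  }
}
"""


def client_names(response):
    """Collect the distinct client names from the nested inner hits, in order."""
    names = []
    for hit in response["hits"]["hits"]:
        for inner in hit["inner_hits"]["projects"]["hits"]["hits"]:
            name = inner["_source"]["client"]
            if name not in names:
                names.append(name)
    return names


print(client_names(json.loads(response_text)))  # ['Société Générale']
```

Duplicates are dropped because several résumés may mention the same client for a given technology.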
Val
  • Thanks a lot! It's working perfectly! I'm not sure I understand why you added the "fields : raw ..." part. Isn't that something Elasticsearch generates automatically? – Jean Wisser Jun 09 '15 at 13:09
  • True, I've mapped `technologies` as a [multi-field](https://www.elastic.co/guide/en/elasticsearch/reference/current/_multi_fields.html) so you can run an exact search on it, but it's not really mandatory. You're free to change that of course. – Val Jun 09 '15 at 13:11