
I'm trying to use Elasticsearch-dsl-py to index some data from a JSONL file with many fields. Ignoring the less general parts, the code looks like this:

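import json

from elasticsearch import Elasticsearch

es = Elasticsearch()

with open(jsonlfile) as f:
  for doc_id, line in enumerate(f):
    jline = json.loads(line)
    # Split the children off so the parent document is indexed without them.
    children = jline.pop('allChildrenOfTypeX')
    res = es.index(index="mydocs", doc_type='fatherdoc', id=doc_id, body=jline)
    for ch in children:
      # This is the call that fails with the 400 error below.
      res = es.index(index="mydocs", doc_type='childx', parent=doc_id, body=ch)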

Running this fails with the error:

RequestError: TransportError(400, u'illegal_argument_exception', u"Can't specify parent if no parent field has been configured")

I guess I need to tell Elasticsearch in advance that childx documents have a fatherdoc parent. However, I don't want to map ALL the fields of both types just to do that.

Any help is greatly appreciated!

Ori5678

1 Answer


When creating your mydocs index, in the definition of your childx mapping type, you need to specify the _parent field with the value fatherdoc:

PUT mydocs
{
  "mappings": {
    "fatherdoc": {
       "properties": {
          ... parent type fields ...
       }
    },
    "childx": {
      "_parent": {                      <---- add this
        "type": "fatherdoc" 
      },
      "properties": {
          ... child type fields ...
      }
    }
  }
}
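A minimal Python sketch of the same index creation with elasticsearch-py, assuming an Elasticsearch 2.x/5.x cluster (this thread dates from 2016; from 6.0 on, an index holds a single mapping type and parent/child relations use the `join` field instead). It declares only `_parent` and leaves out `properties`, so every other field is mapped dynamically. The helper names `build_mydocs_mapping` and `create_mydocs_index` are hypothetical, and giving `fatherdoc` an empty mapping assumes the cluster accepts an empty type definition.

```python
from typing import Any, Dict


def build_mydocs_mapping(parent_type: str = "fatherdoc",
                         child_type: str = "childx") -> Dict[str, Any]:
    """Return an index-creation body that declares only the _parent link.

    No "properties" are listed, so Elasticsearch maps the other fields
    dynamically when the first document of each type is indexed.
    """
    return {
        "mappings": {
            # Assumption: an empty type mapping is accepted for the parent.
            parent_type: {},
            # The child type only needs to know which type its parents are.
            child_type: {"_parent": {"type": parent_type}},
        }
    }


def create_mydocs_index(es, index: str = "mydocs") -> None:
    """Create the index once, before any document is indexed."""
    es.indices.create(index=index, body=build_mydocs_mapping())
```

Call `create_mydocs_index(Elasticsearch())` once, before the indexing loop in the question runs; if the index already exists without `_parent`, it has to be deleted and recreated.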
Val
  • Can I do that without listing all the fields, just the '_parent'? – Ori5678 Nov 07 '16 at 12:05
  • You need to do it at the index creation time, which means when you define your mapping. You can definitely let ES dynamically create your fields on the go, but the `_parent` field must be specified at the very beginning before indexing the first document. – Val Nov 07 '16 at 16:10
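A sketch of the indexing side under that setup, assuming `mydocs` already exists with the `_parent` mapping from the answer and an elasticsearch-py 2.x/5.x client (where `es.index` accepts `doc_type` and `parent`). Splitting each line into actions is kept separate from the client calls so it can be checked without a cluster. `iter_actions` and `index_jsonl` are hypothetical names, and defaulting a missing `allChildrenOfTypeX` key to an empty list is an assumption (the question's code would raise `KeyError` instead).

```python
import json
from typing import Any, Dict, Iterable, Iterator, Tuple

# One indexing request: (doc_type, doc_id, parent_id, body).
Action = Tuple[str, Any, Any, Dict[str, Any]]


def iter_actions(lines: Iterable[str],
                 children_key: str = "allChildrenOfTypeX") -> Iterator[Action]:
    """Yield a fatherdoc action per line, then one childx action per child."""
    for doc_id, line in enumerate(lines):
        doc = json.loads(line)
        children = doc.pop(children_key, [])
        yield ("fatherdoc", doc_id, None, doc)
        for child in children:
            # Children get auto-generated IDs; only the parent link is set.
            yield ("childx", None, doc_id, child)


def index_jsonl(es, path: str, index: str = "mydocs") -> None:
    """Index a JSONL file; fields are mapped dynamically on first use."""
    with open(path) as f:
        for doc_type, doc_id, parent_id, body in iter_actions(f):
            if parent_id is None:
                es.index(index=index, doc_type=doc_type, id=doc_id, body=body)
            else:
                es.index(index=index, doc_type=doc_type,
                         parent=parent_id, body=body)
```

Only `_parent` has to exist before the first child arrives; the field mappings for both types are still created dynamically.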