
I have a SQL table that I am exporting to Elasticsearch.

One of the columns is a numeric field that is nullable, with nulls in some of the records.

When we try to index the table, we get this error:

One of the ETL (BigQuery -> ElasticSearch) jobs for Table : MLS has been ES Failed Chunk of 10000 from index 20000 possibly due to incompatible objects.

Failing BigQuery Table: MLS

Stack Trace of the error:

Traceback (most recent call last):
  File "/Users/asif/zodiacbackend/zodiacbackend/tasks.py", line 205, in insertIntoES
    helpers.bulk(es, doc_generator(dataframe, table))
  File "/Users/asif/zodiacbackend/env/lib/python3.7/site-packages/elasticsearch/helpers/actions.py", line 300, in bulk
    for ok, item in streaming_bulk(client, actions, *args, **kwargs):
  File "/Users/asif/zodiacbackend/env/lib/python3.7/site-packages/elasticsearch/helpers/actions.py", line 230, in streaming_bulk
    **kwargs
  File "/Users/asif/zodiacbackend/env/lib/python3.7/site-packages/elasticsearch/helpers/actions.py", line 158, in _process_bulk_chunk
    raise BulkIndexError("%i document(s) failed to index." % len(errors), errors)
elasticsearch.helpers.errors.BulkIndexError: ('2 document(s) failed to index.', [{'index': {'_index': 'mls', '_type': 'mls', '_id': 'b100qHABEFI45Lp-z3Om', 'status': 400, 'error': {'type': 'illegal_argument_exception', 'reason': 'mapper [Lot_Size_Sq_Ft] of different type, current_type [text], merged_type [long]'}, 'data': {'Lot_Size_Sq_Ft': Decimal('13504')}}}])

How do I get the system to recognize nulls?

arcee123

2 Answers


User WittyID's answer missed some important things:

  1. Values of `null_value` must be of the same data type as your field. In his example, he declared an integer field but defined `NULL` as the `null_value`, which would throw a `json_parse_exception`. This is called out as important in the official docs, quoted below:

The null_value needs to be the same data type as the field. For instance, a long field cannot have a string null_value.

  2. The `null_value` only influences how data is indexed; it doesn't modify the `_source` document. So whatever you pass in your source document is what gets stored, not the value in the `null_value` param, and at query time you also need to search using the `null_value` value.

In short, `null` isn't indexed in ES, so you can define your own custom value for null and then use it to index and query the null values. The whole thing is easy to demonstrate with the example below, which anybody can try:

Create index

{
  "mappings": {
    "properties": {
      "my_signed_integer": {
        "type":"integer",
        "null_value": -1 --> note: we're defining `null` values to be indexed as `-1`
      }
    }
  }
}
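Since the asker is driving everything from a Python ETL job, a minimal Python sketch of the same mapping may help (the index name `so-6053847` and the commented client call are illustrative; creating the index requires a running cluster):

```python
# The same mapping as the JSON above: null integers are *indexed* as -1,
# while _source keeps the literal null.
mapping = {
    "mappings": {
        "properties": {
            "my_signed_integer": {
                "type": "integer",
                "null_value": -1,
            }
        }
    }
}

# Hypothetical usage with the official elasticsearch-py client:
# from elasticsearch import Elasticsearch
# es = Elasticsearch()
# es.indices.create(index="so-6053847", body=mapping)
```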

Index doc

  1. Store a doc that has a null integer:

    { "my_signed_integer" : null }

If you get this doc from ES, it would be returned as below:

{
   "_index": "so-6053847",
   "_type": "_doc",
   "_id": "1",
   "_version": 1,
   "_seq_no": 0,
   "_primary_term": 1,
   "found": true,
   "_source": {
      "my_signed_integer": null --> As explained earlier, in `_source` it's stored as `null`.
   }
}
  2. Index a non-null value:

    { "my_signed_integer" : 10 }

Search query to fetch docs where the integer was null:

{
  "query": {
    "term": {
      "my_signed_integer": -1 --> notice: you must use the same value as `null_value`
    }
  }
}

Result:

 "hits": [
         {
            "_index": "so-6053847",
            "_type": "_doc",
            "_id": "1",
            "_score": 1.0,
            "_source": {
               "my_signed_integer": null --> notice it shows `null`, not `-1`
            }
         }
      ]

Search query for other (non-null) numbers, i.e. in our case 10:

{
  "query": {
    "term": {
      "my_signed_integer": 10
    }
  }
}

Result:

"hits": [
         {
            "_index": "so-6053847",
            "_type": "_doc",
            "_id": "2",
            "_score": 1.0,
            "_source": {
               "my_signed_integer": 10 --> source matches the indexed value for this doc
            }
         }
      ]
Amit
  • This worked like a charm. Thank you, and I appreciate the patience. New installation. – arcee123 Mar 11 '20 at 02:51
  • Love this answer! New to Elasticsearch too, and you grasped this topic and explained it so clearly, all in one place. Take my upvote, kind stranger – Sergio A. Dec 14 '21 at 16:11

You're dealing with a common ES head-scratcher: Elasticsearch doesn't index null values (and not just numeric nulls). You need to specify in your index mapping how you want any null values to be indexed. Something like this:

{
  "mappings": {
    "properties": {
      "nullable_numeric": {
        "type":       "integer",
        "null_value": -1
      },
      "nullable_text": {
        "type":       "text",
        "null_value": "NULL"
      }
    }
  }
}

Once you do this, ES will know how to index those fields properly. Note that you don't need to change your raw data; just let ES know how to index nulls for search, which, by the way, won't affect the docs when you query ES.
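On the ETL side, it can also help to make sure the generator hands Elasticsearch a real JSON `null` (Python `None`) rather than a `NaN`, and plain ints/floats rather than `Decimal` objects like the `Decimal('13504')` visible in the stack trace. A stdlib-only sketch, assuming rows arrive as dicts (the `doc_generator` name comes from the asker's traceback; the row format is an assumption):

```python
import math
from decimal import Decimal

def doc_generator(rows, index):
    """Yield bulk actions, normalizing NaN and Decimal to JSON-safe values."""
    for row in rows:
        doc = {}
        for field, value in row.items():
            if value is None or (isinstance(value, float) and math.isnan(value)):
                doc[field] = None  # becomes JSON null; the null_value mapping handles indexing
            elif isinstance(value, Decimal):
                # Convert Decimal to int when it is whole, otherwise to float
                doc[field] = int(value) if value == value.to_integral_value() else float(value)
            else:
                doc[field] = value
        yield {"_index": index, "_source": doc}

actions = list(doc_generator(
    [{"Lot_Size_Sq_Ft": Decimal("13504")}, {"Lot_Size_Sq_Ft": float("nan")}],
    "mls",
))
# actions[0]["_source"]["Lot_Size_Sq_Ft"] == 13504
# actions[1]["_source"]["Lot_Size_Sq_Ft"] is None
```

These actions can then be passed to `helpers.bulk(es, ...)` as in the original job.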

WittyID
  • does this just ignore the nulls? or ignore the whole column because some are nulls? – arcee123 Mar 05 '20 at 06:04
  • It doesn't ignore the nulls or the whole column; it indexes the nulls as whatever you put as your "null_value". For example, your null_value could be "Hey, Lucene, whenever you see a null value, index it as this text here". Without that null_value, Elasticsearch is at a loss as to how to index an empty/null value. Again, this doesn't affect your documents: when you query the same document, you would see your numeric nulls as they were in BigQuery...this is all just for Elasticsearch's index. – WittyID Mar 05 '20 at 06:15
  • ok. so we make null_value something arbitrary, but when someone does a query, it still comes back as null? – arcee123 Mar 05 '20 at 06:42
  • 2
    Yes, bearing in mind the caveat (see answer by Opster Elasticsearch Ninja) that the arbitrary thing needs to be of same type that field is mapped to. – WittyID Mar 05 '20 at 07:41