Date format problems when indexing documents into OpenSearch

Question

I am reading data from a pandas DataFrame and trying to index it into AWS OpenSearch, but I am getting this error:

elasticsearch.helpers.errors.BulkIndexError: ('3 document(s) failed to index.', [{'index': {'_index': 'document', '_type': '_doc', '_id': '8db2e0ac6b659499cb0fd977a59bc3ce', 'status': 400, 'error': {'type': 'mapper_parsing_exception', 'reason': "failed to parse field [APPROVED_ON] of type [date] in document with id '8db2e0ac6b659499cb0fd977a59bc3ce'. Preview of field's value: '2020-07-06 08:05:00'", 'caused_by': {'type': 'illegal_argument_exception', 'reason': 'failed to parse date field [2020-07-06 08:05:00] with format [strict_date_optional_time||epoch_millis]', 'caused_by': {'type': 'date_time_parse_exception', 'reason': 'Failed to parse with all enclosed parsers'}}},

I tried to convert the field APPROVED_ON to string using this:

data=data.astype({"APPROVED_ON": str})

but the same error persists.

This is the output of data.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 11740 entries, 0 to 11739
Data columns (total 17 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   ORDERS_NO          11740 non-null  object 
 1   SUBJECT            11740 non-null  object 
 2   ORG_FILENAME       11740 non-null  object 
 3   IS_LAB_REP         11740 non-null  float64
 4   DOC_PATH           11740 non-null  object 
 5   DM_FILENAME        11740 non-null  object 
 6   ORDERS_ID          11740 non-null  object 
 7   STATUS             11740 non-null  object 
 8   PROJECT_NO         11740 non-null  object 
 9   MODEL              11740 non-null  object 
 10  PQM_NO             560 non-null    object 
 11  REPORT_SENT_ON     11740 non-null  object 
 12  APPROVED_ON        11002 non-null  object 
 13  CONFIDENTIAL       11740 non-null  float64
 14  ORDER_DESCRIPTION  11740 non-null  object 
 15  TASK_DESCRIPTION   11737 non-null  object 
 16  TEXT_RESULT        5377 non-null   object 
dtypes: float64(2), object(15)
memory usage: 1.6+ MB

Please share the `data.info()` dtype output from before you push the data into OpenSearch. - Divyank, Aug 09 '22 at 11:37
Try changing the `APPROVED_ON` column to `datetime` instead of `object` and pushing the data to OpenSearch again, e.g. `data['APPROVED_ON'] = pd.to_datetime(data['APPROVED_ON'])`. - Divyank, Aug 09 '22 at 11:49
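A minimal sketch expanding on that comment, assuming the goal is to send `APPROVED_ON` as strings that the default `strict_date_optional_time` format accepts. The helper name `to_iso_strings` is hypothetical. It parses the column with `pd.to_datetime` and then formats it explicitly with a `T` separator instead of relying on the client to serialize pandas `Timestamp`/`NaT` values, since that behaviour depends on the client version. Missing values (the column has only 11002 non-null entries) become `None`, i.e. JSON `null`, rather than float `NaN`, which is not valid JSON.

```python
import pandas as pd


def to_iso_strings(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Return a copy of `df` whose `column` holds ISO 8601 strings
    such as '2020-07-06T08:05:00', or None where the value is missing."""
    out = df.copy()
    # Parse the column; missing or unparseable values become NaT.
    parsed = pd.to_datetime(out[column], errors="coerce")
    # Put a 'T' between date and time; map NaT to None (JSON null).
    values = [
        ts.strftime("%Y-%m-%dT%H:%M:%S") if pd.notna(ts) else None
        for ts in parsed
    ]
    # dtype=object keeps None as None instead of turning it into NaN.
    out[column] = pd.Series(values, index=out.index, dtype=object)
    return out
```

For example, `data = to_iso_strings(data, "APPROVED_ON")` would run before the bulk actions are built. The format string drops sub-second precision, which matches the sample value shown in the error.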

score 0 · Answer 1 · answered Aug 09 '22 at 14:08

Date formats can be customised, but if no format is specified in the mapping, OpenSearch uses the default:

"strict_date_optional_time||epoch_millis"

strict_date_optional_time

A generic ISO datetime parser, where the date must include at least the year, and the time (separated by `T`) is optional. Examples: `yyyy-MM-dd'T'HH:mm:ss.SSSZ` or `yyyy-MM-dd`.
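Because the default format also accepts `epoch_millis`, another option is to send each date as integer milliseconds since the Unix epoch. The sketch below assumes the naive timestamps in `APPROVED_ON` are UTC (OpenSearch treats dates without a zone as UTC); if they are local times they would need `tz_localize` first. The helper name `to_epoch_millis` is hypothetical.

```python
import pandas as pd


def to_epoch_millis(values: pd.Series) -> pd.Series:
    """Convert naive datetime values to integer milliseconds since the
    Unix epoch, assuming UTC. Missing values become None (JSON null)."""
    parsed = pd.to_datetime(values, errors="coerce")
    # Elapsed time since 1970-01-01, floor-divided into whole milliseconds.
    millis = (parsed - pd.Timestamp("1970-01-01")) // pd.Timedelta(milliseconds=1)
    return pd.Series(
        [int(m) if pd.notna(m) else None for m in millis],
        index=values.index,
        dtype=object,
    )
```

With this, `2020-07-06 08:05:00` becomes `1594022700000`, which the default `epoch_millis` parser accepts without any mapping change.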

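The value `2020-07-06 08:05:00` separates the date and time with a space rather than `T`, so it matches neither `strict_date_optional_time` nor `epoch_millis`. Converting the column to `str` does not change that. To solve this, update the index template (or mapping) and define a specific date format for the field. (https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-date-format.html)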
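A minimal sketch of such a mapping, assuming the index is created from Python with an elasticsearch-py 7.x or opensearch-py client (both expose `indices.create(index=..., body=...)`). The helper names `date_mapping` and `create_index` are hypothetical. The format `yyyy-MM-dd HH:mm:ss` is listed first, and the defaults are kept as fallbacks. The same `mappings` block could go into an index template instead.

```python
def date_mapping(
    field: str,
    fmt: str = "yyyy-MM-dd HH:mm:ss||strict_date_optional_time||epoch_millis",
) -> dict:
    """Build an index-creation body mapping `field` as a date that accepts
    'yyyy-MM-dd HH:mm:ss' in addition to the default formats."""
    return {"mappings": {"properties": {field: {"type": "date", "format": fmt}}}}


def create_index(client, index: str, field: str) -> None:
    """Create `index` with the custom date format.

    `client` is assumed to be an elasticsearch-py 7.x or opensearch-py client.
    """
    client.indices.create(index=index, body=date_mapping(field))
```

For example, `create_index(client, "document", "APPROVED_ON")`. This only applies to a newly created index. An existing field's date mapping generally cannot be changed in place, so an index that already maps `APPROVED_ON` with the default format would typically have to be recreated or reindexed.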
