I have a data pipeline that adds a timestamp to each message at every stage. A document might look like this:

{
   "logTime":      "2023-07-05 10:10:23.617241",
   "ingestTime":   "2023-07-05 10:13:14.995735",
   "@timestamp":   "2023-07-05 10:13:16.937238",
   "sourceSystem": "A"
}

Per sourceSystem, I want to calculate the time difference between each stage. For example, in the document above the time from logTime to ingestTime is about 3 minutes, and the time from ingestTime to the document's @timestamp is about 2 seconds.
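To make the intended arithmetic concrete, here is a minimal sketch using only the Python standard library on the sample document above. The helper name stage_deltas and the output key timeToIndex are hypothetical names chosen for illustration, and the format string assumes every timestamp has the shape shown in the sample.

```python
from datetime import datetime

# Assumed timestamp layout, taken from the sample document.
FMT = "%Y-%m-%d %H:%M:%S.%f"

doc = {
    "logTime": "2023-07-05 10:10:23.617241",
    "ingestTime": "2023-07-05 10:13:14.995735",
    "@timestamp": "2023-07-05 10:13:16.937238",
    "sourceSystem": "A",
}


def stage_deltas(document):
    """Return the elapsed time between consecutive pipeline stages.

    Hypothetical helper: 'timeToIndex' is an illustrative name for the
    ingestTime -> @timestamp stage.
    """
    log_time = datetime.strptime(document["logTime"], FMT)
    ingest_time = datetime.strptime(document["ingestTime"], FMT)
    index_time = datetime.strptime(document["@timestamp"], FMT)
    return {
        "timeToIngest": ingest_time - log_time,
        "timeToIndex": index_time - ingest_time,
    }


deltas = stage_deltas(doc)
for stage, delta in deltas.items():
    # Each value is a datetime.timedelta; total_seconds() gives a float.
    print(doc["sourceSystem"], stage, delta.total_seconds())
```

This is the per-document result I am after; the goal is to get the same columns for the whole index and then aggregate them per sourceSystem.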

Using the Eland Python client in a Jupyter Notebook, I read in the index:

myDataFrame = eland.DataFrame(es_client=es, es_index_pattern='my-index')

If I output the types:

myDataFrame.dtypes
logTime      datetime64[ns]
ingestTime   datetime64[ns]
@timestamp   datetime64[ns]

When I attempt to subtract the datetime64[ns] fields:

myDataFrame['timeToIngest'] = myDataFrame['ingestTime'] - myDataFrame['logTime']

I get a TypeError:

TypeError: Arithmetic operation on incompatible types datetime64[ns] __sub__ datetime64[ns]
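For comparison, the same subtraction on a plain in-memory pandas DataFrame looks like the sketch below. It assumes the rows have already been loaded into pandas (the data here is just the sample document, typed in by hand), and the column names timeToIndex, timeToIngestSec and timeToIndexSec are hypothetical names for illustration. It does not involve Eland at all, so it only shows the result I expect, not a fix for the error.

```python
import pandas as pd

# Hand-built stand-in for the index contents (assumption: one sample row).
df = pd.DataFrame(
    {
        "logTime": pd.to_datetime(["2023-07-05 10:10:23.617241"]),
        "ingestTime": pd.to_datetime(["2023-07-05 10:13:14.995735"]),
        "@timestamp": pd.to_datetime(["2023-07-05 10:13:16.937238"]),
        "sourceSystem": ["A"],
    }
)

# Subtracting two datetime64[ns] columns yields a timedelta64[ns] column.
df["timeToIngest"] = df["ingestTime"] - df["logTime"]
df["timeToIndex"] = df["@timestamp"] - df["ingestTime"]

# Convert to float seconds so the per-system mean is a plain number.
df["timeToIngestSec"] = df["timeToIngest"].dt.total_seconds()
df["timeToIndexSec"] = df["timeToIndex"].dt.total_seconds()

mean_per_source = df.groupby("sourceSystem")[
    ["timeToIngestSec", "timeToIndexSec"]
].mean()
print(mean_per_source)
```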

maehue
  • Did you check to see if you have any missing values/dates? – Nev1111 Jul 05 '23 at 15:28
  • @Nev1111 - yes, I reduced the work down to a single test index that contains only 2 documents, as I had the same thought that it might be null values or something. – maehue Jul 05 '23 at 15:40