
I'm developing a simple data ingestion pipeline, and at some point, I'm processing a group of flow files whose content looks something like this:

[ 
    {
        "ID" : 1,
        "CREATEDON" : "2019-10-26T18:40:16.000Z",
        "UPDATEDON" : "2019-11-26T18:41:38.000Z",
        ...
    }, {
        "ID" : 2,
        "CREATEDON" : "2019-11-26T18:41:38.000Z",
        "UPDATEDON" : "2019-12-28T11:11:28.000Z",
        ...
    }, ... 
]

These records conform to the following Avro schema:

{
   "type":"record",
   "name":"NiFi_ExecuteSQL_Record",
   "namespace":"any.data",
   "fields":[
      {
         "name":"ID",
         "type":[
            "null",
            "long"
         ]
      },
      {
         "name":"CREATEDON",
         "type":[
            "null",
            {
               "type":"long",
               "logicalType":"timestamp-millis"
            }
         ]
      },
      {
         "name":"UPDATEDON",
         "type":[
            "null",
            {
               "type":"long",
               "logicalType":"timestamp-millis"
            }
         ]
      },
      ...
   ]
}

My current goal is to extract the maximum value of both the CREATEDON and UPDATEDON columns. To do that, I used the QueryRecord processor with the following query:

SELECT 
    MAX(createdOn) AS max_createdOn, MAX(updatedOn) AS max_updatedOn 
FROM 
    FLOWFILE
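
For reference, based on just the two sample rows shown above, the result record I'm after would look something like this:

[ {
    "max_createdOn" : "2019-11-26T18:41:38.000Z",
    "max_updatedOn" : "2019-12-28T11:11:28.000Z"
} ]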

But when I run this processor, I get the following error:

Unable to query StandardFlowFileRecord[...] due to java.sql.Timestamp cannot be cast to java.lang.Long

This makes sense: internally, Apache Calcite represents SQL TIMESTAMP values as java.lang.Long (milliseconds since the epoch) rather than as java.sql.Timestamp, so it can't cast the Timestamp objects it receives to the type it expects.
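
In case it helps frame what I'm after, here is the kind of pure-SQL workaround I've been sketching (untested; it assumes Calcite accepts a TIMESTAMP-to-VARCHAR cast and reaches it before the failing Timestamp-to-Long conversion, and it relies on a timestamp's string form sorting chronologically):

SELECT 
    MAX(CAST(createdOn AS VARCHAR)) AS max_createdOn, 
    MAX(CAST(updatedOn AS VARCHAR)) AS max_updatedOn 
FROM 
    FLOWFILE

Even if that parses, I suspect the same cast error would surface as soon as Calcite touches the Timestamp values.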

The question is: Is there any way to do this without having to create a custom processor?
