I'm developing a simple data ingestion pipeline, and at some point, I'm processing a group of flow files whose content looks something like this:
[
  {
    "ID" : 1,
    "CREATEDON" : "2019-10-26T18:40:16.000Z",
    "UPDATEDON" : "2019-11-26T18:41:38.000Z",
    ...
  },
  {
    "ID" : 2,
    "CREATEDON" : "2019-11-26T18:41:38.000Z",
    "UPDATEDON" : "2019-12-28T11:11:28.000Z",
    ...
  },
  ...
]
These records come with the following Avro schema:
{
  "type":"record",
  "name":"NiFi_ExecuteSQL_Record",
  "namespace":"any.data",
  "fields":[
    {
      "name":"ID",
      "type":[ "null", "long" ]
    },
    {
      "name":"CREATEDON",
      "type":[
        "null",
        { "type":"long", "logicalType":"timestamp-millis" }
      ]
    },
    {
      "name":"UPDATEDON",
      "type":[
        "null",
        { "type":"long", "logicalType":"timestamp-millis" }
      ]
    },
    ...
  ]
}
My current goal is to extract the maximum value of both the CREATEDON and UPDATEDON columns. To do that, I used the QueryRecord processor with the following query:
SELECT
    MAX(createdOn) AS max_createdOn,
    MAX(updatedOn) AS max_updatedOn
FROM
    FLOWFILE
But when I run this processor, I get the following error:
Unable to query StandardFlowFileRecord[...] due to java.sql.Timestamp cannot be cast to java.lang.Long
This makes sense: Apache Calcite represents SQL timestamps internally as java.lang.Long values rather than java.sql.Timestamp, so the java.sql.Timestamp objects in the incoming records cannot be cast to the type Calcite expects.
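For reference, the kind of query-level workaround I was hoping for looks something like the sketch below (untested). It assumes QueryRecord accepts Calcite's standard CAST from TIMESTAMP to VARCHAR; since all the rendered timestamps share one fixed-width format, the lexicographic maximum of the strings is also the chronological maximum:

SELECT
    MAX(CAST(createdOn AS VARCHAR)) AS max_createdOn, -- compare text renderings instead of raw values
    MAX(CAST(updatedOn AS VARCHAR)) AS max_updatedOn
FROM
    FLOWFILE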
The question is: Is there any way to do this without having to create a custom processor?
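One alternative I've been considering (again, just an untested sketch): since the JSON already carries these values as ISO-8601 strings, I could declare the two fields as plain strings in the reader schema, so MAX would compare them lexicographically, which for this fixed format matches chronological order, at the cost of losing the timestamp semantics downstream:

{
  "name":"CREATEDON",
  "type":[ "null", "string" ]
}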