
Hi sir,

In the data ingest template I need to get this property. For example, I have data with a date field:

date        data
12-07-2018  a
13-07-2018  b
14-07-2018  c
15-07-2018  d

From that, I would like to take the latest one, i.e. 15-07-2018.

If the date field gets new data, 16-07-2018 e, then I have to pick up 16-07-2018 by checking against the last updated date, 15-07-2018, rather than checking from the first one, 12-07-2018.

Likewise, if I get 17-08-2018 f, then I have to pick up 17-08-2018 by checking against the last new date, 16-07-2018, and so on.

How can I achieve this? In which processor do I have to make modifications or add new properties? When the feed runs again, how does it take the latest watermark and work from there?

IMRAN S K
  • Welcome to SO! In order for us to help, make sure you detail what you are trying to accomplish, provide examples of what you tried ([Formatting helps!](https://stackoverflow.com/editing-help)), and explain what you expect to see. Take a look at the [Expression Language Guide on Dates](https://nifi.apache.org/docs/nifi-docs/html/expression-language-guide.html#dates) and see if that sparks any ideas. – Nathan Jul 17 '18 at 11:16

1 Answer


Two possible approaches come to mind:

  1. Write your own Spark app, run through ExecuteSparkJob, that reads through the file being ingested. While reading, keep track of the max date, and once the ingestion is done, persist it somewhere. If you're in the HDP world, an easy option is to insert the max date into a Hive (transactional) table. You can also persist it in a ZooKeeper znode, or use the PutDistributedMapCache processor that NiFi offers. On the next run, read the stored value back and process only the records newer than it.
  2. Write a custom NiFi processor that does basically the same thing as the approach above, except that you have to add support for the different data formats (CSV, JSON) yourself. Spark, in this regard, comes with many of these things built in.
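
The bookkeeping behind both approaches can be sketched in plain Python. This is only an illustration of the high-water-mark idea, not Kylo or NiFi code: the function names and the watermark file are hypothetical, and the file stands in for whichever store you pick (Hive table, znode, or distributed map cache). It assumes dates in `dd-MM-yyyy` format, as in the question.

```python
from datetime import date, datetime
from pathlib import Path
from typing import List, Optional, Tuple

DATE_FORMAT = "%d-%m-%Y"  # assumed from the question's sample data (dd-MM-yyyy)


def parse_date(text: str) -> date:
    """Parse a dd-MM-yyyy string into a date."""
    return datetime.strptime(text, DATE_FORMAT).date()


def load_watermark(path: Path) -> Optional[date]:
    """Return the last persisted max date, or None on the first run."""
    if not path.exists():
        return None
    text = path.read_text().strip()
    return parse_date(text) if text else None


def save_watermark(path: Path, watermark: date) -> None:
    """Persist the max date (a file here; a Hive table or znode in practice)."""
    path.write_text(watermark.strftime(DATE_FORMAT))


def ingest_new_rows(rows: List[Tuple[str, str]], watermark_path: Path) -> List[Tuple[str, str]]:
    """Return only the rows newer than the stored watermark, then advance it."""
    watermark = load_watermark(watermark_path)
    new_rows = [
        (day, value)
        for day, value in rows
        if watermark is None or parse_date(day) > watermark
    ]
    if new_rows:
        # The new watermark is the max date seen in this run.
        save_watermark(watermark_path, max(parse_date(day) for day, _ in new_rows))
    return new_rows
```

With the sample data, the first run would return all four rows and store 15-07-2018. A later run that also sees the 16-07-2018 row would return only that row and move the watermark forward, so nothing is re-checked from 12-07-2018.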
Sivaprasanna Sethuraman