I am building an end-to-end NiFi flow that consumes JSON events from the Tealium event stream through Kafka and writes them to HDFS. The current flow is: ConsumeKafka -> EvaluateJsonPath -> JoltTransformJSON -> MergeContent -> EvaluateJsonPath -> UpdateAttribute -> PutHDFS

The requirement is to spool an entire day of JSON data into a single file, grouped by the postdate attribute (converted beforehand from epoch to a YYYYMMDDSS timestamp), and to name each merged file after the timestamp in the POST_DATE field so that the daily files can be told apart. I have everything working except renaming the merged file according to the timestamp attribute from the source. How can I rename the file based on the attribute's date, in the form _year_month_day?

Deepak

2 Answers

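If you want to extract the year, month and day from the POST_DATE attribute, you can chain the toDate and format functions in the Expression Language. The pattern passed to toDate must be a Java date pattern that matches how the attribute value is actually stored; patterns are case-sensitive, so "YYYYMMDDSS" will not parse a calendar date. The pattern "yyyy-MM-dd HH:mm:ss" below is only an example.

For example:

-- year
${POST_DATE:toDate("yyyy-MM-dd HH:mm:ss"):format("yyyy")}

-- month
${POST_DATE:toDate("yyyy-MM-dd HH:mm:ss"):format("MM")}

-- day
${POST_DATE:toDate("yyyy-MM-dd HH:mm:ss"):format("dd")}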

I'm not sure what you mean by renaming the file. If it means changing the file name before the file is written to HDFS, you can use an UpdateAttribute processor to set the filename attribute to the desired output name, for example ${year}_${month}_${day}.
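
Since toDate and format take Java-style date patterns, the parse-then-format chain can be tried out in plain Java before wiring it into the flow. This is only a sketch: the class name PostDateParts, the helper part, the sample value, the UTC time zone and the input pattern "yyyy-MM-dd HH:mm:ss" are assumptions for illustration, not something taken from the flow itself.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper that mirrors POST_DATE -> toDate(pattern) -> format(pattern).
class PostDateParts {
    // Assumed layout of the POST_DATE value; change it to match the real data.
    static final String INPUT_PATTERN = "yyyy-MM-dd HH:mm:ss";

    // Parse the attribute value, then re-format it with the requested output pattern.
    static String part(String postDate, String outputPattern) {
        SimpleDateFormat in = new SimpleDateFormat(INPUT_PATTERN, Locale.ROOT);
        SimpleDateFormat out = new SimpleDateFormat(outputPattern, Locale.ROOT);
        // Use one fixed zone for both steps so the calendar day cannot shift.
        TimeZone utc = TimeZone.getTimeZone("UTC");
        in.setTimeZone(utc);
        out.setTimeZone(utc);
        try {
            Date parsed = in.parse(postDate);
            return out.format(parsed);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Value does not match " + INPUT_PATTERN, e);
        }
    }

    public static void main(String[] args) {
        String postDate = "2019-10-09 13:10:00"; // sample value, not real data
        System.out.println(part(postDate, "yyyy")); // year
        System.out.println(part(postDate, "MM"));   // month
        System.out.println(part(postDate, "dd"));   // day
    }
}
```

If the value does not match the input pattern, the sketch fails with an exception instead of returning a wrong date, which makes a mismatched pattern easy to spot.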

gogocatmario
  • Thank you for the help. I need to name the merged output file after the year, month and day read from the post_date attribute. I have merged all the daily files and created year and month subfolders in HDFS, and I have added a filename property (key and value) to the UpdateAttribute processor. Currently the merged file name is built from the current date instead of from the attribute, and I need help referencing the attribute's date. The file name currently being generated is tealium_es_${now():format("yyyy_MM_dd")}.jsonl. – Deepak Oct 09 '19 at 13:10

@gogocatmario, thanks for the response. The issue was resolved by setting the filename property on the UpdateAttribute processor to the following value: tealium_es_${post_date:toDate("yyyy-MM-dd HH:mm:ss"):format("yyyy_MM_dd")}.jsonl
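
For anyone who wants to check the resulting file names outside the flow, here is a rough Java sketch of the same derivation. The class MergedFileName and both method names are made up for illustration, the .jsonl extension follows the file name mentioned in the comment above, and the epoch variant assumes the value is in milliseconds and should be read as UTC, which may not match your data.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper that builds the daily merged file name from post_date.
class MergedFileName {
    // Same patterns as in the filename expression: input layout and output layout.
    static final DateTimeFormatter INPUT = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");
    static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("yyyy_MM_dd");

    // post_date already stored as a formatted timestamp string.
    static String fromTimestamp(String postDate) {
        return "tealium_es_" + LocalDateTime.parse(postDate, INPUT).format(DAY) + ".jsonl";
    }

    // post_date still stored as epoch milliseconds (assumed UTC).
    static String fromEpochMillis(long epochMillis) {
        return "tealium_es_"
                + Instant.ofEpochMilli(epochMillis).atZone(ZoneOffset.UTC).format(DAY)
                + ".jsonl";
    }

    public static void main(String[] args) {
        System.out.println(fromTimestamp("2019-10-09 13:10:00")); // tealium_es_2019_10_09.jsonl
        System.out.println(fromEpochMillis(1570626600000L));      // tealium_es_2019_10_09.jsonl
    }
}
```

Both sample inputs describe the same instant, so they should produce the same file name; a difference between the two would point to a time zone or unit mismatch in the source attribute.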

Deepak