
I have a Vertica scheduler that consumes Avro data from Kafka. One of the Vertica columns is a TIMESTAMP. Currently, I have defined the Avro schema like the following example:

 {
     "name":"startDate",
     "type": "long",
     "logicalType": "timestamp-millis"
  },

The problem is that my scheduler fails to consume the data, and I also can't find any errors in Vertica's scheduler tables. Any idea what I need to change in the Avro schema?

OneCricketeer
T1234
  • I asked a similar question. You should find the answer here at any rate. https://stackoverflow.com/questions/55460854/how-to-transform-all-timestamp-fields-when-using-kafka-connect – A. Saunders May 31 '19 at 04:29
  • What do you mean by "failed to consume"? Is there an active Kafka consumer group that has greater than zero lag? Maybe the table just needs to be a numeric type rather than a timestamp (which could imply an ISO8601 string) – OneCricketeer Jun 04 '19 at 02:14

2 Answers


I think your issue could be that a Vertica TIMESTAMP is not a Unix epoch time in milliseconds (which is what your Avro schema contains):

https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/SQLReferenceManual/DataTypes/Date-Time/DateTimeDataTypes.htm

You would have to use a BIGINT or string-typed column instead, and then use one of Vertica's date functions to convert those values into readable timestamps:

https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/SQLReferenceManual/Functions/Formatting/TO_TIMESTAMP.htm
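One other thing worth double-checking, per the Avro specification: a logicalType must be declared inside the type object, not as a sibling of "type". As written in the question, the annotation is silently ignored and the field is treated as a plain long. A nested declaration would presumably look like:

```json
{
    "name": "startDate",
    "type": {
        "type": "long",
        "logicalType": "timestamp-millis"
    }
}
```

That alone won't make the value a Vertica TIMESTAMP, though; the wire value is still a long.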

OneCricketeer
  • My question is: can I use TO_TIMESTAMP during the scheduler's ingestion? Or does the solution need to be writing to a flex table and then converting it automatically to a regular table? – T1234 Jun 04 '19 at 07:09
  • I'm really only answering from the Avro/Kafka standpoint. I have no Vertica experience, sorry. – OneCricketeer Jun 04 '19 at 07:11

As I was trying to state in my last answer, which was deleted for some reason, you cannot do any type casting or conversion on data ingested with the scheduler.

To get the correct timestamp you have two options:

  1. Do the conversion with Kafka Connect using the single message transforms that I linked to in my original answer that was deleted (https://docs.confluent.io/current/connect/transforms/timestampconverter.html), or
  2. Copy the data from Kafka as is and do the conversion after you have copied the data into a Vertica table or a flex table.
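For the first option, a sketch of the connector configuration (assuming the field is named startDate, as in the question; adjust the transform name and field for your connector) might look like:

```properties
transforms=convertTs
transforms.convertTs.type=org.apache.kafka.connect.transforms.TimestampConverter$Value
transforms.convertTs.field=startDate
transforms.convertTs.target.type=string
transforms.convertTs.format=yyyy-MM-dd HH:mm:ss
```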

For the second option, you would take the timestamp-millis epoch from Kafka, divide it by 1,000, and then use the TO_TIMESTAMP function. If the Kafka timestamp-millis epoch were 1556204536000, you could execute the following SQL to get the actual timestamp:

SELECT TO_TIMESTAMP(1556204536000 / 1000);

Which will return the value:

2019-04-25 15:02:16
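You can sanity-check that arithmetic outside Vertica as well; a minimal Python sketch, assuming the epoch is UTC:

```python
from datetime import datetime, timezone

epoch_ms = 1556204536000  # the timestamp-millis value from Kafka
ts = datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc)
print(ts.strftime("%Y-%m-%d %H:%M:%S"))  # 2019-04-25 15:02:16
```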

However, the second option requires a new conversion every time you get new data, so it's really only useful if you are using Kafka for bulk loads.

Your best bet is to use the single message transforms or find some other way of converting the timestamp columns into a string format before you send the data to Kafka. Vertica will accept the string in a TIMESTAMP column, so you will not need to do any casting on the Vertica side.
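If you do the conversion on the producer side, a hypothetical helper (to_vertica_timestamp is an illustrative name, not part of any library) could format the string before the record is sent to Kafka:

```python
from datetime import datetime, timezone

def to_vertica_timestamp(epoch_ms: int) -> str:
    # Format an epoch-milliseconds value as a string that a Vertica
    # TIMESTAMP column will accept (assumes the epoch is UTC).
    dt = datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc)
    return dt.strftime("%Y-%m-%d %H:%M:%S")

print(to_vertica_timestamp(1556204536000))  # 2019-04-25 15:02:16
```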

A. Saunders
  • Thanks for your detailed answer. I control the Kafka Avro schema, so I can write the timestamp as a string, and as you said, it's supposed to work then. That sounds good to me; I will try it. – T1234 Jun 05 '19 at 20:31