We are planning to migrate from Camus to Gobblin. In Camus we were using the following configs:

etl.partitioner.class=com.linkedin.camus.etl.kafka.partitioner.TimeBasedPartitioner
etl.destination.path.topic.sub.dirformat=YYYY/MM/dd/HH/mm
etl.output.file.time.partition.mins=30

In Gobblin, the equivalent configs are:

writer.file.path.type=tablename
writer.partition.level=minute (other options: daily, hourly, ...)
writer.partition.pattern=YYYY/MM/dd/HH/mm

This creates directories at a per-minute level, but we need 30-minute partitions.

I couldn't find much help in the official doc: http://gobblin.readthedocs.io/en/latest/miscellaneous/Camus-to-Gobblin-Migration/

Are there any other configs which can be used to achieve this?

1 Answer


I got a workaround by implementing the partitioning logic inside a custom WriterPartitioner:

When the partitioner fetches the record-level timestamp, we just need to return the timestamp millis rounded down to the desired granularity using the method below.

import org.joda.time.DateTimeZone;

// Rounds the timestamp down to the nearest timeGranularityMs boundary,
// honoring the configured output time zone.
public static long getPartition(long timeGranularityMs, long timestamp, DateTimeZone outputDateTimeZone) {
    long adjustedTimeStamp = outputDateTimeZone.convertUTCToLocal(timestamp);
    long partitionedTime = (adjustedTimeStamp / timeGranularityMs) * timeGranularityMs;
    return outputDateTimeZone.convertLocalToUTC(partitionedTime, false);
}
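
For illustration, here is a minimal sketch of how this helper might be wired into a custom partitioner extending Gobblin's TimeBasedWriterPartitioner. The class name ThirtyMinuteWriterPartitioner, the "timestamp" record field, and the exact constructor signature are my assumptions and may differ across Gobblin versions (older releases use the gobblin.* package instead of org.apache.gobblin.*):

import org.apache.avro.generic.GenericRecord;
import org.apache.gobblin.configuration.State;
import org.apache.gobblin.writer.partitioner.TimeBasedWriterPartitioner;
import org.joda.time.DateTimeZone;

// Sketch only: rounds each record's timestamp down to a 30-minute boundary
// so the partition path is derived from the rounded value.
public class ThirtyMinuteWriterPartitioner extends TimeBasedWriterPartitioner<GenericRecord> {

    private static final long THIRTY_MINUTES_MS = 30L * 60 * 1000;

    public ThirtyMinuteWriterPartitioner(State state, int numBranches, int branchId) {
        super(state, numBranches, branchId);
    }

    @Override
    public long getRecordTimestamp(GenericRecord record) {
        // Assumes the record carries an epoch-millis field named "timestamp".
        long recordTimestamp = (Long) record.get("timestamp");
        return getPartition(THIRTY_MINUTES_MS, recordTimestamp, DateTimeZone.UTC);
    }

    // Same helper as shown above.
    public static long getPartition(long timeGranularityMs, long timestamp, DateTimeZone outputDateTimeZone) {
        long adjustedTimeStamp = outputDateTimeZone.convertUTCToLocal(timestamp);
        long partitionedTime = (adjustedTimeStamp / timeGranularityMs) * timeGranularityMs;
        return outputDateTimeZone.convertLocalToUTC(partitionedTime, false);
    }
}

The job config can then point at the custom class (I believe via writer.partitioner.class, but check the partitioned-writer section of the Gobblin docs for your version).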

Now partitions are generated at the required time granularity.
