I'm building a pipeline to back up data from PubSub into GCS and wanted to write a test using JobTest,
but I'm struggling to get PubsubIO to pick up the event time correctly.
PubSub is read using sc.pubsubSubscriptionWithAttributes[String]("path/to/subscription", timestampAttribute = "doc_timestamp").
After this I apply windowing and send the result to a CustomIO.
The test looks like this:
JobTest[PubSub2GCS.type]
  .args("--subscription=input", "--targetDir=output")
  .input(PubsubIO[(String, Map[String, String])]("input"), Seq(("Contents", Map[String, String]("doc_timestamp" -> "2001-01-01T09:10:11.332Z"))))
  .output(CustomIO[KV[String, WindowedDoc]]("output"))(_.debug())
  .run()
The result is that the value is placed in the -290308-12-21T20:00:00.000Z..-290308-12-21T21:00:00.000Z
window, possibly because the date in "doc_timestamp" is not properly interpreted.
In fact, the window never changes, regardless of the value under the "doc_timestamp" key.
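For what it's worth, that window looks like what I would expect if the element carried Beam's minimum timestamp (Long.MinValue microseconds) instead of the attribute value. Below is a minimal sketch of that arithmetic in plain Scala, with no Beam or Scio dependency. It assumes one-hour fixed windows with zero offset, and that the window start is computed with the JVM's truncating remainder. `WindowCheck` and `windowStart` are hypothetical names used only for illustration, not Scio or Beam APIs.

```scala
import java.time.Instant
import java.util.concurrent.TimeUnit

object WindowCheck {
  // Beam's minimum timestamp: Long.MinValue microseconds, converted to milliseconds.
  val minTimestampMillis: Long = TimeUnit.MICROSECONDS.toMillis(Long.MinValue)

  // Assumption: one-hour fixed windows with zero offset.
  val windowSizeMillis: Long = TimeUnit.HOURS.toMillis(1)

  // Assumption: window start = ts - ((ts + size) % size), using the JVM's
  // truncating remainder, which is negative for negative timestamps.
  def windowStart(tsMillis: Long): Long =
    tsMillis - (tsMillis + windowSizeMillis) % windowSizeMillis

  // The attribute value from my test, parsed as an RFC 3339 timestamp.
  val docTimestampMillis: Long =
    Instant.parse("2001-01-01T09:10:11.332Z").toEpochMilli

  def main(args: Array[String]): Unit = {
    // Minimum timestamp, the window start it maps to, and the window start
    // I would expect for the doc_timestamp value.
    println(Instant.ofEpochMilli(minTimestampMillis))
    println(Instant.ofEpochMilli(windowStart(minTimestampMillis)))
    println(Instant.ofEpochMilli(windowStart(docTimestampMillis)))
  }
}
```

If the second printed value matches the start of the window I'm seeing, that would suggest the timestamp attribute is never applied to the test input at all, rather than being parsed incorrectly.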
Luckily the job works fine when running in production, but I'd like to have this test written.