I am currently using scio, Apache Beam's Scala wrapper library. What I want to do is join two different types of messages received from Cloud Pub/Sub by their ID.
Message A is sent every second, and message B is sent once every three seconds. When I receive a message B, I'd like to combine it with the already-received message A that has the same ID.
Message example:
A=37
A=38
A=39
B=39
A=40
A=41
A=42
B=42
A=43
A=44
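To make the goal concrete, here is a minimal plain-Scala sketch (no Beam, no Pub/Sub) of the result I expect for the example above, assuming every message has the form `X=<id>`. The names `JoinByIdSketch`, `keyById` and `joinById` are hypothetical, and this batch version ignores timing entirely:

```scala
// Hypothetical batch model of the goal: key "X=<id>" messages by ID and
// pair every B message with the A message(s) that carry the same ID.
object JoinByIdSketch {
  // "A=37" -> (37, "A=37")
  def keyById(msg: String): (Int, String) = (msg.split("=")(1).toInt, msg)

  def joinById(as: Seq[String], bs: Seq[String]): Seq[(Int, (String, String))] = {
    // Index the A messages by ID
    val aById: Map[Int, Seq[String]] =
      as.map(keyById).groupBy(_._1).map { case (k, v) => k -> v.map(_._2) }
    // Inner join: only B messages whose ID also appears among the A messages produce output
    for {
      (id, b) <- bs.map(keyById)
      a <- aById.getOrElse(id, Seq.empty)
    } yield (id, (b, a))
  }

  def main(args: Array[String]): Unit = {
    val as = (37 to 44).map(i => s"A=$i")
    val bs = Seq("B=39", "B=42")
    joinById(as, bs).foreach(println)
  }
}
```

With the example messages, only B=39 and B=42 produce output, each paired with the A message of the same ID.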
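Current code:

    import com.spotify.scio._
    import org.joda.time.Duration

    // A messages ("A=<id>"), assigned to 10-second fixed windows and keyed by ID
    val AInput = sc.pubsubSubscription[String]("projects/hoge/subscriptions/A")
      .withFixedWindows(Duration.standardSeconds(10))
      .keyBy(a => a.split("=")(1).toInt)

    // B messages ("B=<id>"), windowed the same way so they can be joined with A
    val BInput = sc.pubsubSubscription[String]("projects/hoge/subscriptions/B")
      .withFixedWindows(Duration.standardSeconds(10))
      .keyBy { b =>
        println(b.split("=")(1))
        b.split("=")(1).toInt
      }
      // Debug: print each keyed element and the max timestamp of its window
      .toWindowed
      .map { s =>
        println(s.value.toString)
        println(s.window.maxTimestamp().toDateTime.toString("yyyy/MM/dd HH:mm:ss ZZ"))
        s
      }
      .toSCollection
      // Join B with A on the ID key and print each matched pair
      .join(AInput)
      .map { case (id, (b, a)) =>
        println("---------------")
        println(id)
        println(b)
        println(a)
      }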
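As far as I understand Beam's model, `withFixedWindows` followed by `join` groups elements by key *and* window, so an A and a B with the same ID are only paired if their event timestamps fall into the same 10-second window. The plain-Scala sketch below models just that, assuming windows aligned to the epoch with no offset (the `FixedWindows` default). `FixedWindowJoinSketch`, `Msg` and the timestamps are hypothetical:

```scala
// Hypothetical model of withFixedWindows + join: each element is assigned to a
// 10-second window from its event timestamp, and only elements whose key AND
// window both match are paired.
object FixedWindowJoinSketch {
  val windowSizeMillis = 10000L

  // Start of the fixed window containing the given event time (epoch millis)
  def windowStart(tsMillis: Long): Long =
    tsMillis - java.lang.Math.floorMod(tsMillis, windowSizeMillis)

  final case class Msg(id: Int, body: String, tsMillis: Long)

  // Returns (windowStart, id, bBody, aBody) for every matching pair
  def windowedJoin(as: Seq[Msg], bs: Seq[Msg]): Seq[(Long, Int, String, String)] = {
    val aByWindowAndKey = as.groupBy(m => (windowStart(m.tsMillis), m.id))
    for {
      b <- bs
      w = windowStart(b.tsMillis)
      a <- aByWindowAndKey.getOrElse((w, b.id), Seq.empty)
    } yield (w, b.id, b.body, a.body)
  }

  def main(args: Array[String]): Unit = {
    // A=39 at 9.5 s lands in [0 s, 10 s); B=39 at 10.2 s lands in [10 s, 20 s)
    val as = Seq(Msg(39, "A=39", 9500L), Msg(42, "A=42", 12000L))
    val bs = Seq(Msg(39, "B=39", 10200L), Msg(42, "B=42", 12500L))
    windowedJoin(as, bs).foreach(println)
  }
}
```

In this example B=39 finds no partner because A=39 fell into the previous window, while B=42 and A=42 share the window starting at 10 s.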
Both pipelines appear to run as far as keyBy. However, the println calls after the join never print anything, and there is no error.
I'm stuck and would appreciate any help.
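One thing that might be worth checking (a guess, not a confirmed diagnosis): with Beam's default trigger, a window's grouped output, including what `join` produces, is only emitted after the watermark passes the end of that window. If the Pub/Sub source's watermark does not advance far enough under DirectRunner, the join would print nothing without raising any error. The hypothetical sketch below models only that firing rule; `WatermarkFiringSketch` and the event times are made up for illustration:

```scala
// Hypothetical model of the default trigger: a window's grouped output is
// emitted only once the watermark passes the window's max timestamp.
object WatermarkFiringSketch {
  val windowSizeMillis = 10000L

  def windowStart(tsMillis: Long): Long =
    tsMillis - java.lang.Math.floorMod(tsMillis, windowSizeMillis)

  // Last instant that still belongs to the window (end - 1 ms)
  def maxTimestamp(start: Long): Long = start + windowSizeMillis - 1

  // Start times of the windows whose output would have fired at this watermark
  def firedWindows(eventTimes: Seq[Long], watermarkMillis: Long): Seq[Long] =
    eventTimes.map(windowStart).distinct.sorted.filter(s => watermarkMillis > maxTimestamp(s))

  def main(args: Array[String]): Unit = {
    val events = Seq(1000L, 9000L, 15000L, 23000L)
    // Watermark stuck at 0: no window has fired, so nothing downstream would print
    println(firedWindows(events, watermarkMillis = 0L))
    // Watermark at 20 s: the windows starting at 0 s and 10 s have fired
    println(firedWindows(events, watermarkMillis = 20000L))
  }
}
```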
Console log:
9
3
12
(3,B=3)
(9,B=9)
2017/07/17 16:30:09 +09:00
2017/07/17 16:28:39 +09:00
(12,B=12)
2017/07/17 16:29:09 +09:00
6
9
15
12
(6,B=6)
(9,B=9)
2017/07/17 16:30:19 +09:00
2017/07/17 16:30:09 +09:00
(12,B=12)
2017/07/17 16:30:19 +09:00
(15,B=15)
2017/07/17 16:30:19 +09:00
21
24
27
18
30
(21,B=21)
2017/07/17 16:30:29 +09:00
(24,B=24)
2017/07/17 16:30:39 +09:00
(27,B=27)
2017/07/17 16:30:39 +09:00
(18,B=18)
2017/07/17 16:30:29 +09:00
(30,B=30)
2017/07/17 16:30:39 +09:00
33
36
42
(33,B=33)
2017/07/17 16:30:49 +09:00
39
(42,B=42)
2017/07/17 16:30:59 +09:00
(36,B=36)
2017/07/17 16:30:49 +09:00
(39,B=39)
2017/07/17 16:30:59 +09:00
45
The windows seem to be processed every 10 seconds, but the window timestamps come out of order. I also found that if I launch the pipeline with DataflowRunner instead of DirectRunner, it works.
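The timestamps in the log are each window's `maxTimestamp()`, which for a fixed window is 1 ms before the window's end; that is why they all end in `:x9`. The sketch below reproduces that arithmetic with `java.time` instead of Joda-Time, assuming 10-second windows aligned to the epoch and a `+09:00` offset. The object name and the event time are hypothetical:

```scala
import java.time.{Instant, ZoneOffset}
import java.time.format.DateTimeFormatter

object WindowBoundsSketch {
  val windowSizeMillis = 10000L

  // Same layout as the debug println; java.time's "xxx" renders the offset as "+09:00",
  // like Joda's "ZZ" does
  val fmt: DateTimeFormatter =
    DateTimeFormatter.ofPattern("yyyy/MM/dd HH:mm:ss xxx").withZone(ZoneOffset.ofHours(9))

  // (window start, window max timestamp) for an event time
  def bounds(event: Instant): (Instant, Instant) = {
    val ms = event.toEpochMilli
    val start = ms - java.lang.Math.floorMod(ms, windowSizeMillis)
    (Instant.ofEpochMilli(start), Instant.ofEpochMilli(start + windowSizeMillis - 1))
  }

  def main(args: Array[String]): Unit = {
    // 16:30:04.5 at +09:00
    val (start, max) = bounds(Instant.parse("2017-07-17T07:30:04.500Z"))
    println(s"${fmt.format(start)} .. ${fmt.format(max)}")
  }
}
```

So a log line such as `2017/07/17 16:30:09 +09:00` corresponds to the window [16:30:00, 16:30:10).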