I am currently using scio, the Scala wrapper library for Apache Beam. What I want to do is combine different types of messages sent from Cloud Pub/Sub based on their ID.

Message A is sent every second, and message B is sent once every three seconds. When I receive a message B, I'd like to combine it with the message A that has the same ID.

Message example:

A=37
A=38
A=39
B=39
A=40
A=41
A=42
B=42
A=43
A=44
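
To make the intended result concrete, here is a minimal sketch in plain Scala collections (no scio or Beam involved) of the pairing I expect from the example above. The names `JoinByIdSketch`, `keyById`, and `joinById` are hypothetical helpers written only for this illustration; they are not part of scio.

```scala
object JoinByIdSketch {
  // Parse "A=37" into (37, "A=37"); malformed messages yield None.
  def keyById(msg: String): Option[(Int, String)] =
    msg.split("=") match {
      case Array(_, id) => scala.util.Try(id.trim.toInt).toOption.map(_ -> msg)
      case _            => None
    }

  // Inner join: every B message is paired with each A message sharing its ID.
  def joinById(bs: Seq[String], as: Seq[String]): Seq[(Int, (String, String))] = {
    val aById: Map[Int, Seq[String]] =
      as.flatMap(keyById).groupBy(_._1).map { case (k, v) => k -> v.map(_._2) }
    for {
      (id, b) <- bs.flatMap(keyById)
      a       <- aById.getOrElse(id, Seq.empty)
    } yield (id, (b, a))
  }

  def main(args: Array[String]): Unit = {
    val as = (37 to 44).map(i => s"A=$i") // A=37 ... A=44, as in the example
    val bs = Seq("B=39", "B=42")
    joinById(bs, as).foreach(println)
  }
}
```

With the example messages, only IDs 39 and 42 should produce a joined pair; every other A message has no matching B and is dropped.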

Current code:

val AInput = sc.pubsubSubscription[String]("projects/hoge/subscriptions/A")
  .withFixedWindows(Duration.standardSeconds(10))
  .keyBy(a => {
    a.split("=")(1).toInt
  })

val BInput = sc.pubsubSubscription[String]("projects/hoge/subscriptions/B")
  .withFixedWindows(Duration.standardSeconds(10))
  .keyBy(a => {
    println(a.split("=")(1))
    a.split("=")(1).toInt
  })
  .toWindowed
  .map(s => {
    println(s.value.toString)
    println(s.window.maxTimestamp().toDateTime.toString("yyyy/MM/dd HH:mm:ss ZZ"))
    s
  })
  .toSCollection
  .join(AInput)
  .map(a  => {
    println("---------------")
    println(a._1)
    println(a._2._1)
    println(a._2._2)
  })

Both pipelines execute up to the keyBy step. However, the println calls after the join never print anything, and no error is reported.

I'm stuck, and any help would be appreciated.

Console log:

9
3
12
(3,B=3)
(9,B=9)
2017/07/17 16:30:09 +09:00
2017/07/17 16:28:39 +09:00
(12,B=12)
2017/07/17 16:29:09 +09:00
6
9
15
12
(6,B=6)
(9,B=9)
2017/07/17 16:30:19 +09:00
2017/07/17 16:30:09 +09:00
(12,B=12)
2017/07/17 16:30:19 +09:00
(15,B=15)
2017/07/17 16:30:19 +09:00
21
24
27
18
30
(21,B=21)
2017/07/17 16:30:29 +09:00
(24,B=24)
2017/07/17 16:30:39 +09:00
(27,B=27)
2017/07/17 16:30:39 +09:00
(18,B=18)
2017/07/17 16:30:29 +09:00
(30,B=30)
2017/07/17 16:30:39 +09:00
33
36
42
(33,B=33)
2017/07/17 16:30:49 +09:00
39
(42,B=42)
2017/07/17 16:30:59 +09:00
(36,B=36)
2017/07/17 16:30:49 +09:00
(39,B=39)
2017/07/17 16:30:59 +09:00
45

Window processing seems to happen every 10 seconds, but the windows are not processed in chronological order. In addition, I discovered that the join succeeds if I launch the pipeline with DataflowRunner instead of DirectRunner.
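
For reference, the timestamps ending in :09, :19, and so on in the log are consistent with how I understand 10-second fixed windows to work: each element is assigned to the window containing its timestamp, and the window's maximum timestamp is one millisecond before the window ends. The sketch below reproduces that arithmetic in plain Scala. It assumes windows aligned to the epoch with zero offset, and `FixedWindowSketch` and its methods are hypothetical names for illustration, not Beam or scio APIs.

```scala
object FixedWindowSketch {
  // Window size of 10 seconds, matching Duration.standardSeconds(10) in the question.
  val windowMillis: Long = 10 * 1000L

  // Start of the window that contains the given timestamp (epoch-aligned, zero offset assumed).
  def windowStart(tsMillis: Long): Long =
    tsMillis - Math.floorMod(tsMillis, windowMillis)

  // Last timestamp that still belongs to the window: one millisecond before its end.
  def windowMaxTimestamp(tsMillis: Long): Long =
    windowStart(tsMillis) + windowMillis - 1

  // Two elements can only be paired by a windowed join if they share a window.
  def sameWindow(tsA: Long, tsB: Long): Boolean =
    windowStart(tsA) == windowStart(tsB)

  def main(args: Array[String]): Unit = {
    val ts = 63500L // 1 min 3.5 s after the epoch
    println(s"start=${windowStart(ts)} max=${windowMaxTimestamp(ts)}")
    println(sameWindow(63500L, 69999L)) // both fall in [60000, 70000)
    println(sameWindow(69999L, 70000L)) // 70000 starts the next window
  }
}
```

If this is right, an A message and a B message with the same ID are only joined when their timestamps fall in the same 10-second window, so the element timestamps of both subscriptions matter as much as the IDs.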

SakaT
  • I see you're using println - so I suppose you're running this using Direct runner on your local machine? – jkff Jul 14 '17 at 22:22
  • Yes. I ran this code with the Direct runner. – SakaT Jul 15 '17 at 15:56
  • Got it. A couple more things to debug: what version of Beam SDK are you using? Can you try printing the timestamps of the elements you're getting from the subscriptions, not just the values? How long did you wait for output to appear before giving up? – jkff Jul 15 '17 at 16:58
  • Thank you for your reply. The Beam SDK version is 0.6.0, which is a dependency of scio 0.3.3. I have edited the question to add the timestamps. It seems that window processing is done every 10 seconds. I waited for about 5 minutes after publishing the first message. Furthermore, I found that this code works well if I run it with DataflowRunner instead of DirectRunner. – SakaT Jul 17 '17 at 07:44
  • Hmm. Could you take a look at this thread, which seems to describe a similar issue? https://lists.apache.org/thread.html/91801903f4db36765724f3f888d9b1598b7cfaf2b1190f578b5f064d@%3Cuser.beam.apache.org%3E – jkff Jul 17 '17 at 23:38
  • I looked at it, but I still don't know why the join does not run when using the Direct runner. – SakaT Jul 18 '17 at 03:04
  • Have you tried the solution proposed by Thomas Groh in that thread? – jkff Jul 18 '17 at 19:37
  • Yes, but it still does not work with DirectRunner. The solution does work with DataflowRunner. – SakaT Jul 27 '17 at 02:38
  • If you're still interested in making this work in direct runner, could you update your question with the result of applying Thomas' solution to your code, and the output it gives? – jkff Jul 27 '17 at 03:47
