I'm not sure how GenerateSequence would work for me. I have to read values from MongoDB periodically, on an hourly or daily basis, so I created a ParDo that reads from MongoDB and added a Window into GlobalWindows with a trigger (I will update the trigger as per requirements). However, the code snippet below gives a return type error. Could you please help me correct it? A snapshot of the error is included as well. Also, how does GenerateSequence help in my case?

[Image: snapshot of the error]

PCollectionView<List<String>> list_of_vins = pipeline
                  .apply(GenerateSequence.from(0).withRate(1, Duration.standardMinutes(5))) // adjust polling rate
                  .apply(ParDo.of(new DoFn<Long, List<String>>() {
                    @ProcessElement
                    public void process(ProcessContext c) {
                      // Read entire DB, and output as a List<String>
                        final String uriString = "mongodb://$[username]:$[password]@$[hostlist]/$[database]?authSource=$[authSource]";
                        MongoClient mongoClient = MongoClients.create(uriString);
                        MongoDatabase mongoDB = mongoClient.getDatabase(options.getMongoDBHostName());
                        MongoCollection<Document> mongoCollection = mongoDB.getCollection(options.getMongoVinCollectionName());
                        c.output((List<String>) ((View) mongoCollection).asList());
                    }
                  })
                  .apply(Window.into(new GlobalWindows()).triggering(AfterPane.elementCountAtLeast(1))));

2 Answers

You'll need to specify the types on the Window transform like this:

.apply(Window.<List<String>>into(...));
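
For context on why the explicit type helps, here is a minimal plain-Java sketch of the same kind of inference problem. It does not use Beam: `Stage`, `into`, `triggering`, and `build` are hypothetical stand-ins for a builder with a generic static factory, assumed to resemble `Window.into(...)` only in that one respect.

```java
import java.util.ArrayList;
import java.util.List;

class TypeWitnessDemo {
    /** Hypothetical builder with a generic static factory (not a Beam class). */
    static final class Stage<T> {
        private final List<String> settings = new ArrayList<>();

        static <T> Stage<T> into(String windowName) {
            Stage<T> stage = new Stage<>();
            stage.settings.add("window=" + windowName);
            return stage;
        }

        Stage<T> triggering(String triggerName) {
            settings.add("trigger=" + triggerName);
            return this;
        }

        List<String> settings() {
            return settings;
        }
    }

    static Stage<List<String>> build() {
        // Without the type witness, into(...) is the receiver of .triggering(...),
        // so the compiler has no target type and infers Stage<Object>:
        //   Stage<List<String>> bad = Stage.into("global").triggering("count>=1"); // does not compile
        // Stating the type argument explicitly fixes T for the whole chain.
        return Stage.<List<String>>into("global").triggering("count>=1");
    }

    public static void main(String[] args) {
        System.out.println(build().settings());
    }
}
```

The same reasoning would apply to `Window.<List<String>>into(...)` when it is followed by a chained call such as `.triggering(...)`.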
danielm
  • I am able to read the data, but the side input I get in the ParDo has 5,883 elements when it should have 20,000. Why am I getting a lower count there, when the other logs show the correct count? Is there anything wrong with the Window, or am I doing something differently? Please have a look at my code above. Thanks! – deepalneema Jun 12 '20 at 16:55

@danielm and all,

I have updated my code and it seems to be working, but I have a few questions and need some clarification before going ahead:

PCollection<String> list_of_vins_1 = pipeline
            // Generate a tick every 2 minutes
            .apply("Ticker", GenerateSequence.from(0).withRate(1, Duration.standardMinutes(2)))
            // On each tick, read the data from MongoDB
            .apply("Read Data from Mongo DB", ParDo.of(new DoFn<Long, Document>() {
                    @ProcessElement
                    public void processElement(@Element Long tick, OutputReceiver<Document> out) {
                            // reading values from Mongo DB
                            out.output(mongoDocuments);
                        }
                    }
                }
            )).apply("Window", Window.<Document>into(new GlobalWindows()).triggering(Repeatedly.forever(AfterPane.elementCountAtLeast(1))).discardingFiredPanes())
            .apply(ParDo.of(new ConvertDocuemntToStringFn()));

// Convert the MongoDB data to a list of strings
PCollectionView<List<String>> list_of_data_1 = list_of_vins_1.apply(View.<String> asList());

I am able to read values from MongoDB at the ticker interval, but I am not sure whether this will increase the size of my side input. I pass list_of_data_1 as a side input, and the pipeline shows the count of elements added increasing.

Suppose MongoDB has 20,000 documents and the ticker runs every 2 minutes. Then the number of elements added will be 20,000 multiplied by the number of times the ticker has run, i.e. 20,000 + 20,000 + 20,000 + ... and so on.

So my question is: are elements added to the side input every time, or is the side input refreshed so that it always holds 20,000 values (or whatever MongoDB contains)? Is it appending or overriding?

  • @danielm I am able to read the data, but the side input I get in the ParDo has 5,883 elements when it should have 20,000. Why am I getting a lower count there, when the other logs show the correct count? Is there anything wrong with the Window, or am I doing something differently? Please have a look at my code above. Thanks! – deepalneema Jun 12 '20 at 16:56
  • Each time the trigger fires, it will override the previous value. If you want to append, specify .accumulatingFiredPanes() on your triggering. – danielm Jun 15 '20 at 17:21
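
To illustrate the difference between overriding and appending, here is a toy model in plain Java. It is not Beam code and not how Beam is implemented: `latestPane` and the batch layout are assumptions made for illustration, with the trigger assumed to fire once per batch and the side input assumed to see only the most recent pane.

```java
import java.util.ArrayList;
import java.util.List;

class PaneModeDemo {
    /**
     * Toy model (assumption, not Beam internals): returns the contents of the
     * most recent pane after every batch has arrived, firing once per batch.
     */
    static List<String> latestPane(List<List<String>> batches, boolean accumulating) {
        List<String> buffered = new ArrayList<>(); // elements held in the window
        List<String> latest = new ArrayList<>();   // what the side input would see
        for (List<String> batch : batches) {
            buffered.addAll(batch);
            latest = new ArrayList<>(buffered);    // trigger fires: emit a pane
            if (!accumulating) {
                buffered.clear();                  // discarding mode: forget emitted elements
            }
        }
        return latest;
    }

    public static void main(String[] args) {
        // Two ticks that each read the same three records.
        List<List<String>> ticks = List.of(
                List.of("vin1", "vin2", "vin3"),
                List.of("vin1", "vin2", "vin3"));
        System.out.println("discarding:   " + latestPane(ticks, false).size());
        System.out.println("accumulating: " + latestPane(ticks, true).size());
    }
}
```

In this model the discarding run ends with only the elements from the last firing, while the accumulating run keeps growing with every tick. It also suggests that a pane fired partway through one read would hold fewer elements than the full collection, which might be related to the lower count mentioned above, though nothing in this thread confirms that.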