Is there an alternative to GroupReduceFunction that runs in parallel in Apache Flink (Java)?

Question

The code below runs locally but not on the cluster. It hangs in the GroupReduceFunction and does not terminate even after hours (locally, the computation takes about 9 minutes on large data). The last message in the log is:

GroupReduce (GroupReduce at main(MyClass.java:80)) (1/1) (...) switched from DEPLOYING to RUNNING.

The code fragment:

DataSet<MyData1> myData1 = env.createInput(new UserDefinedFunctions.MyData1Set());
DataSet<MyData2> myData2 = DataSetUtils.sampleWithSize(myData1, false, 8, Long.MAX_VALUE)
                .reduceGroup(new GroupReduceFunction<MyData1, MyData2>() {
                    @Override
                    public void reduce(Iterable<MyData1> itrbl, Collector<MyData2> clctr) throws Exception {
                        int id = 0;
                        for (MyData1 myData1 : itrbl) {
                            clctr.collect(new MyData2(id++, myData1));
                        }
                    }
                });

Any ideas how I could run this segment in parallel? Thanks in advance!

Hi @anno2019. Can you please show us the code of `UserDefinedFunctions.MyData2Function()`? Can you also tell us what exactly the problem is? Do you get an exception, or does nothing happen at all? – TobiSH, Oct 20 '19 at 19:06
Hmm, that looks strange. Did you try sending the progress to a metrics system? If you don't have anything in place, just use log output to see whether any data gets processed. – TobiSH, Oct 20 '19 at 20:36
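
A note on the parallelism question: the `(1/1)` in the log line suggests the GroupReduce runs as a single task, which is expected for a non-grouped `reduceGroup`, since one function call has to see every element to keep the running `id` counter. One common way to hand out consecutive ids without a single sequential pass is to count the elements per partition first, derive a starting offset for each partition, and then number each partition independently. Flink's `DataSetUtils.zipWithIndex` is documented as a two-pass (count, then label) utility along these lines and may be worth trying here. The plain-Java sketch below only illustrates the idea and does not use Flink; the class name `ParallelIndexSketch`, the method `assignIds`, and the list-of-lists "partitions" are assumptions made for illustration, not part of the original code.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical illustration class; not part of Flink or of the question's code.
class ParallelIndexSketch {

    // Assigns consecutive ids 0..n-1 across all partitions.
    // Only the cheap offset computation is sequential; the numbering itself runs per partition.
    static <T> List<List<Map.Entry<Long, T>>> assignIds(List<List<T>> partitions) {
        // Pass 1: the offset of a partition is the number of elements in all earlier partitions.
        long[] offsets = new long[partitions.size()];
        for (int p = 1; p < partitions.size(); p++) {
            offsets[p] = offsets[p - 1] + partitions.get(p - 1).size();
        }

        // Pass 2: every partition numbers its own elements, starting at its offset.
        return IntStream.range(0, partitions.size())
                .parallel()
                .mapToObj(p -> {
                    List<Map.Entry<Long, T>> out = new ArrayList<>();
                    long id = offsets[p];
                    for (T value : partitions.get(p)) {
                        out.add(new SimpleEntry<Long, T>(id++, value));
                    }
                    return out;
                })
                .collect(Collectors.toList()); // keeps partition order
    }

    public static void main(String[] args) {
        List<List<String>> partitions = Arrays.asList(
                Arrays.asList("a", "b"),
                Arrays.asList("c"),
                Arrays.asList("d", "e", "f"));
        System.out.println(assignIds(partitions));
    }
}
```

In the question's job, the indexed value would then be mapped to `MyData2` in an ordinary parallel map step instead of inside the single-task `reduceGroup`. Whether that removes the hang on the cluster is not established by this sketch, since the sampling step before it may also be a bottleneck.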

