The code below runs locally but not on the cluster. On the cluster it hangs in the GroupReduceFunction and does not terminate even after hours (locally, the same computation takes about 9 minutes on large data). The last message in the log is:

GroupReduce (GroupReduce at main(MyClass.java:80)) (1/1) (...) switched from DEPLOYING to RUNNING.

The code fragment:

DataSet<MyData1> myData1 = env.createInput(new UserDefinedFunctions.MyData1Set());
DataSet<MyData2> myData2 = DataSetUtils.sampleWithSize(myData1, false, 8, Long.MAX_VALUE)
                .reduceGroup(new GroupReduceFunction<MyData1, MyData2>() {
                    @Override
                    public void reduce(Iterable<MyData1> itrbl, Collector<MyData2> clctr) throws Exception {
                        int id = 0;
                        // Loop variable renamed so it no longer shadows the outer DataSet `myData1`.
                        for (MyData1 item : itrbl) {
                            clctr.collect(new MyData2(id++, item));
                        }
                    }
                });

Any ideas how I could run this segment in parallel? Thanks in advance!

anno2019
  • Hi @anno2019. Can you please show us the code of `UserDefinedFunctions.MyData2Function()`? Can you also tell us what exactly the problem is? Do you get an exception, or does nothing happen at all? – TobiSH Oct 20 '19 at 19:06
  • Hmm, that looks strange. Did you try sending the progress to a metrics system? If you don't have anything in place, just use log output to see whether any data gets processed. – TobiSH Oct 20 '19 at 20:36
  • We tried with logs, but no output is emitted. – anno2019 Oct 20 '19 at 21:03
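
For reference, the log-based progress check suggested in the comments could look roughly like the sketch below. It is plain Java so that it runs without a cluster: the `BiConsumer` is only a stand-in for Flink's `Collector`, and the class name `ProgressLoggingSketch`, the method `assignIds`, and the `logEvery` parameter are hypothetical names chosen for illustration, not part of the original job. The idea is to log once on entry, periodically inside the loop, and once on exit, so the log shows whether the function ever receives a record.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.logging.Logger;

// Hypothetical helper: mirrors the id-assigning loop from the question,
// with progress logging added. Not Flink code; `out` stands in for Collector.
final class ProgressLoggingSketch {
    private static final Logger LOG =
            Logger.getLogger(ProgressLoggingSketch.class.getName());

    // Assigns consecutive ids starting at 0 and returns the number of records seen.
    // `logEvery` must be greater than 0.
    static <T> int assignIds(Iterable<T> records, BiConsumer<Integer, T> out, int logEvery) {
        LOG.info("reduce() entered");
        int id = 0;
        for (T record : records) {
            out.accept(id, record);
            id++;
            if (id % logEvery == 0) {
                LOG.info("processed " + id + " records so far");
            }
        }
        LOG.info("reduce() finished after " + id + " records");
        return id;
    }

    public static void main(String[] args) {
        List<String> emitted = new ArrayList<>();
        int count = assignIds(Arrays.asList("a", "b", "c"),
                (id, record) -> emitted.add(id + ":" + record), 2);
        System.out.println(count + " " + emitted);
    }
}
```

If not even the "reduce() entered" line shows up in the task manager logs, that would suggest the stall is upstream of the reduce body (for example in the input format or the sampling step) rather than in the loop itself.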
