I have worked with the RDD.flatMap function in Java. Now I am trying my hand at DataFrames.

The API documentation for DataFrame.flatMap says:

public <R> RDD<R> flatMap(scala.Function1<org.apache.spark.sql.Row,
    scala.collection.TraversableOnce<R>> f, scala.reflect.ClassTag<R> evidence$4)

Returns a new RDD by first applying a function to all rows of this DataFrame, and then flattening the results.

Specified by: flatMap in interface RDDApi

But when I tried this, Function1 forced me to override lots and lots of unimplemented methods. This is what I get:

    RDD<Row> res = df.flatMap(new Function1<Row, TraversableOnce<Row>>() {

        @Override
        public <A> Function1<Row, A> andThen(
                Function1<TraversableOnce<Row>, A> arg0) {
            // TODO Auto-generated method stub
            return null;
        }

        @Override
        public <A> Function1<Object, A> andThen$mcDD$sp(
                Function1<Object, A> arg0) {
            // TODO Auto-generated method stub
            return null;
        }

        @Override
        public <A> Function1<Object, A> andThen$mcDF$sp(
                Function1<Object, A> arg0) {
            // TODO Auto-generated method stub
            return null;
        }

        @Override
        public <A> Function1<Object, A> andThen$mcDI$sp(
                Function1<Object, A> arg0) {
            // TODO Auto-generated method stub
            return null;
        }

        @Override
        public <A> Function1<Object, A> andThen$mcDJ$sp(
                Function1<Object, A> arg0) {
            // TODO Auto-generated method stub
            return null;
        }

        @Override
        public <A> Function1<Object, A> andThen$mcFD$sp(
                Function1<Object, A> arg0) {
            // TODO Auto-generated method stub
            return null;
        }

        @Override
        public <A> Function1<Object, A> andThen$mcFF$sp(
                Function1<Object, A> arg0) {
            // TODO Auto-generated method stub
            return null;
        }

        @Override
        public <A> Function1<Object, A> andThen$mcFI$sp(
                Function1<Object, A> arg0) {
            // TODO Auto-generated method stub
            return null;
        }

        @Override
        public <A> Function1<Object, A> andThen$mcFJ$sp(
                Function1<Object, A> arg0) {
            // TODO Auto-generated method stub
            return null;
        }

        @Override
        public <A> Function1<Object, A> andThen$mcID$sp(
                Function1<Object, A> arg0) {
            // TODO Auto-generated method stub
            return null;
        }

        @Override
        public <A> Function1<Object, A> andThen$mcIF$sp(
                Function1<Object, A> arg0) {
            // TODO Auto-generated method stub
            return null;
        }

        @Override
        public <A> Function1<Object, A> andThen$mcII$sp(
                Function1<Object, A> arg0) {
            // TODO Auto-generated method stub
            return null;
        }

        @Override
        public <A> Function1<Object, A> andThen$mcIJ$sp(
                Function1<Object, A> arg0) {
            // TODO Auto-generated method stub
            return null;
        }

        @Override
        public <A> Function1<Object, A> andThen$mcJD$sp(
                Function1<Object, A> arg0) {
            // TODO Auto-generated method stub
            return null;
        }

        @Override
        public <A> Function1<Object, A> andThen$mcJF$sp(
                Function1<Object, A> arg0) {
            // TODO Auto-generated method stub
            return null;
        }

        @Override
        public <A> Function1<Object, A> andThen$mcJI$sp(
                Function1<Object, A> arg0) {
            // TODO Auto-generated method stub
            return null;
        }

        @Override
        public <A> Function1<Object, A> andThen$mcJJ$sp(
                Function1<Object, A> arg0) {
            // TODO Auto-generated method stub
            return null;
        }

        @Override
        public <A> Function1<Object, A> andThen$mcVD$sp(
                Function1<BoxedUnit, A> arg0) {
            // TODO Auto-generated method stub
            return null;
        }

        @Override
        public <A> Function1<Object, A> andThen$mcVF$sp(
                Function1<BoxedUnit, A> arg0) {
            // TODO Auto-generated method stub
            return null;
        }

        @Override
        public <A> Function1<Object, A> andThen$mcVI$sp(
                Function1<BoxedUnit, A> arg0) {
            // TODO Auto-generated method stub
            return null;
        }

        @Override
        public <A> Function1<Object, A> andThen$mcVJ$sp(
                Function1<BoxedUnit, A> arg0) {
            // TODO Auto-generated method stub
            return null;
        }

        @Override
        public <A> Function1<Object, A> andThen$mcZD$sp(
                Function1<Object, A> arg0) {
            // TODO Auto-generated method stub
            return null;
        }

        @Override
        public <A> Function1<Object, A> andThen$mcZF$sp(
                Function1<Object, A> arg0) {
            // TODO Auto-generated method stub
            return null;
        }

        @Override
        public <A> Function1<Object, A> andThen$mcZI$sp(
                Function1<Object, A> arg0) {
            // TODO Auto-generated method stub
            return null;
        }

        @Override
        public <A> Function1<Object, A> andThen$mcZJ$sp(
                Function1<Object, A> arg0) {
            // TODO Auto-generated method stub
            return null;
        }

        @Override
        public TraversableOnce<Row> apply(Row arg0) {
            // TODO Auto-generated method stub
            return null;
        }

        @Override
        public double apply$mcDD$sp(double arg0) {
            // TODO Auto-generated method stub
            return 0;
        }

        @Override
        public double apply$mcDF$sp(float arg0) {
            // TODO Auto-generated method stub
            return 0;
        }

        @Override
        public double apply$mcDI$sp(int arg0) {
            // TODO Auto-generated method stub
            return 0;
        }

        @Override
        public double apply$mcDJ$sp(long arg0) {
            // TODO Auto-generated method stub
            return 0;
        }

        @Override
        public float apply$mcFD$sp(double arg0) {
            // TODO Auto-generated method stub
            return 0;
        }

        @Override
        public float apply$mcFF$sp(float arg0) {
            // TODO Auto-generated method stub
            return 0;
        }

        @Override
        public float apply$mcFI$sp(int arg0) {
            // TODO Auto-generated method stub
            return 0;
        }

        @Override
        public float apply$mcFJ$sp(long arg0) {
            // TODO Auto-generated method stub
            return 0;
        }

        @Override
        public int apply$mcID$sp(double arg0) {
            // TODO Auto-generated method stub
            return 0;
        }

        @Override
        public int apply$mcIF$sp(float arg0) {
            // TODO Auto-generated method stub
            return 0;
        }

        @Override
        public int apply$mcII$sp(int arg0) {
            // TODO Auto-generated method stub
            return 0;
        }

        @Override
        public int apply$mcIJ$sp(long arg0) {
            // TODO Auto-generated method stub
            return 0;
        }

        @Override
        public long apply$mcJD$sp(double arg0) {
            // TODO Auto-generated method stub
            return 0;
        }

        @Override
        public long apply$mcJF$sp(float arg0) {
            // TODO Auto-generated method stub
            return 0;
        }

        @Override
        public long apply$mcJI$sp(int arg0) {
            // TODO Auto-generated method stub
            return 0;
        }

        @Override
        public long apply$mcJJ$sp(long arg0) {
            // TODO Auto-generated method stub
            return 0;
        }

        @Override
        public void apply$mcVD$sp(double arg0) {
            // TODO Auto-generated method stub

        }

        @Override
        public void apply$mcVF$sp(float arg0) {
            // TODO Auto-generated method stub

        }

        @Override
        public void apply$mcVI$sp(int arg0) {
            // TODO Auto-generated method stub

        }

        @Override
        public void apply$mcVJ$sp(long arg0) {
            // TODO Auto-generated method stub

        }

        @Override
        public boolean apply$mcZD$sp(double arg0) {
            // TODO Auto-generated method stub
            return false;
        }

        @Override
        public boolean apply$mcZF$sp(float arg0) {
            // TODO Auto-generated method stub
            return false;
        }

        @Override
        public boolean apply$mcZI$sp(int arg0) {
            // TODO Auto-generated method stub
            return false;
        }

        @Override
        public boolean apply$mcZJ$sp(long arg0) {
            // TODO Auto-generated method stub
            return false;
        }

        @Override
        public <A> Function1<A, TraversableOnce<Row>> compose(
                Function1<A, Row> arg0) {
            // TODO Auto-generated method stub
            return null;
        }

        @Override
        public <A> Function1<A, Object> compose$mcDD$sp(
                Function1<A, Object> arg0) {
            // TODO Auto-generated method stub
            return null;
        }

        @Override
        public <A> Function1<A, Object> compose$mcDF$sp(
                Function1<A, Object> arg0) {
            // TODO Auto-generated method stub
            return null;
        }

        @Override
        public <A> Function1<A, Object> compose$mcDI$sp(
                Function1<A, Object> arg0) {
            // TODO Auto-generated method stub
            return null;
        }

        @Override
        public <A> Function1<A, Object> compose$mcDJ$sp(
                Function1<A, Object> arg0) {
            // TODO Auto-generated method stub
            return null;
        }

        @Override
        public <A> Function1<A, Object> compose$mcFD$sp(
                Function1<A, Object> arg0) {
            // TODO Auto-generated method stub
            return null;
        }

        @Override
        public <A> Function1<A, Object> compose$mcFF$sp(
                Function1<A, Object> arg0) {
            // TODO Auto-generated method stub
            return null;
        }

        @Override
        public <A> Function1<A, Object> compose$mcFI$sp(
                Function1<A, Object> arg0) {
            // TODO Auto-generated method stub
            return null;
        }

        @Override
        public <A> Function1<A, Object> compose$mcFJ$sp(
                Function1<A, Object> arg0) {
            // TODO Auto-generated method stub
            return null;
        }

        @Override
        public <A> Function1<A, Object> compose$mcID$sp(
                Function1<A, Object> arg0) {
            // TODO Auto-generated method stub
            return null;
        }

        @Override
        public <A> Function1<A, Object> compose$mcIF$sp(
                Function1<A, Object> arg0) {
            // TODO Auto-generated method stub
            return null;
        }

        @Override
        public <A> Function1<A, Object> compose$mcII$sp(
                Function1<A, Object> arg0) {
            // TODO Auto-generated method stub
            return null;
        }

        @Override
        public <A> Function1<A, Object> compose$mcIJ$sp(
                Function1<A, Object> arg0) {
            // TODO Auto-generated method stub
            return null;
        }

        @Override
        public <A> Function1<A, Object> compose$mcJD$sp(
                Function1<A, Object> arg0) {
            // TODO Auto-generated method stub
            return null;
        }

        @Override
        public <A> Function1<A, Object> compose$mcJF$sp(
                Function1<A, Object> arg0) {
            // TODO Auto-generated method stub
            return null;
        }

        @Override
        public <A> Function1<A, Object> compose$mcJI$sp(
                Function1<A, Object> arg0) {
            // TODO Auto-generated method stub
            return null;
        }

        @Override
        public <A> Function1<A, Object> compose$mcJJ$sp(
                Function1<A, Object> arg0) {
            // TODO Auto-generated method stub
            return null;
        }

        @Override
        public <A> Function1<A, BoxedUnit> compose$mcVD$sp(
                Function1<A, Object> arg0) {
            // TODO Auto-generated method stub
            return null;
        }

        @Override
        public <A> Function1<A, BoxedUnit> compose$mcVF$sp(
                Function1<A, Object> arg0) {
            // TODO Auto-generated method stub
            return null;
        }

        @Override
        public <A> Function1<A, BoxedUnit> compose$mcVI$sp(
                Function1<A, Object> arg0) {
            // TODO Auto-generated method stub
            return null;
        }

        @Override
        public <A> Function1<A, BoxedUnit> compose$mcVJ$sp(
                Function1<A, Object> arg0) {
            // TODO Auto-generated method stub
            return null;
        }

        @Override
        public <A> Function1<A, Object> compose$mcZD$sp(
                Function1<A, Object> arg0) {
            // TODO Auto-generated method stub
            return null;
        }

        @Override
        public <A> Function1<A, Object> compose$mcZF$sp(
                Function1<A, Object> arg0) {
            // TODO Auto-generated method stub
            return null;
        }

        @Override
        public <A> Function1<A, Object> compose$mcZI$sp(
                Function1<A, Object> arg0) {
            // TODO Auto-generated method stub
            return null;
        }

        @Override
        public <A> Function1<A, Object> compose$mcZJ$sp(
                Function1<A, Object> arg0) {
            // TODO Auto-generated method stub
            return null;
        }
    }, evidence$4);

This looks weird, but I went on and created evidence$4 as:

    ClassTag<Row> evidence$4 = scala.reflect.ClassTag$.MODULE$.apply(Row.class);

My intention is just to play around with flatMap (on DataFrames, of course, not on RDDs), so I don't need to change the Row at all; I can return the input as is.

So I guess I only need to change the apply method.

    @Override
    public TraversableOnce<Row> apply(Row arg0) {
        // TODO Auto-generated method stub
        return null;
    }

But then, how do I get a TraversableOnce<Row> from a Row?

Also, is my approach correct, or am I missing something?

I am using Apache Spark 1.3.1.

Hleb
  • First of all, you should use the [`scala.runtime.AbstractFunction1`](http://www.scala-lang.org/api/2.11.5/index.html#scala.runtime.AbstractFunction1) in the function. You can create `TraversableOnce` from Java collections too with the [`scala.collection.JavaConverters$`](http://www.scala-lang.org/api/2.11.5/index.html#scala.collection.JavaConverters$). – Gábor Bakos May 25 '15 at 14:36
  • Please, remove "spark-java" tag, it's not related to Apache Spark. – Hleb Jun 03 '15 at 13:26

1 Answer

You should do something like the following:

    RDD<Row> res = df.flatMap(new AbstractFunction1<Row, TraversableOnce<Row>>() {
        @Override
        public TraversableOnce<Row> apply(Row row) {
            // ListSet is immutable: $plus() returns the updated set, so return its result
            return new ListSet<Row>().$plus(row);
        }
    }, evidence$4);

This works much like map, but with more freedom. For example, to filter a row out, return an empty new ListSet<Row>() for it; to keep the row, return the set containing it, as in the current code. flatMap is very flexible.
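To make the map/filter/expand behaviour concrete, here is a minimal sketch using plain java.util.stream rather than Spark. It is only an analogy: the class name FlatMapSemantics and its methods are made up for illustration, and strings stand in for Row objects. The idea carries over, though: what you return per input element decides how flatMap behaves.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Illustration only: java.util.stream stands in for Spark, String stands in for Row.
public class FlatMapSemantics {

    // Returning exactly one element per input makes flatMap behave like map.
    public static List<String> keepAll(List<String> rows) {
        return rows.stream()
                .flatMap(row -> Stream.of(row))
                .collect(Collectors.toList());
    }

    // Returning an empty result for an input drops it, so flatMap behaves like filter.
    public static List<String> dropBlank(List<String> rows) {
        return rows.stream()
                .flatMap(row -> row.isEmpty() ? Stream.<String>empty() : Stream.of(row))
                .collect(Collectors.toList());
    }

    // Returning several elements expands one input into many outputs.
    public static List<String> splitWords(List<String> rows) {
        return rows.stream()
                .flatMap(row -> Arrays.stream(row.split(" ")))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("a b", "", "c");
        System.out.println(keepAll(rows));
        System.out.println(dropBlank(rows));
        System.out.println(splitWords(Arrays.asList("a b", "c")));
    }
}
```

In the Spark snippet above, the ListSet returned from apply plays the role of the per-element stream here: one element keeps the row, an empty set drops it.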

(It seems that converting Java collections to Scala collections is not trivial.)

Gábor Bakos
  • In my mind, DataFrame = RDD + Schema. If that is the case, essentially what we should be doing is newDF = DF.RDD.flatmap(f).applySchema(), without going to too much trouble reinventing the wheel. – ayan guha May 26 '15 at 03:48
  • Is that it?? Isn't there any change under the hood? Will the performance be just the same? – Gireesh Puthumana May 26 '15 at 06:37
  • @gábor-bakos, tried exactly same code from your answer. It shows compilation error and asks me to "Add cast to 'TraversableOnce' ". Tried that, but getting exception "java.lang.ClassCastException: scala.collection.convert.Decorators$AsScala cannot be cast to scala.collection.TraversableOnce". – Gireesh Puthumana May 26 '15 at 07:38
  • Let's add a `toList()` to that in that case. – Gábor Bakos May 26 '15 at 08:22
  • Where do I add `toList()`? (Sorry but this scala - java thing is confusing). Can you please post the updated code? Also what is "evidence$4"? What purpose it serves? Is the way I made it up correct? Is there a good documentation of Spark DF java apis with examples? – Gireesh Puthumana May 26 '15 at 08:39
  • Just noticed that you had edited your answer. But this `JavaConverters$.MODULE$.asScalaBufferConverter(Collections.singletonList(row)).toList();` gives _"The method toList() is undefined for the type Decorators.AsScala>"_ I am using spark 1.3.1 – Gireesh Puthumana May 26 '15 at 08:42
  • Sorry, these are beyond my current knowledge from using Scala collections from Java. The `evidence$4` could be named differently, I just tried to add what you suggested, in Scala that is automatically populated. Maybe putting `List.apply` around the list: `List$.MODULE$.apply(JavaConverters$.MODULE$.asScalaBufferConverter(Collections.singletonList(row)))` would fix the error. Or `List$.MODULE$.apply(new Row[]{row})` might also work. – Gábor Bakos May 26 '15 at 08:58
  • Both didn't work. But this did: `ListSet b = new ListSet(); b.$plus(row); return b;`. But I am getting 0 records in output. My doubt is on the `evidence$4`. Any help? – Gireesh Puthumana May 26 '15 at 10:39
  • Yeah, I think `$plus` returns the collection containing the row, so this might work: `ListSet b = new ListSet().$plus(row); return b;`. – Gábor Bakos May 26 '15 at 10:45
  • Oops! That's it. `ListSet b = new ListSet().$plus(row);` is giving output. The problem was the separate `b.$plus(row);` call, whose result was not assigned back to b. Please modify your answer. I will accept it. Thank you! – Gireesh Puthumana May 26 '15 at 13:55
  • I have updated the answer, this was educational for me too. Thank you. – Gábor Bakos May 26 '15 at 14:20
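
The zero-record result discussed in the comments comes down to immutability: `$plus` does not modify the set it is called on, it returns a new one. A minimal sketch of the same pitfall in plain Java follows. The ImmutableSet class here is a hypothetical stand-in written for this illustration, not Scala's ListSet, and strings stand in for Row objects.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

public class ImmutablePlusDemo {

    // Hypothetical stand-in for an immutable set such as Scala's ListSet:
    // plus() never modifies the receiver, it returns a new set.
    static final class ImmutableSet<T> {
        private final Set<T> items;

        ImmutableSet() {
            this(Collections.<T>emptySet());
        }

        private ImmutableSet(Set<T> items) {
            this.items = items;
        }

        ImmutableSet<T> plus(T item) {
            Set<T> copy = new LinkedHashSet<>(items);
            copy.add(item);
            return new ImmutableSet<>(Collections.unmodifiableSet(copy));
        }

        int size() {
            return items.size();
        }
    }

    // Mirrors the version that produced no records: the result of plus() is discarded.
    public static int sizeWhenResultDiscarded(String row) {
        ImmutableSet<String> b = new ImmutableSet<>();
        b.plus(row);
        return b.size();
    }

    // Mirrors the working version: the result of plus() is kept.
    public static int sizeWhenResultKept(String row) {
        ImmutableSet<String> b = new ImmutableSet<String>().plus(row);
        return b.size();
    }

    public static void main(String[] args) {
        System.out.println(sizeWhenResultDiscarded("row")); // prints 0
        System.out.println(sizeWhenResultKept("row"));      // prints 1
    }
}
```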