
In Spark SQL, when I try to use the map function on a DataFrame, I get the error below.

The method map(Function1, ClassTag) in the type DataFrame is not applicable for the arguments (new Function(){})

I am following the Spark 1.3 documentation as well: https://spark.apache.org/docs/latest/sql-programming-guide.html#inferring-the-schema-using-reflection Does anyone have a solution?

Here is my test code:

// SQL can be run over RDDs that have been registered as tables.
DataFrame teenagers = sqlContext.sql("SELECT name FROM people WHERE age >= 13 AND age <= 19");

List<String> teenagerNames = teenagers.map(
    new Function<Row, String>() {
        public String call(Row row) {
            return "Name: " + row.getString(0);
        }
    }).collect();
user3206330

6 Answers


Change this to:

Java 6 & 7

List<String> teenagerNames = teenagers.javaRDD().map(
    new Function<Row, String>() {
        public String call(Row row) {
            return "Name: " + row.getString(0);
        }
    }).collect();

Java 8

List<String> t2 = teenagers.javaRDD().map(
    row -> "Name: " + row.getString(0)
).collect();

Once you call javaRDD(), map works just like any other RDD map function.

This works with Spark 1.3.0 and up.
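For reference, here is a minimal self-contained sketch of this approach. It assumes a local people.json file with name and age fields and uses the Spark 1.3-era sqlContext.jsonFile() (later versions read JSON via sqlContext.read().json()):

import java.util.List;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SQLContext;

public class TeenagerNames {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("TeenagerNames").setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);
        SQLContext sqlContext = new SQLContext(sc);

        // Load JSON and register it as a table so SQL can be run over it.
        DataFrame people = sqlContext.jsonFile("people.json");
        people.registerTempTable("people");

        DataFrame teenagers = sqlContext.sql(
            "SELECT name FROM people WHERE age >= 13 AND age <= 19");

        // javaRDD() bridges the DataFrame to a JavaRDD<Row>, whose map()
        // accepts the Java-friendly Function interface.
        List<String> teenagerNames = teenagers.javaRDD().map(
            new Function<Row, String>() {
                public String call(Row row) {
                    return "Name: " + row.getString(0);
                }
            }).collect();

        for (String name : teenagerNames) {
            System.out.println(name);
        }
        sc.stop();
    }
}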

econn
    What happens if you transform it in an RDD? Is the transform lazy? Is the memory moved into a new structure? Can the execution still be optimize? – GiCo May 14 '16 at 12:09

There is no need to convert to an RDD; that delays the execution. It can be done as below:

public static void mapMethod() {
    // Read the data from a file on the classpath.
    Dataset<Row> df = sparkSession.read().json("file1.json");

    // Prior to Java 8: anonymous MapFunction with an explicit encoder
    Encoder<String> encoder = Encoders.STRING();
    List<String> rowsList = df.map(new MapFunction<Row, String>() {
        private static final long serialVersionUID = 1L;

        @Override
        public String call(Row row) throws Exception {
            return "string:>" + row.getString(0) + "<";
        }
    }, encoder).collectAsList();

    // From Java 8 onwards: a lambda, cast to MapFunction so the
    // compiler can pick the right overload
    List<String> rowsList1 = df.map(
        (MapFunction<Row, String>) row -> "string >" + row.getString(0) + "<",
        encoder).collectAsList();

    System.out.println(">>> " + rowsList);
    System.out.println(">>> " + rowsList1);
}
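Note: the cast to MapFunction<Row, String> in the Java 8 form is needed because Dataset.map is overloaded for both Scala's Function1 and Spark's MapFunction, so a bare lambda is ambiguous to the Java compiler.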


Do you have the correct dependency set in your pom? Set this and try:

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.10</artifactId>
        <version>1.3.1</version>
    </dependency>
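One thing to watch: the Scala suffix in the artifactId must match across all Spark artifacts (mixing spark-core_2.11 with spark-sql_2.10, for example, leads to errors). As a sketch, a consistent pair would look like:

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.10</artifactId>
        <version>1.3.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.10</artifactId>
        <version>1.3.1</version>
    </dependency>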
urug
• I am using the below dependencies with Java 1.7: org.apache.spark:spark-core_2.11:1.3.1 and org.apache.spark:spark-sql_2.11:1.3.1 – user3206330 Apr 23 '15 at 09:54
• Documentation says you can run all the normal functions of JavaRDD against DataFrames, but that does not appear to be the case here. I was able to reproduce your problem. The map() method of the DataFrame class expects 2 arguments. Maybe explicitly convert the DataFrame to an RDD with teenagers.javaRDD(), then apply the map. – urug Apr 23 '15 at 20:36

try this:

// SQL can be run over RDDs that have been registered as tables.
DataFrame teenagers = sqlContext.sql("SELECT name FROM people WHERE age >= 13 AND age <= 19");

List<String> teenagerNames = teenagers.toJavaRDD().map(
    new Function<Row, String>() {
        public String call(Row row) {
            return "Name: " + row.getString(0);
        }
    }).collect();

You have to transform your DataFrame into a JavaRDD.


Check if you are using the correct import for Row (import org.apache.spark.sql.Row). Remove any other imports related to Row; otherwise, your syntax is correct.
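For example, a quick check of the import (the pre-1.3 class name below is one possible wrong auto-import from older Spark releases):

// Correct import for Spark SQL's Row type:
import org.apache.spark.sql.Row;

// An IDE auto-import of any other Row class (for example
// org.apache.spark.sql.api.java.Row from pre-1.3 releases) will make
// the map() call fail to type-check.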

Swaminathan S

Please check your input file's data against your DataFrame SQL query. I faced the same thing, and when I looked back at the data it did not match my query, so you are probably facing the same issue. Both toJavaRDD() and javaRDD() work.
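A quick way to verify this is to inspect the schema and a few rows before mapping (assuming your DataFrame is named teenagers, as above):

teenagers.printSchema();
teenagers.show(5); // prints the first 5 rows; an empty result means the query matched no data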

ankitbeohar90