
In Spark SQL, when I try to use the map function on a DataFrame, I get the error below.

The method map(Function1, ClassTag) in the type DataFrame is not applicable for the arguments (new Function(){})

I am following the Spark 1.3 documentation as well: https://spark.apache.org/docs/latest/sql-programming-guide.html#inferring-the-schema-using-reflection Does anyone have a solution?

Here is my test code:

// SQL can be run over RDDs that have been registered as tables.
DataFrame teenagers = sqlContext.sql("SELECT name FROM people WHERE age >= 13 AND age <= 19");

List<String> teenagerNames = teenagers.map(
    new Function<Row, String>() {
        public String call(Row row) {
            return "Name: " + row.getString(0);
        }
    }).collect();
user3206330

6 Answers


Change this to:

Java 6 & 7

List<String> teenagerNames = teenagers.javaRDD().map(
    new Function<Row, String>() {
        public String call(Row row) {
            return "Name: " + row.getString(0);
        }
    }).collect();

Java 8

List<String> t2 = teenagers.javaRDD().map(
    row -> "Name: " + row.getString(0)
).collect();

Once you call javaRDD(), map works just like any other RDD map function.

This works with Spark 1.3.0 and up.
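For reference, here is a minimal self-contained sketch of this approach. It assumes a local people.json file with name and age fields and uses the Spark 1.3-era sqlContext.jsonFile() (later versions read JSON via sqlContext.read().json()):

import java.util.List;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SQLContext;

public class TeenagerNames {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("TeenagerNames").setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);
        SQLContext sqlContext = new SQLContext(sc);

        // Load JSON and register it as a table so SQL can be run over it.
        DataFrame people = sqlContext.jsonFile("people.json");
        people.registerTempTable("people");

        DataFrame teenagers = sqlContext.sql(
            "SELECT name FROM people WHERE age >= 13 AND age <= 19");

        // javaRDD() bridges the DataFrame to a JavaRDD<Row>, whose map()
        // accepts the Java-friendly Function interface.
        List<String> teenagerNames = teenagers.javaRDD().map(
            new Function<Row, String>() {
                public String call(Row row) {
                    return "Name: " + row.getString(0);
                }
            }).collect();

        for (String name : teenagerNames) {
            System.out.println(name);
        }
        sc.stop();
    }
}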

econn
    What happens if you transform it in an RDD? Is the transform lazy? Is the memory moved into a new structure? Can the execution still be optimize? – GiCo May 14 '16 at 12:09

There is no need to convert to an RDD; that delays the execution. It can be done as below:

public static void mapMethod() {
    // Read the data from a file on the classpath.
    Dataset<Row> df = sparkSession.read().json("file1.json");

    // Prior to Java 8: anonymous MapFunction with an explicit encoder
    Encoder<String> encoder = Encoders.STRING();
    List<String> rowsList = df.map(new MapFunction<Row, String>() {
        private static final long serialVersionUID = 1L;

        @Override
        public String call(Row row) throws Exception {
            return "string:>" + row.getString(0) + "<";
        }
    }, encoder).collectAsList();

    // From Java 8 onwards: a lambda, cast to MapFunction so the
    // compiler can pick the right overload
    List<String> rowsList1 = df.map(
        (MapFunction<Row, String>) row -> "string >" + row.getString(0) + "<",
        encoder).collectAsList();

    System.out.println(">>> " + rowsList);
    System.out.println(">>> " + rowsList1);
}
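Note: the cast to MapFunction<Row, String> in the Java 8 form is needed because Dataset.map is overloaded for both Scala's Function1 and Spark's MapFunction, so a bare lambda is ambiguous to the Java compiler.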


Do you have the correct dependency set in your pom? Set this and try:

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.10</artifactId>
        <version>1.3.1</version>
    </dependency>
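One thing to watch: the Scala suffix in the artifactId must match across all Spark artifacts (mixing spark-core_2.11 with spark-sql_2.10, for example, leads to errors). As a sketch, a consistent pair would look like:

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.10</artifactId>
        <version>1.3.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.10</artifactId>
        <version>1.3.1</version>
    </dependency>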
urug
• I am using the below dependencies with Java 1.7: org.apache.spark:spark-core_2.11:1.3.1 and org.apache.spark:spark-sql_2.11:1.3.1 – user3206330 Apr 23 '15 at 09:54
• Documentation says you can run all the normal functions of JavaRDD against DataFrames, but that does not appear to be the case here. I was able to reproduce your problem. The map() method of the DataFrame class expects 2 arguments. Maybe explicitly convert the DataFrame to an RDD with teenagers.javaRDD(), then apply the map. – urug Apr 23 '15 at 20:36

try this:

// SQL can be run over RDDs that have been registered as tables.
DataFrame teenagers = sqlContext.sql("SELECT name FROM people WHERE age >= 13 AND age <= 19");

List<String> teenagerNames = teenagers.toJavaRDD().map(
    new Function<Row, String>() {
        public String call(Row row) {
            return "Name: " + row.getString(0);
        }
    }).collect();

You have to transform your DataFrame into a JavaRDD.


Check if you are using the correct import for Row (import org.apache.spark.sql.Row). Remove any other imports related to Row; otherwise, your syntax is correct.
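For example, a quick check of the import (the pre-1.3 class name below is one possible wrong auto-import from older Spark releases):

// Correct import for Spark SQL's Row type:
import org.apache.spark.sql.Row;

// An IDE auto-import of any other Row class (for example
// org.apache.spark.sql.api.java.Row from pre-1.3 releases) will make
// the map() call fail to type-check.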

Swaminathan S

Please check your input file's data against your DataFrame SQL query. I faced the same thing, and when I looked back at the data it did not match my query, so you are probably facing the same issue. Both toJavaRDD() and javaRDD() work.
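A quick way to verify this is to inspect the schema and a few rows before mapping (assuming your DataFrame is named teenagers, as above):

teenagers.printSchema();
teenagers.show(5); // prints the first 5 rows; an empty result means the query matched no data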

ankitbeohar90