
Dataset<Person> persons = spark.read().textFile(path).map(Person::new, Encoders.bean(Person.class));

The above works in Spark 2.4 (Scala 2.11), but in Spark 3.1.1 (Scala 2.12) the compiler reports the call as ambiguous for the type Dataset. The same ambiguity shows up wherever I use map, filter, mapPartitions or flatMap.

Adding a cast to the functional interface makes every transformation compile:

spark.read().textFile(path).map((MapFunction<String, Person>) Person::new, Encoders.bean(Person.class))
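
A fuller self-contained sketch of that workaround (the Person bean, the input path and the local SparkSession below are just illustrative assumptions); declaring the MapFunction once gives the compiler the target type, so the cast does not have to be repeated at every call site:

    import java.io.Serializable;

    import org.apache.spark.api.java.function.MapFunction;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Encoders;
    import org.apache.spark.sql.SparkSession;

    public class ReadPersons {

        // Minimal bean so Encoders.bean can build an encoder for it.
        public static class Person implements Serializable {
            private String name;
            public Person() { }
            public Person(String line) { this.name = line; }
            public String getName() { return name; }
            public void setName(String name) { this.name = name; }
        }

        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("read-persons")
                    .master("local[*]")
                    .getOrCreate();

            // Assumed input path, one person per line.
            String path = "people.txt";

            // Naming the target functional interface resolves the overload,
            // so no cast is needed at the map call site.
            MapFunction<String, Person> toPerson = Person::new;

            Dataset<Person> persons = spark.read()
                    .textFile(path)
                    .map(toPerson, Encoders.bean(Person.class));

            persons.show();
            spark.stop();
        }
    }

The same trick applies to the other transformations, using FilterFunction for filter, MapPartitionsFunction for mapPartitions and FlatMapFunction for flatMap.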

Is there any other way to fix this without the cast or other code changes?

Anuradha

1 Answer


From the migration guide, chapter 'SQL, Datasets and DataFrame':

In Spark 3.0, Dataset query fails if it contains ambiguous column reference that is caused by self join. A typical example: val df1 = ...; val df2 = df1.filter(...);, then df1.join(df2, df1("a") > df2("a")) returns an empty result which is quite confusing. This is because Spark cannot resolve Dataset column references that point to tables being self joined, and df1("a") is exactly the same as df2("a") in Spark. To restore the behavior before Spark 3.0, you can set spark.sql.analyzer.failAmbiguousSelfJoin to false.

Source: https://spark.apache.org/docs/latest/sql-migration-guide.html
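
For illustration, a rough Java sketch of the scenario described there (the column name and toy data are made up); setting the flag restores the pre-3.0 behaviour, i.e. the query no longer fails on the ambiguous self-join reference:

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class SelfJoinAmbiguity {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("self-join-ambiguity")
                    .master("local[*]")
                    .getOrCreate();

            // Restore the pre-3.0 behaviour described in the migration guide;
            // with the default (true), Spark 3.x fails the query instead.
            spark.conf().set("spark.sql.analyzer.failAmbiguousSelfJoin", "false");

            Dataset<Row> df1 = spark.range(5).toDF("a");
            Dataset<Row> df2 = df1.filter("a > 1");

            // df1.col("a") and df2.col("a") point at the same underlying column,
            // which is the ambiguous self-join reference the guide warns about.
            df1.join(df2, df1.col("a").gt(df2.col("a"))).show();

            spark.stop();
        }
    }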

pburgr