  • DataFrame a contains columns x, y, z, k
  • DataFrame b contains columns x, y, a

    a.join(b, <condition in Java to join on x and y>) ???
    

I tried using

a.join(b, a.col("x").equalTo(b.col("x")) && a.col("y").equalTo(b.col("y")), "inner")

But Java throws a compile error saying && cannot be applied here.


2 Answers


Spark SQL provides a group of methods on Column, marked as java_expr_ops, which are designed for Java interoperability. It includes the and method (see also or), which can be used here:

a.col("x").equalTo(b.col("x")).and(a.col("y").equalTo(b.col("y")))
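
Put together as a runnable sketch (the sample data and schemas below are invented for illustration; only the column names come from the question):

```java
import java.util.Arrays;

import org.apache.spark.sql.Column;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class TwoColumnJoin {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .master("local[1]").appName("two-column-join").getOrCreate();

        // Hypothetical sample rows matching the question's column layout.
        StructType schemaA = new StructType()
                .add("x", DataTypes.IntegerType)
                .add("y", DataTypes.IntegerType)
                .add("z", DataTypes.StringType)
                .add("k", DataTypes.StringType);
        Dataset<Row> a = spark.createDataFrame(Arrays.asList(
                RowFactory.create(1, 10, "z1", "k1"),
                RowFactory.create(2, 20, "z2", "k2")), schemaA);

        StructType schemaB = new StructType()
                .add("x", DataTypes.IntegerType)
                .add("y", DataTypes.IntegerType)
                .add("a", DataTypes.StringType);
        Dataset<Row> b = spark.createDataFrame(Arrays.asList(
                RowFactory.create(1, 10, "a1"),
                RowFactory.create(3, 30, "a3")), schemaB);

        // Column.and() combines the two equality expressions; Java's &&
        // only works on boolean primitives, which is why the original
        // attempt fails to compile.
        Column condition = a.col("x").equalTo(b.col("x"))
                .and(a.col("y").equalTo(b.col("y")));

        Dataset<Row> joined = a.join(b, condition, "inner");
        joined.show();

        spark.stop();
    }
}
```

With an expression-based join like this, the result keeps both copies of x and y (one from each side); use the Seq-of-column-names variant below if you want them deduplicated.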
    How can the above condition be built dynamically using the Java API when the number of columns is not fixed? It could be 2, 3, 4, 7 or more. – V__ Apr 23 '19 at 15:14
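
One way to handle a variable number of join columns with the expression-based join is to fold the per-column equality tests into a single Column with and(). A sketch, assuming DataFrames a and b share the columns listed in a hypothetical joinCols list:

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Column;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class DynamicJoin {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .master("local[1]").appName("dynamic-join").getOrCreate();

        // Invented sample data with the question's column names.
        StructType schemaA = new StructType()
                .add("x", DataTypes.IntegerType)
                .add("y", DataTypes.IntegerType)
                .add("z", DataTypes.StringType);
        Dataset<Row> a = spark.createDataFrame(Arrays.asList(
                RowFactory.create(1, 10, "z1"),
                RowFactory.create(2, 20, "z2")), schemaA);

        StructType schemaB = new StructType()
                .add("x", DataTypes.IntegerType)
                .add("y", DataTypes.IntegerType)
                .add("a", DataTypes.StringType);
        Dataset<Row> b = spark.createDataFrame(Arrays.asList(
                RowFactory.create(1, 10, "a1"),
                RowFactory.create(3, 30, "a3")), schemaB);

        // The list of join columns can come from anywhere at runtime.
        List<String> joinCols = Arrays.asList("x", "y");

        // Fold the equality tests into one Column, chaining with and().
        Column condition = null;
        for (String c : joinCols) {
            Column eq = a.col(c).equalTo(b.col(c));
            condition = (condition == null) ? eq : condition.and(eq);
        }

        Dataset<Row> joined = a.join(b, condition, "inner");
        joined.show();

        spark.stop();
    }
}
```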

If you want to join on multiple columns, you can do something like this:

a.join(b, scalaSeq, joinType)

You can store your column names in a Java List and convert it to a Scala Seq:

scalaSeq = JavaConverters.asScalaIteratorConverter(list.iterator()).asScala().toSeq();

Example: a = a.join(b, scalaSeq, "inner");

Note: a dynamic set of join columns is easily supported this way, and the join-on columns appear only once in the result.
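
A complete sketch of this approach (the sample data is invented; the conversion line is the one from the answer). Because this uses the usingColumns overload of join, x and y appear only once in the output:

```java
import java.util.Arrays;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

import scala.collection.JavaConverters;
import scala.collection.Seq;

public class SeqJoin {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .master("local[1]").appName("seq-join").getOrCreate();

        // Hypothetical sample rows using the question's column names.
        StructType schemaA = new StructType()
                .add("x", DataTypes.IntegerType)
                .add("y", DataTypes.IntegerType)
                .add("z", DataTypes.StringType)
                .add("k", DataTypes.StringType);
        Dataset<Row> a = spark.createDataFrame(Arrays.asList(
                RowFactory.create(1, 10, "z1", "k1"),
                RowFactory.create(2, 20, "z2", "k2")), schemaA);

        StructType schemaB = new StructType()
                .add("x", DataTypes.IntegerType)
                .add("y", DataTypes.IntegerType)
                .add("a", DataTypes.StringType);
        Dataset<Row> b = spark.createDataFrame(Arrays.asList(
                RowFactory.create(1, 10, "a1"),
                RowFactory.create(3, 30, "a3")), schemaB);

        // Convert the Java List of column names into a Scala Seq.
        Seq<String> scalaSeq = JavaConverters
                .asScalaIteratorConverter(Arrays.asList("x", "y").iterator())
                .asScala().toSeq();

        // Using-columns join: x and y are emitted once, not duplicated.
        Dataset<Row> joined = a.join(b, scalaSeq, "inner");
        joined.show();

        spark.stop();
    }
}
```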