
In Scala/Spark, I am trying to do the following:

val portCalls_Ports = 
  portCalls.join(ports, portCalls("port_id") === ports("id"), "inner")

However, I am getting the following error:

Exception in thread "main" org.apache.spark.sql.AnalysisException: 
     binary type expression port_id cannot be used in join conditions;

It's true that this is a binary type:

root
 |-- id: binary (nullable = false)
 |-- port_id: binary (nullable = false)
     .
     .
     .

+--------------------+--------------------+
|                  id|             port_id|
+--------------------+--------------------+
|[FB 89 A0 FF AA 0...|[B2 B2 84 B9 52 2...|

as is ports("id").

I am using the following libraries:

scalaVersion := "2.11.11"
libraryDependencies ++= Seq(
  // Spark dependencies
  "org.apache.spark" %% "spark-hive" % "1.6.2",
  "org.apache.spark" %% "spark-mllib" % "1.6.2",
  // Third-party libraries
  "postgresql" % "postgresql" % "9.1-901-1.jdbc4",
  "net.sf.jopt-simple" % "jopt-simple" % "5.0.3"
)

Note that I am using JDBC to read database tables.

What is the best way to fix this problem?


1 Answer


Prior to Spark 2.1.0, the best workaround I know of is to use the base64 function to convert the binary columns into Strings, and compare those:

import org.apache.spark.sql.functions._

val portCalls_Ports =
  portCalls.join(ports, base64(portCalls("port_id")) === base64(ports("id")), "inner")
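This workaround is safe because Base64 encoding is injective: two byte arrays encode to the same string if and only if they are byte-for-byte equal, so joining on the encoded strings matches exactly the rows a direct binary comparison would. A minimal plain-Scala sketch of that property (using `java.util.Base64` for illustration, not Spark's `base64` function, though both produce standard Base64):

```scala
import java.util.Base64

object Base64JoinKeyCheck extends App {
  val enc = Base64.getEncoder

  // Sample byte arrays standing in for binary join keys.
  val a = Array[Byte](0xFB.toByte, 0x89.toByte, 0xA0.toByte)
  val b = Array[Byte](0xFB.toByte, 0x89.toByte, 0xA0.toByte) // same bytes as a
  val c = Array[Byte](0xB2.toByte, 0xB2.toByte, 0x84.toByte) // different bytes

  // Equal byte arrays yield equal encodings...
  assert(enc.encodeToString(a) == enc.encodeToString(b))
  // ...and different byte arrays yield different encodings,
  // so a string equality join over base64(col) is equivalent
  // to an equality join over the binary column itself.
  assert(enc.encodeToString(a) != enc.encodeToString(c))

  println("base64 key comparison behaves like binary equality")
}
```

The cost is the extra encode per row on both sides of the join, which is usually negligible compared to the shuffle itself.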
  • Sorry - edited the post to include the import; I recommend making a habit of adding this import to every DataFrame-related piece of code ;) – Tzach Zohar Jun 09 '17 at 16:05