7

I am trying to join two DataFrame with condition.

I have two dataframe A and B.

A contains id,m_cd and c_cd columns B contains m_cd,c_cd and record columns

Conditions are -

  • If m_cd is null then join c_cd of A with B
  • If m_cd is not null then join m_cd of A with B

we can use "when" and "otherwise()" in withcolumn() method of dataframe, so is there any way to do this for the case of join in dataframe.

I have already done this using Union.But wanted to know if there any other option available.

Avijit
  • 1,770
  • 5
  • 16
  • 34

2 Answers2

11

You can use the "when" / "otherwise" in the join condition:

case class Foo(m_cd: Option[Int], c_cd: Option[Int])
val dfA = spark.createDataset(Array(
    Foo(Some(1), Some(2)),
    Foo(Some(2), Some(3)),
    Foo(None: Option[Int], Some(4))
))


val dfB = spark.createDataset(Array(
    Foo(Some(1), Some(5)),
    Foo(Some(2), Some(6)),
    Foo(Some(10), Some(4))
))

val joinCondition = when($"a.m_cd".isNull, $"a.c_cd"===$"b.c_cd")
    .otherwise($"a.m_cd"===$"b.m_cd")

dfA.as('a).join(dfB.as('b), joinCondition).show

It might still be more readable to use the union, though.

alghimo
  • 2,899
  • 18
  • 11
  • If I want to add multiple condition like If, else if, else of java and scala in this condition (val joinCondition = when($"a.m_cd".isNull, $"a.c_cd"===$"b.c_cd").otherwise($"a.m_cd"===$"b.m_cd")) Is it possible?? – Avijit Sep 12 '16 at 14:32
  • 3
    Hi @Avijit. You can chain multiple "when". "when" can act as the "if" and as the "else if", so you can do "when(condition1, value1).when(cond2, val2).otherwise(default)" – alghimo Sep 12 '16 at 18:09
  • could you share a union based solution? – 219CID Nov 23 '21 at 05:43
0

In case someone is trying to do it in PySpark here's the syntax

join_condition = when(df1.azure_resourcegroup.startswith('a_string'),df1.some_field == df2.somefield)\
    .otherwise((df1.servicename == df2.type) &
    (df1.resourcegroup == df2.esource_group) &
    (df1.subscriptionguid == df2.subscription_id))
df1 = df1.join(df2,join_condition,how='left')
Yousuf Sultan
  • 3,055
  • 8
  • 35
  • 63