
I'm joining three data frames and everything works, but when I call the "display" method on the final data frame (the join of the three previous data frames), Databricks returns this error:

java.lang.AssertionError: assertion failed

I'm using:

%fs head dbfs:/databricks-datasets/iot-stream/data-user/

%fs ls dbfs:/databricks-datasets/iot-stream/data-user/

Could someone help me? Thanks!

These are the data frame schemas:

df_MaximasCalorias

  • ID: long (nullable = true)

  • Max_Calorias: double (nullable = true)

df_MinimasCalorias

  • user_id: long (nullable = true)

  • Min_Calorias: double (nullable = true)

df_MediaCalorias

  • user_id: long (nullable = true)

  • Media_Calorias: double (nullable = true)

dfCalorias (join of df_MaximasCalorias and df_MinimasCalorias)

  • ID: long (nullable = true)

  • Max_Calorias: double (nullable = true)

  • Min_Calorias: double (nullable = true)

dfCaloriasFinal (join of dfCalorias and df_MediaCalorias)

  • ID: long (nullable = true)

  • Max_Calorias: double (nullable = true)

  • Min_Calorias: double (nullable = true)

  • Media_Calorias: double (nullable = true)

And this is the complete code:

Rename the columns

df_MaximasCalorias = df_MaximasCalorias.withColumnRenamed("user_id","ID").withColumnRenamed("max(calories_burnt)","Max_Calorias")

df_MinimasCalorias = df_MinimasCalorias.withColumnRenamed("min(calories_burnt)","Min_Calorias")

df_MediaCalorias = df_MediaCalorias.withColumnRenamed("avg(calories_burnt)","Media_Calorias")

Create join expression

joinExpression = df_MaximasCalorias["ID"] == df_MinimasCalorias['user_id'] 

First join

dfCalorias = df_MaximasCalorias.join(df_MinimasCalorias, joinExpression, "inner").select("ID","Max_Calorias","Min_Calorias")

dfCalorias.show()

Show the data. This works perfectly

display(dfCalorias) 

Now join the new data frame dfCalorias with df_MediaCalorias

joinExpression = dfCalorias["ID"] == df_MediaCalorias['user_id'] 



dfCaloriasFinal = dfCalorias.join(df_MediaCalorias, joinExpression, "inner").select("ID","Max_Calorias","Min_Calorias","Media_Calorias")

The error occurs at this line

display(dfCaloriasFinal)
Saeed Zhiany
Danny

1 Answer


I appreciate the detailed question! I'm pretty sure your error is in this statement:

joinExpression = dfCalorias["ID"] == df_MediaCalorias['user_id'] 

which I believe sets joinExpression to a boolean value, since you're assigning it the result of a comparison. You're better off writing your join condition in the join call itself:

dfCaloriasFinal = dfCalorias.join(df_MediaCalorias, dfCalorias["ID"] == df_MediaCalorias['user_id'], "inner").select("ID","Max_Calorias","Min_Calorias","Media_Calorias")
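
For what it's worth, whether that comparison really collapses to a plain boolean depends on how the column type defines `==`. The toy classes below (hypothetical `Col` and `EqualTo`, written for illustration and not PySpark's own code) sketch how a library can overload `==` so that it builds an expression object instead of returning True or False. PySpark's `Column` overloads `==` along similar lines, so it may be worth printing `type(joinExpression)` in your notebook before ruling out other causes.

```python
# Toy model only: Col and EqualTo are hypothetical stand-ins, not PySpark classes.
class EqualTo:
    """Expression node recording that two columns should be compared."""

    def __init__(self, left, right):
        self.left = left
        self.right = right

    def __repr__(self):
        return f"({self.left.name} = {self.right.name})"


class Col:
    """Stand-in for a data frame column, identified only by its name."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        # Build an expression object instead of returning True/False.
        return EqualTo(self, other)

    # Defining __eq__ disables the default __hash__; restore identity hashing.
    __hash__ = object.__hash__


join_expression = Col("ID") == Col("user_id")
print(type(join_expression).__name__)  # EqualTo
print(join_expression)                 # (ID = user_id)
```

If the variable turns out to hold an expression object like this, passing it to the join and writing the comparison inline should behave the same way.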
SkippyNBS