I'm joining three data frames and everything works, but when I call the "display" method on the final data frame (the join of the three previous data frames), Databricks returns this error:
java.lang.AssertionError: assertion failed
I'm using:
%fs head dbfs:/databricks-datasets/iot-stream/data-user/
%fs ls dbfs:/databricks-datasets/iot-stream/data-user/
Could someone help me? Thanks!
These are the data frame schemas:
df_MaximasCalorias
ID: long (nullable = true)
Max_Calorias: double (nullable = true)
df_MinimasCalorias
user_id: long (nullable = true)
Min_Calorias: double (nullable = true)
df_MediaCalorias
user_id: long (nullable = true)
Media_Calorias: double (nullable = true)
Dataframe = dfCalorias (join of df_MaximasCalorias and df_MinimasCalorias)
ID: long (nullable = true)
Max_Calorias: double (nullable = true)
Min_Calorias: double (nullable = true)
Dataframe = dfCaloriasFinal (join dfCalorias and df_MediaCalorias)
ID: long (nullable = true)
Max_Calorias: double (nullable = true)
Min_Calorias: double (nullable = true)
Media_Calorias: double (nullable = true)
And this is the complete code:
# Change column names
df_MaximasCalorias = df_MaximasCalorias.withColumnRenamed("user_id", "ID").withColumnRenamed("max(calories_burnt)", "Max_Calorias")
df_MinimasCalorias = df_MinimasCalorias.withColumnRenamed("min(calories_burnt)", "Min_Calorias")
df_MediaCalorias = df_MediaCalorias.withColumnRenamed("avg(calories_burnt)", "Media_Calorias")
# Create join expression
joinExpression = df_MaximasCalorias["ID"] == df_MinimasCalorias["user_id"]
# First join
dfCalorias = df_MaximasCalorias.join(df_MinimasCalorias, joinExpression, "inner").select("ID", "Max_Calorias", "Min_Calorias")
dfCalorias.show()
# Show data. This works perfectly
display(dfCalorias)
# Now join the new data frame dfCalorias with df_MediaCalorias
joinExpression = dfCalorias["ID"] == df_MediaCalorias["user_id"]
dfCaloriasFinal = dfCalorias.join(df_MediaCalorias, joinExpression, "inner").select("ID", "Max_Calorias", "Min_Calorias", "Media_Calorias")
# The error occurs at this line
display(dfCaloriasFinal)