I am new to Databricks and am working with PySpark DataFrames. In my code, I join two DataFrames with the join function and then call count() to get the number of rows in the new DataFrame. Then I sort the DataFrame with orderBy and call count() again, but this time the count is different. Also, the two counts never match, and each run of the code returns different values. The code looks something like this:
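# inner join on df1.col1 == df2.col2 (== is the comparison operator)
newDF = df1.join(df2, df1.col1 == df2.col2, 'inner')
newDF.count()  # first count, right after the join
newDF = newDF.orderBy('col1')
newDF.count()  # second count, after sorting: differs from the first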