I am trying to run the line of code:
pd.get_dummies(pd_df, columns = ['ethnicity'])
However, I keep getting the error 'DataFrame' object has no attribute '_internal'
. It looks like its linked to the ...pyspark/pandas/namespace.py
file so therefore I am not too sure how to fix it.
Unfortunately, the dataframe itself is private so I can't show/describe it on Stackoverflow however any information about why this could be happening would be greatly appreciated!
I can make the example below work perfectly but it wont work on my code even though it is exactly the same I just have a different DataFrame that has been changed from PySpark to Pandas:
sales_data = pd.DataFrame({"name":["William","Emma","Sofia","Markus","Edward","Thomas","Ethan","Olivia","Arun","Anika","Paulo"]
,"sales":[50000,52000,90000,34000,42000,72000,49000,55000,67000,65000,67000]
,"region":["East","North","East","South","West","West","South","West","West","East",np.nan]
}
)
pd.get_dummies(sales_data, columns = ['region'])