
I am trying to run the line of code:

pd.get_dummies(pd_df, columns = ['ethnicity'])

However, I keep getting the error 'DataFrame' object has no attribute '_internal'. It looks like it's linked to the ...pyspark/pandas/namespace.py file, so I am not sure how to fix it.

Unfortunately, the dataframe itself is private, so I can't show or describe it on Stack Overflow, but any information about why this could be happening would be greatly appreciated!

I can make the example below work perfectly, but it won't work on my code even though it is exactly the same; I just have a different DataFrame that has been converted from PySpark to pandas:

import numpy as np
import pandas as pd

sales_data = pd.DataFrame({"name": ["William", "Emma", "Sofia", "Markus", "Edward", "Thomas", "Ethan", "Olivia", "Arun", "Anika", "Paulo"],
                           "sales": [50000, 52000, 90000, 34000, 42000, 72000, 49000, 55000, 67000, 65000, 67000],
                           "region": ["East", "North", "East", "South", "West", "West", "South", "West", "West", "East", np.nan]})
pd.get_dummies(sales_data, columns=['region'])

ajnabz
  • Is `pd_df` a PySpark dataframe or a pandas dataframe? – Ben.T Nov 21 '22 at 21:41
  • Pandas dataframe :) @Ben.T – ajnabz Nov 21 '22 at 21:42
  • Do you build it from a PySpark dataframe? I'm asking because you seem to say it comes from the file `...pyspark/pandas/namespace.py`, and you also talk about `show`, which is not in pandas (as far as I know). If yes, it may be related to [this Q&A](https://stackoverflow.com/questions/65474079/attributeerror-dataframe-object-has-no-attribute-data), even if it is not strictly the same error – Ben.T Nov 21 '22 at 21:49
  • Yes, it is a PySpark dataframe which I then convert with ```.toPandas()```. Thank you, I will have a look! – ajnabz Nov 21 '22 at 21:52
  • 1
    @Ben.T I dont think it is to do with the version as I am able to use it perfectly with the example I have included in the question. Thank you though – ajnabz Nov 21 '22 at 22:13

1 Answer


I had this same error. I was mixing up the two libraries, using ps (pyspark.pandas) where I meant pd (plain pandas).

Ensure your aliases are correct and that you haven't accidentally bound the pandas alias to pyspark.pandas, e.g.:

import pyspark.pandas as pd   # wrong: 'pd' is now pyspark.pandas, not plain pandas
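
As a rough sketch of how to keep the two aliases apart (the data below is made up; only the 'ethnicity' column name comes from the question, and the Spark DataFrame is a hypothetical stand-in for the private one):

from pyspark.sql import SparkSession
import pandas as pd             # plain pandas
import pyspark.pandas as ps     # pandas API on Spark, under its own alias

spark = SparkSession.builder.getOrCreate()

# Hypothetical stand-in for the private DataFrame described in the question
spark_df = spark.createDataFrame(
    [("Alice", "A"), ("Bob", "B"), ("Cara", "A")],
    ["name", "ethnicity"],
)

# Option 1: convert to plain pandas and use pandas' own get_dummies
pd_df = spark_df.toPandas()
encoded = pd.get_dummies(pd_df, columns=["ethnicity"])

# Option 2: stay on Spark and use the pandas-on-Spark equivalent
ps_df = spark_df.pandas_api()   # PySpark 3.2+
encoded_ps = ps.get_dummies(ps_df, columns=["ethnicity"])

Either way, the key is that pd.get_dummies only receives a plain pandas DataFrame and ps.get_dummies only receives a pandas-on-Spark one.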
straka86