What I have:
I have a DataFrame (df) with 2 columns.
In df["Words"] I have some Persian/Farsi words:
Words | Counts |
---|---|
سلام | |
کشور زیبا ؟ | |
28 % ایران | |
ایران طلا | |
طلا ایران | |
سلام ایران | |
What I want:
I want to split each entry into individual words and count how often every word appears in the "Words" column:
Words | Counts |
---|---|
سلام | 2 |
کشور | 1 |
زیبا | 1 |
؟ | 1 |
ایران | 4 |
طلا | 2 |
% | 1 |
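One way to produce that table is to split each cell on whitespace, give every token its own row with `explode`, and count the tokens with `value_counts`. This is a minimal sketch that assumes words are separated by whitespace and rebuilds `df` from the table above (leaving out the empty "Counts" column). Note that "28" in the third row is also a whitespace-separated token, so it shows up with a count of 1 even though the expected table does not list it.

```python
import pandas as pd

# Sample data rebuilt from the question's table (assumption: the empty "Counts" column is omitted).
df = pd.DataFrame({
    "Words": ["سلام", "کشور زیبا ؟", "28 % ایران", "ایران طلا", "طلا ایران", "سلام ایران"]
})

counts = (
    df["Words"]
    .str.split()            # split each cell on whitespace into a list of tokens
    .explode()              # one row per token
    .value_counts()         # token -> number of occurrences across all rows
    .rename_axis("Words")   # name the index so it becomes the "Words" column
    .reset_index(name="Counts")
)
print(counts)
```

If numeric tokens such as 28 should be left out, filtering with `counts[~counts["Words"].str.isnumeric()]` is one option.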
What I did:
df.Words.str.get_dummies(sep=' ').mul(df['count'], axis=0).sum()
What I got from Python:
Words | Counts |
---|---|
سلام | NaN |
کشور | NaN |
زیبا | NaN |
؟ | NaN |
ایران | NaN |
طلا | NaN |
% | NaN |
Is the problem the formatting or the code?
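A likely culprit is the `.mul(...)` step rather than the Persian text, assuming the count column really is empty (missing values), as the table suggests. `str.get_dummies(sep=' ')` already yields a 0/1 indicator for each word in each row, and multiplying those indicators by a missing count turns every cell into NaN, so no real counts survive. The attempt also refers to `df['count']`, while the column shown in the table is named "Counts". The sketch below rebuilds `df` from the table (an assumption, with "Counts" modelled as NaN) and compares the multiplied indicators with a plain sum of the indicators.

```python
import numpy as np
import pandas as pd

# Reconstruction of the question's df (assumption): "Counts" is empty, modelled as NaN.
df = pd.DataFrame({
    "Words": ["سلام", "کشور زیبا ؟", "28 % ایران", "ایران طلا", "طلا ایران", "سلام ایران"],
    "Counts": np.nan,
})

# One 0/1 indicator column per distinct space-separated token.
dummies = df["Words"].str.get_dummies(sep=" ")

# Multiplying by the empty count column makes every cell NaN (0 * NaN and 1 * NaN are both NaN).
weighted = dummies.mul(df["Counts"], axis=0)
print(weighted.isna().all().all())  # True

# Summing the indicators directly gives, for each token, the number of rows that contain it.
counts = dummies.sum()
print(counts.sort_values(ascending=False))
```

Because `get_dummies` records presence, a word repeated inside a single cell is counted once per row; if that can happen, splitting and counting tokens (for example with `str.split().explode().value_counts()`) counts every occurrence instead.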