Count of repeated persian words in columns with python

Question

What I have:

I have a DataFrame (df) with 2 columns.

In df["Words"] I have some Persian/Farsi words.

Words	Counts
سلام
کشور زیبا ؟
28 % ایران
ایران طلا
طلا ایران
سلام ایران

What I would like:

I would like to split the text into words and count the frequency of every single word in the "Words" column:

Words	Counts
سلام	2
کشور	1
زیبا	1
؟	1
ایران	4
طلا	2
%	1

What I did:

df.Words.str.get_dummies(sep=' ').mul(df['count'], axis=0).sum()

What I got from Python:

Words	Counts
سلام	NaN
کشور	NaN
زیبا	NaN
؟	NaN
ایران	NaN
طلا	NaN
%	NaN

Is the problem the formatting or the code?
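A minimal sketch, assuming the "Counts" column is still empty and each row holds space-separated tokens. The `.mul(df['count'], axis=0)` step in the attempt above weights each row by a per-row count, but the data shown has no per-row counts to multiply by, so this sketch drops that step. Note that `str.get_dummies` records whether a token appears in a row (0 or 1), not how many times, so a word repeated inside one row is counted only once.

```python
import pandas as pd

# Sample data copied from the question; the empty "Counts" column is left out.
df = pd.DataFrame({
    "Words": ["سلام", "کشور زیبا ؟", "28 % ایران", "ایران طلا", "طلا ایران", "سلام ایران"]
})

# One indicator column per distinct token (1 if the row contains it, else 0);
# summing down the rows gives the number of rows that contain each token.
dummies = df["Words"].str.get_dummies(sep=" ")
counts = dummies.sum()

# Reshape into the two-column Words/Counts layout used in the question.
result = counts.rename_axis("Words").reset_index(name="Counts")
print(result)
```

The output also includes "28" with a count of 1, since it is a space-separated token like the others.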

Semmel · Answer 1 · 2021-02-02T18:12:40.970

This handles " " and "." (at the end of a sentence). I am not sure whether Farsi uses any other separators; if it does, just add them to the "separators" string.

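import pandas as pd
import re

# Character class: split on runs of spaces and periods. A bare ". " would be
# read as a regex meaning "any character followed by a space".
separators = r"[. ]+"
df = pd.DataFrame({"Words": ["hi you there", "hello all"]})

def get_word_len(words: str) -> int:
    # Drop empty strings produced by leading/trailing separators.
    return len([w for w in re.split(separators, words) if w])

df["Counts"] = df.Words.apply(get_word_len)

print(df)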
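On the question of other separators: Persian text may also use punctuation such as ؟ (U+061F) or ، (U+060C), possibly attached to the preceding word without a space, and the zero-width non-joiner (U+200C) inside words. A sketch, using a hypothetical `tokenize` helper, that extracts tokens with `re.findall` instead of splitting on a fixed list of separators: each run of word characters (Python 3's `\w` is Unicode-aware, so it matches Persian letters and digits) becomes one token, and every other non-space character, such as ؟ or %, becomes a token of its own.

```python
import re
from collections import Counter
from typing import List

import pandas as pd

# Hypothetical tokenizer pattern:
#   [\w\u200c]+  a run of word characters (Unicode-aware by default in
#                Python 3) and zero-width non-joiners, so a Persian word
#                written with U+200C stays a single token
#   [^\w\s]      any single character that is neither a word character nor
#                whitespace, e.g. the Persian question mark or %
TOKEN_RE = re.compile(r"[\w\u200c]+|[^\w\s]")


def tokenize(text: str) -> List[str]:
    return TOKEN_RE.findall(text)


df = pd.DataFrame({"Words": ["سلام", "کشور زیبا ؟", "ایران؟", "28 % ایران"]})

# Count every token across all rows.
counts = Counter(tok for text in df["Words"] for tok in tokenize(text))
print(counts.most_common())
```

With this approach, "ایران؟" and "ایران ؟" produce the same two tokens, so the counts do not depend on whether a space precedes the punctuation.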

Thank you for your feedback; I misunderstood the task. This should solve your problem (replace df with your own DataFrame):

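import pandas as pd

df = pd.DataFrame({"Words": ["hi you there", "hello all hi"]})

# Collect every whitespace-separated token from every row into one flat list.
words = []
for text in df["Words"]:
    words.extend(text.split())

# Count how often each token occurs across all rows.
df_a = pd.DataFrame({"words": words})
print(df_a["words"].value_counts())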

result:

hi       2
there    1
all      1
hello    1
you      1
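The same count can be written without an explicit loop, using `Series.str.split` and `Series.explode` (available in pandas 0.25 and later). This sketch runs on the question's sample data; unlike `str.get_dummies`, it counts every occurrence, so a word repeated inside one row is counted each time.

```python
import pandas as pd

df = pd.DataFrame({
    "Words": ["سلام", "کشور زیبا ؟", "28 % ایران", "ایران طلا", "طلا ایران", "سلام ایران"]
})

# split() with no argument splits on any run of whitespace, so double spaces
# do not produce empty tokens; explode() puts each token on its own row.
tokens = df["Words"].str.split().explode()

# value_counts() counts every occurrence of each token.
counts = tokens.value_counts()

# Same two-column Words/Counts layout as in the question.
result = counts.rename_axis("Words").reset_index(name="Counts")
print(result)
```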

Unfortunately, it's not what I want. I would like the code to: 1) split the text into words, 2) count the words/symbols, and 3) show the count of each word/symbol. - Jsmoka, Feb 02 '21 at 12:29
