
I have a data set with around 4000 client questions. I want to know which topics clients have asked about the most, but I don't have a list of topics. I want to get a count of every word in the column.

The data is in a pandas dataframe.

Ironman10
Can you add some sample data, the expected output, and what you have tried? All of that is missing, which is the reason for the downvotes. – jezrael Mar 09 '18 at 09:23

1 Answer


Use str.split to split on whitespace, with expand=True to get a DataFrame, then reshape it with stack and get the counts, sorted in descending order, with value_counts:

import pandas as pd

df = pd.DataFrame({'a': ['aa ss d', 'f d aa aa', 'aa']})
print(df)
           a
0    aa ss d
1  f d aa aa
2         aa

s = df['a'].str.split(expand=True).stack().value_counts()
print(s)
aa    4
d     2
f     1
ss    1
dtype: int64
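On pandas 0.25 or later, a shorter variant (a sketch, not part of the original answer) replaces expand=True plus stack with explode. It skips the intermediate DataFrame, which gets padded with missing values whenever rows contain different numbers of words:

```python
import pandas as pd

df = pd.DataFrame({'a': ['aa ss d', 'f d aa aa', 'aa']})

# split() with no argument splits on any run of whitespace and returns one list per row;
# explode() then puts each list element on its own row (requires pandas >= 0.25).
s = df['a'].str.split().explode().value_counts()
print(s)
```

The counts match the stack version: aa appears 4 times, d twice, and f and ss once each.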

To get the counts as a DataFrame:

df1 = (df['a'].str.split(expand=True)
              .stack()
              .value_counts()
              .rename_axis('vals')
              .reset_index(name='count'))
print(df1)
  vals  count
0   aa      4
1    d      2
2    f      1
3   ss      1
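Real client questions usually mix upper and lower case and end with punctuation, so a plain whitespace split counts "Refund" and "refund?" as different words. Below is a minimal sketch, assuming you want case-insensitive counts of alphanumeric tokens. The column name question, the sample rows and the tiny stop-word set are hypothetical placeholders:

```python
import pandas as pd

# Hypothetical sample questions; replace with your own DataFrame and column.
df = pd.DataFrame({'question': ['Refund my order?', 'refund status', 'Order cancelled.']})

# Hypothetical, deliberately tiny stop-word set; real data needs a much fuller list.
stop_words = {'my'}

words = (df['question']
         .str.lower()          # 'Refund' and 'refund' become the same word
         .str.findall(r'\w+')  # keep runs of letters, digits and underscores; drops punctuation
         .explode()            # one row per word
         .dropna())            # empty or missing questions turn into NaN after explode
words = words[~words.isin(stop_words)]

counts = words.value_counts()
print(counts.head(20))  # the most frequent words, as a rough proxy for common topics
```

Word frequencies only hint at topics. Filtering out common filler words matters more than anything else here, because otherwise words like "the" or "how" will dominate the top of the list.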
jezrael