I just started learning pandas and I was wondering whether you can put text into a DataFrame to get results. I have a text of about 3000 words in which some words ("sun", "moon", "earth") are repeated many times. I want to produce a graph that shows the number of occurrences of each word, from the most frequent to the least. Which aspects of pandas should I concentrate on learning for such a task, and is pandas the best choice for it?
Hey Max, welcome to StackOverflow! As it stands, your question is quite broad, which makes it hard for us to answer. To improve it, I'd suggest doing some more research on the topic and asking only concise questions here. You could also provide some sample code, even if it's just pseudo-code. I highly recommend reading the [how do I ask a good question guide](https://stackoverflow.com/help/how-to-ask). So, try editing it, and don't take this as an attack but as constructive criticism! :) – Pedro Martins de Souza Feb 11 '19 at 15:55
2 Answers
If each row of a column already holds a single word, you can plot the word frequencies directly:
df['column_with_words'].value_counts().plot(kind='bar')
But that won't split raw text into words for you. For that you are better off researching a text-analysis package such as nltk.
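As a rough illustration of how far plain pandas can go here, the sketch below splits a string into words, counts them with `value_counts()`, and offers a helper that draws the bar chart. The sample text and the helper name `plot_word_counts` are made up for this example, and the plotting step assumes matplotlib is installed.

```python
import pandas as pd

# Hypothetical sample text; in practice this would be read from your file.
text = "sun moon earth sun sun moon"

# One word per row: lowercase the text and split on whitespace.
words = pd.Series(text.lower().split(), name="word")

# value_counts() returns the counts sorted from most to least frequent.
counts = words.value_counts()
print(counts)


def plot_word_counts(counts):
    """Draw a bar chart of the counts (hypothetical helper, needs matplotlib)."""
    import matplotlib.pyplot as plt

    ax = counts.plot(kind="bar")
    ax.set_xlabel("word")
    ax.set_ylabel("occurrences")
    plt.show()
```

Calling `plot_word_counts(counts)` should give a chart ordered from the most frequent word to the least, since `value_counts()` already sorts in descending order.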

Polkaguy6000
I agree with the comment above that your question is too broad. However, what you want to do is tokenize the text and count the frequency of each token. That can be done similarly to this question. Here is one implementation:
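import nltk

# Read the whole file and split it into whitespace-separated tokens
with open("input.txt", "r") as myfile:
    tokens = myfile.read().split()

# FreqDist counts how often each token occurs
fdist1 = nltk.FreqDist(tokens)
# Print (token, count) pairs from most to least frequent
print(fdist1.most_common())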
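Splitting on whitespace leaves punctuation and capitalization attached to the tokens, so "Sun," and "sun" would be counted separately. A minimal sketch of one way around that, using only the standard library (`collections.Counter` as a stand-in for `FreqDist`), is below. The function name `word_frequencies`, the regular expression, and the sample sentence are assumptions made for this illustration, and the pattern only covers unaccented English letters.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Return (word, count) pairs sorted from most to least frequent."""
    # Lowercase first, then keep runs of letters and apostrophes,
    # which drops commas, periods and other punctuation.
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens).most_common()


# Hypothetical sample text standing in for the contents of input.txt.
sample = "The sun, the moon and the earth. Sun and moon!"
for word, count in word_frequencies(sample):
    print(f"{word}: {count}")
```

The resulting list of pairs can be handed to whatever plotting tool you prefer, since it is already in descending order of frequency.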

Amir Imani