Questions tagged [word-frequency]

Word frequency analysis counts how often each distinct word occurs in a given text or text corpus.

342 questions
92
votes
19 answers

The Most Efficient Way To Find Top K Frequent Words In A Big Word Sequence

Input: A positive integer K and a big text. The text can be viewed as a word sequence, so we don't have to worry about how to break it down into words. Output: The K most frequent words in the text. My thinking is like this. use a…
Morgan Cheng
  • 73,950
  • 66
  • 171
  • 230
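One common approach to the top-K problem described above can be sketched as follows: count every word with a hash map, then select the K largest counts with a bounded heap. This is only a minimal illustration, not the asker's method (the excerpt is truncated). The function name `top_k_words` and the sample sentence are assumptions made for the example.

```python
import heapq
from collections import Counter


def top_k_words(words, k):
    """Return the k most frequent words as (word, count) pairs.

    `top_k_words` is a hypothetical helper name, not from the question.
    """
    # One pass over the sequence: O(n) time, O(m) memory for m distinct words.
    counts = Counter(words)
    # nlargest keeps only k candidates at a time: roughly O(m log k).
    return heapq.nlargest(k, counts.items(), key=lambda item: item[1])


# Illustrative input, assumed for this sketch.
words = "the cat and the dog and the bird".split()
print(top_k_words(words, 2))  # [('the', 3), ('and', 2)]
```

The heap avoids sorting all distinct words, which matters when the vocabulary is large and K is small.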
44
votes
12 answers

Sorted Word frequency count using python

I have to count the word frequency in a text using Python. I thought of keeping words in a dictionary and having a count for each of these words. Now I have to sort the words according to the number of occurrences. Can I do it with the same dictionary instead…
AlgoMan
  • 2,785
  • 6
  • 34
  • 40
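The dictionary idea in this question can be sketched as below: the dictionary holds the counts, and `sorted` produces an ordered list of pairs from it without a second data structure. This is a minimal sketch that assumes whitespace tokenization and lowercasing. The name `sorted_word_counts` is hypothetical.

```python
def sorted_word_counts(text):
    """Count words in a dict, then return (word, count) pairs sorted by count.

    Assumes words are separated by whitespace; punctuation is not stripped.
    """
    counts = {}
    for word in text.lower().split():
        counts[word] = counts.get(word, 0) + 1
    # Sort by count descending; break ties alphabetically for a stable result.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


print(sorted_word_counts("b a b c a b"))  # [('b', 3), ('a', 2), ('c', 1)]
```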
33
votes
8 answers

Word frequency algorithm for natural language processing

Without getting a degree in information retrieval, I'd like to know whether any algorithms exist for counting how frequently words occur in a given body of text. The goal is to get a "general feel" of what people are saying over a set of…
Mark McDonald
  • 7,571
  • 6
  • 46
  • 53
21
votes
6 answers

list of word frequencies using R

I have been using the tm package to run some text analysis. My problem is with creating a list with words and their frequencies associated with the same library(tm) library(RWeka) txt <- read.csv("HW.csv",header=T) df <- do.call("rbind",…
ProcRJ
  • 211
  • 1
  • 2
  • 3
19
votes
2 answers

WordCount: how inefficient is McIlroy's solution?

Long story short: in 1986 an interviewer asked Donald Knuth to write a program that takes a text and a number N as input, and lists the N most used words sorted by their frequencies. Knuth produced a 10-page Pascal program, to which Douglas McIlroy…
izabera
  • 659
  • 5
  • 11
18
votes
5 answers

Efficiently calculate word frequency in a string

I am parsing a long string of text and calculating the number of times each word occurs in Python. I have a function that works but I am looking for advice on whether there are ways I can make it more efficient (in terms of speed) and whether there's…
sazr
  • 24,984
  • 66
  • 194
  • 362
18
votes
2 answers

cannot perform reduce with flexible type plt.hist

I have a dataset with thousands of elements and their respective frequencies. I need to plot a histogram of the top 10 occurring elements. I did: top_words = Counter(my_data).most_common() top_words_10 = top_words[:10] …
Hypothetical Ninja
  • 3,920
  • 13
  • 49
  • 75
15
votes
3 answers

Print the 10 most frequently occurring words of a text, including and excluding stopwords

I got the question from here with my changes. I have the following code: from nltk.corpus import stopwords def content_text(text): stopwords = nltk.corpus.stopwords.words('english') content = [w for w in text if w.lower() in stopwords] …
user2064809
  • 403
  • 1
  • 4
  • 13
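The task in this question's title, listing the most frequent words with and without stopwords, can be sketched without NLTK as shown below. The tiny `STOPWORDS` set and the `top_words` helper are assumptions made for illustration. A real stopword list, such as the NLTK one the asker imports, would be much longer.

```python
from collections import Counter

# Hypothetical, deliberately tiny stopword list for this sketch only.
STOPWORDS = {"the", "a", "and", "of", "to", "in", "is"}


def top_words(tokens, n=10, exclude_stopwords=False):
    """Return the n most common lowercased tokens as (word, count) pairs."""
    lowered = [token.lower() for token in tokens]
    if exclude_stopwords:
        # Keep only tokens that are NOT in the stopword set.
        lowered = [token for token in lowered if token not in STOPWORDS]
    return Counter(lowered).most_common(n)


tokens = "The cat and the hat and the bat".split()
print(top_words(tokens, 2))                          # stopwords included
print(top_words(tokens, 2, exclude_stopwords=True))  # stopwords excluded
```

Note the `not in` test when excluding: filtering with `in` would keep only the stopwords instead.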
14
votes
3 answers

Word frequencies from strings in Postgres?

Is it possible to identify distinct words and a count for each, from fields containing text strings in Postgres?
Marty
  • 141
  • 1
  • 1
  • 3
14
votes
7 answers

Determining Word Frequency of Specific Terms

I'm a non-computer science student doing a history thesis that involves determining the frequency of specific terms in a number of texts and then plotting these frequencies over time to determine changes and trends. While I have figured out how to…
fdsayre
  • 175
  • 2
  • 11
12
votes
1 answer

Convert sparse matrix (csc_matrix) to pandas dataframe

I want to convert this matrix into a pandas dataframe. csc_matrix The first number in the bracket should be the index, the second number being columns and the number in the end being the data. I want to do this to do feature selection in text…
Miya Wang
  • 219
  • 2
  • 3
  • 7
11
votes
5 answers

Combining Lists of Word Frequency Data

This seems like it should be an obvious question, but the tutorials and documentation on lists are not forthcoming. Many of these issues stem from the sheer size of my text files (hundreds of MB) and my attempts to boil them down to something…
canadian_scholar
  • 1,315
  • 12
  • 26
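Combining several word frequency lists, as this question asks, can be sketched in Python by summing the counts word by word. This is an illustration only, since the excerpt does not show the asker's language or data format. The `merge_frequency_lists` name and the list-of-pairs input shape are assumptions.

```python
from collections import Counter


def merge_frequency_lists(*frequency_lists):
    """Sum counts across lists of (word, count) pairs.

    The input shape is an assumption made for this sketch.
    """
    total = Counter()
    for pairs in frequency_lists:
        for word, count in pairs:
            # Add to the running total; missing words start at 0.
            total[word] += count
    return total


first = [("war", 3), ("peace", 1)]
second = [("war", 2), ("trade", 4)]
merged = merge_frequency_lists(first, second)
print(merged.most_common())  # [('war', 5), ('trade', 4), ('peace', 1)]
```

For files of hundreds of MB, each list could be read and merged one at a time so that only the running total stays in memory.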
11
votes
1 answer

Count word frequency in a text?

Possible Duplicate: php: sort and count instances of words in a given string I am looking to write a PHP function which takes a string as input, splits it into words and then returns an array of words sorted by the frequency of occurrence of each…
YD8877
  • 10,401
  • 20
  • 64
  • 92
10
votes
4 answers

Free database of Google word frequencies?

On the Stack Overflow podcast this week, Jeff mentioned that in 2004 he wrote a script which queried Google with 110,000 English words and collected a database containing the number of hits for each word. They use this on Stack Overflow e.g. for the…
Edward Tanguay
  • 189,012
  • 314
  • 712
  • 1,047
9
votes
2 answers

Python nltk counting word and phrase frequency

I am using NLTK and trying to get the word phrase count up to a certain length for a particular document as well as the frequency of each phrase. I tokenize the string to get the data list. from nltk.util import ngrams from nltk.tokenize import…
user1610950
  • 1,837
  • 5
  • 33
  • 49
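Counting phrases up to a given length, as described in this question, can be sketched in plain Python by sliding a window of each size over the token list. This is not the asker's NLTK code, which is truncated in the excerpt. The `phrase_counts` helper is a hypothetical stand-in that assumes the text is already tokenized.

```python
from collections import Counter


def phrase_counts(tokens, max_len):
    """Count every contiguous phrase of 1..max_len tokens."""
    counts = Counter()
    for n in range(1, max_len + 1):
        # Slide a window of n tokens across the list.
        for start in range(len(tokens) - n + 1):
            phrase = " ".join(tokens[start:start + n])
            counts[phrase] += 1
    return counts


counts = phrase_counts("a b a b".split(), 2)
print(counts["a b"], counts["b a"], counts["a"])  # 2 1 2
```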