Given a single word such as "table", I want to identify its most common part of speech: noun, verb, or adjective. I want to do this in Python. Is there anything besides WordNet I could use? I'd rather not use WordNet. Or, if I do use WordNet, how exactly would I do it?
4
-
You need part-of-speech [tagging](http://www.nltk.org/book/ch05.html). – Vidul Sep 05 '15 at 09:47
-
WordNet has a frequency for each *sense* of a word (e.g., 'table'), but this has not been updated since 2003 (as far as I can recall). The better option is to download Google n-grams and do POS tagging on that dataset. – Chthonic Project Sep 05 '15 at 22:30
-
How are you going to POS-tag 5-word ngrams? That idea is a non-starter. But Google does provide [ngram files](http://storage.googleapis.com/books/ngrams/books/datasetsv2.html) classified by POS of the first word, so that would be a way to get an extensive count, if you have the disk space and really need to churn through that much data. – alexis Sep 06 '15 at 11:54
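The per-sense WordNet frequencies mentioned in these comments can be tallied by part of speech, which also addresses the "how exactly with WordNet" part of the question. The following is a minimal sketch, not a definitive method: `pos_totals` and `wordnet_sense_counts` are names made up for this illustration, and it assumes NLTK is installed with the WordNet data downloaded.

```python
from collections import Counter

def pos_totals(sense_counts):
    """Sum (pos, count) pairs into one total per part of speech."""
    totals = Counter()
    for pos, n in sense_counts:
        totals[pos] += n
    return totals

def wordnet_sense_counts(word):
    """Yield (pos, count) for each WordNet sense of `word`.

    NLTK is imported lazily, on first iteration, so the tallying
    function above works without it.
    """
    from nltk.corpus import wordnet as wn
    for synset in wn.synsets(word):
        for lemma in synset.lemmas():
            # A synset groups several lemmas; keep only the one for our word.
            if lemma.name().lower() == word.lower():
                yield synset.pos(), lemma.count()

# Usage, assuming the WordNet data is available locally:
# print(pos_totals(wordnet_sense_counts('table')).most_common())
```

WordNet reports parts of speech as `'n'`, `'v'`, `'a'`, `'s'` (satellite adjective) and `'r'`. As far as I know, the counts come from a fairly small sense-tagged corpus, so many senses have a count of zero and the totals should be treated as a rough signal.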
2 Answers
11
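```python
import nltk

# Requires NLTK's tokenizer and tagger models (available via nltk.download).
text = 'This is a table. We should table this offer. The table is in the center.'
tokens = nltk.word_tokenize(text)
tagged = nltk.pos_tag(tokens)  # tag every token using its sentence context
result = [pair for pair in tagged if pair[0].lower() == 'table']
print(result) # [('table', 'JJ'), ('table', 'VB'), ('table', 'NN')]
```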

Vidul
-
What if the word is out of context? Just the word "table" and its most common usage, whether it's a noun, a verb, and so on. – jonty rhodes Sep 05 '15 at 10:07
-
What do you mean by "out of context"? It's the context that makes this definition (part of speech) possible. – Vidul Sep 05 '15 at 10:12
-
I mean, given the word "table", what is this word's most common usage in the real world: noun, verb, adjective, etc.? – jonty rhodes Sep 05 '15 at 11:04
-
1@jontyrhodes You mean its most common usage as a whole on this planet? I don't know, you should ask Google probably. – Vidul Sep 05 '15 at 11:18
-
"table" OK, is that usage a noun or a verb? Your question can't be answered because a word by itself does not have that property. – stark Sep 05 '15 at 11:22
-
The OP is not asking about "that usage", but about "most commonly used". The question _can_ be answered as formulated. – alexis Sep 07 '15 at 11:59
-
@alexis What do you mean by "most commonly used"? Most commonly in what context? – Vidul Sep 07 '15 at 13:03
-
Read the question, and the OP's first comment: "just the word 'table', and the most common usage", i.e. regardless of context. Your answer shows how to do it when you have context, fine. Of course any answer will depend on the reference corpus, but why keep quibbling with the question? There _are_ reasonable ways to answer it (e.g., my answer :-)) – alexis Sep 07 '15 at 13:30
-
@alexis No, your answer is no different. You have just changed the context. – Vidul Sep 07 '15 at 13:33
-
7
If you have a word out of context and want to know its most common use, you could look at someone else's frequency table (e.g., WordNet), or you can do your own counts: just find a tagged corpus that's large enough for your purposes and count the tags assigned to the word's occurrences. If you want to use a free corpus, the NLTK includes the Brown corpus (1 million words). The NLTK also provides methods for working with larger, non-free corpora (e.g., the British National Corpus).
```python
import nltk
from nltk.corpus import brown

# Count the tags assigned to every occurrence of "table" in the Brown corpus.
table = nltk.FreqDist(t for w, t in brown.tagged_words() if w.lower() == 'table')
print(table.most_common())
# [('NN', 147), ('NN-TL', 50), ('VB', 1)]
```
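The counts above use Brown's fine-grained tags, so `NN` and `NN-TL` (a noun appearing in a title) are reported separately. To answer the question in terms of noun, verb, or adjective, the tags can be collapsed into coarse classes. The sketch below works directly on the counts printed above; the `COARSE` mapping and the helper names are made up for this illustration and cover only the three classes the question asks about.

```python
from collections import Counter

# Counts copied from the Brown corpus output above.
brown_counts = [('NN', 147), ('NN-TL', 50), ('VB', 1)]

# Hypothetical, deliberately incomplete mapping from the first two
# letters of a Brown tag to a coarse class. Proper nouns (NP...) and
# everything else fall through to 'other'.
COARSE = {'NN': 'noun', 'VB': 'verb', 'JJ': 'adjective'}

def coarse_pos(tag):
    """Map a fine-grained Brown tag to a coarse part of speech."""
    base = tag.split('-')[0]  # drop suffixes such as -TL or -HL
    return COARSE.get(base[:2], 'other')

def collapse(counts):
    """Sum (tag, count) pairs by coarse part of speech."""
    totals = Counter()
    for tag, n in counts:
        totals[coarse_pos(tag)] += n
    return totals

totals = collapse(brown_counts)
print(totals.most_common())  # [('noun', 197), ('verb', 1)]
```

If the universal tagset mapping is installed, `brown.tagged_words(tagset='universal')` should give coarse tags such as `NOUN` and `VERB` directly, which avoids maintaining a mapping by hand.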