
I have been trying to extract 'words that leaders use to describe themselves', using LinkedIn summaries as the data set.

1) I have cleaned the data using the 'tm' package in R

2) I extracted adjectives making use of 'POS Tagging' in the 'openNLP' package.

My first problem:

It extracts all adjectives, but I only need adjectives of quality such as loyal, innovative, passionate.

My second problem:

Is there a way to make the program understand what it is reading? E.g. the word 'mobile' gets tagged as an adjective, whereas it is usually part of a compound noun such as 'mobile application'.

I am coding using R. Please help!

  • You could frequency the adjectives in any case. Unlikely that scheming, bossy, irritable or unscrupulous come up too often on Linked In. I haven't played with NLP too much so don't know if you can specify binning by pairs which might help with the compound nouns. You might also look into how you might import another POS tagger. [link](http://paula.petcu.tm.ro/init/default/post/opennlp-part-of-speech-tags) or [link](https://github.com/slavpetrov/universal-pos-tags) and [link](http://www.petrovi.de/data/universal.pdf) for the journal article. Hmm, is journal article a compound noun... – Chris Apr 14 '16 at 03:30
  • What do you mean by 'frequency the adjectives'? I am taking a frequency of the words, but sometimes words like 'third' or a person's name show up too. – thushara tom Apr 14 '16 at 04:27
  • Frequency is essentially agnostic to meaning, and tagging can generally tell 'what' a word is as regards parts of speech (POS). Human readers can normally review a list and say which word doesn't belong contextually. This remains something of a difficulty for machine learning. Perhaps you could compare Linked In word frequency with the Corpus of American English [link](http://corpus.byu.edu/coca/). I'm guessing there would be a fair match up with the first 10,000 words by frequency, which would suggest a university sophomore-ish level of vocabulary at Linked In. – Chris Apr 14 '16 at 04:53
  • Is there an example where the Corpus of American English is used this way? – thushara tom Apr 14 '16 at 05:25
  • Googled commonly used leadership words in self-description openNLP cran R that blessedly returns 8 items, the fifth guy down, who is on Linked In, gives a pretty good overview of his process in R [link](http://www.modsimworld.org/papers/2015/Natural_Language_Processing.pdf). The second item contains this interesting phrase "List of sentiment words from R package tm.plugin.tags". Sorry I'm not an NLP practitioner per se, I use corpus like stuff to predict how much vocabulary English 2nd Language speakers potentially command. search the full name of the corpus and cran r, 2040 listings. HTH – Chris Apr 14 '16 at 08:41
  • http://stackoverflow.com/questions/4600612/extracting-nounnoun-or-adjnounnoun-from-text?rq=1 – Chris Apr 14 '16 at 08:49
  • Thank you so much Chris. – thushara tom Apr 14 '16 at 11:13
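
The two ideas raised in the comments (filtering adjectives against a word list, and looking at adjective + noun pairs) can be sketched in base R. This is only a rough illustration, not a tested pipeline: the `tagged` data frame is a hypothetical stand-in for whatever token/tag table you build from the openNLP output, and `trait_lexicon` is a made-up word list that you would replace with a real list of trait or sentiment words.

```r
# Hypothetical POS-tagged tokens (Penn Treebank tags), in document order.
# In practice this table would be built from the openNLP annotations.
tagged <- data.frame(
  token = c("passionate", "innovative", "leader", "mobile", "application",
            "third", "year", "loyal", "team", "passionate"),
  tag   = c("JJ", "JJ", "NN", "JJ", "NN",
            "JJ", "NN", "JJ", "NN", "JJ"),
  stringsAsFactors = FALSE
)

# Assumed word list of 'quality' adjectives; purely illustrative.
trait_lexicon <- c("loyal", "innovative", "passionate", "driven")

# 1) Keep only adjectives that appear in the lexicon, then count them.
is_adj <- tagged$tag %in% c("JJ", "JJR", "JJS")
adjectives <- tolower(tagged$token[is_adj])
trait_adjectives <- adjectives[adjectives %in% trait_lexicon]
trait_counts <- sort(table(trait_adjectives), decreasing = TRUE)

# 2) List adjective + noun pairs so that compound terms such as
#    'mobile application' can be reviewed and excluded by hand.
next_is_noun <- c(startsWith(tagged$tag[-1], "NN"), FALSE)
pair_idx <- which(is_adj & next_is_noun)
adj_noun_pairs <- paste(tagged$token[pair_idx], tagged$token[pair_idx + 1])

print(trait_counts)
print(adj_noun_pairs)
```

With a lexicon filter, words like 'mobile' and 'third' drop out simply because they are not on the list, so the tagger itself does not need to "understand" them. The pair listing does not decide anything on its own; it just surfaces candidates for a manual stop list.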
