2

Given "violence" as input, would it be possible to work out how a person construes the word (e.g. physical violence, a book, an album, a musical group, ...), as in the senses listed below in Ref #1?

Assuming the user meant an album, what would be the best way to look for "Violence" as an album in a set of tweets?

Is there a way to infer this via any of the NLP APIs, say OpenNLP?

Ref #1

violence/N1 - intentional harmful physical action.
violence/N2 - the property of being wild or turbulent.
Violence/N6 - a book from Neil L. Whitehead; nonfiction
Violence/N7 - an album by The Last Resort
Violence/N8 - Violence is the third album by the Washington-based Alternative metal music group Nothingface.
Violence/N9 - a musical group which produced the albums Eternal Nightmare and Nothing to Gain
Violence/N10 - a song by Aesthetic Perfection, Angel Witch, Arsenic, Beth Torbert, Brigada Flores Magon, etc on the albums A Natural Disaster, Adult Themes for Voice, I Bificus, Retribution, S.D.E., etc
Violence/N11 - an album by Bombardier, Dark Quarterer and Invisible Limits
Violence/N12 - a song by CharlElie Couture, EsprieM, Fraebbblarnir, Ian Hunter, Implant, etc on the albums All the Young Dudes, Broke, No Regrets, Power of Limits, Repercussions, etc
Violence/N18 - Violence: The Roleplaying Game of Egregious and Repulsive Bloodshed is a short, 32-page roleplaying game written by Greg Costikyan under the pseudonym "Designer X" and published by Hogshead Publishing as part of its New Style line of games.
Violence/N42 - Violence (1947) is an American drama film noir directed by Jack Bernhard.
Rpj

5 Answers

2

Purely automatic inference is a little too hard in general for this problem.

Instead we might use:

  • Resources like WordNet, or another semantic dictionary. For languages other than English you can look at the EuroWordNet (non-free) dataset.

  • To get more meanings (e.g. the album sense), we can process a well-maintained resource like Wikipedia. Wikipedia has a lot of meta-information that would be very useful for this kind of processing.

  • The reliability of the process is achieved only by combining as many data sources as possible and processing them correctly, with specialized programs.

  • As a last resort you may try hand processing/annotating. It is long and costly, but useful in an enterprise context where you only need a small part of a language.

No free lunch here.
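As a rough illustration of using glosses from such resources, here is a minimal sketch of a simplified Lesk-style overlap: it picks the sense whose gloss shares the most words with the tweet. This is one possible technique, not something the answer prescribes. The glosses are abridged from Ref #1 in the question, and the stopword list, `tokens` and `best_sense` are hypothetical names made up for the example.

```python
import re

# Abridged glosses for a few senses from Ref #1 in the question.
# In practice they would be pulled from WordNet or Wikipedia.
SENSES = {
    "violence/N1": "intentional harmful physical action",
    "Violence/N8": "the third album by the Washington-based alternative "
                   "metal music group Nothingface",
    "Violence/N42": "an American drama film noir directed by Jack Bernhard",
}

# Hypothetical, deliberately tiny stopword list (an assumption).
STOPWORDS = {"the", "a", "an", "by", "of", "is", "to", "and", "in", "on"}


def tokens(text):
    """Set of lowercase word tokens with stopwords removed."""
    found = re.findall(r"[a-z0-9]+", text.lower())
    return {t for t in found if t not in STOPWORDS}


def best_sense(tweet, senses=SENSES):
    """Return the id of the sense whose gloss overlaps most with the tweet,
    or None when no gloss shares any word with it."""
    context = tokens(tweet)
    best_id, best_overlap = None, 0
    for sense_id, gloss in senses.items():
        overlap = len(context & tokens(gloss))
        if overlap > best_overlap:
            best_id, best_overlap = sense_id, overlap
    return best_id


print(best_sense("Nothingface's Violence is such an underrated metal album"))
print(best_sense("Stop the violence"))
```

The second tweet shares no words with any gloss, so the function returns None rather than guessing. That reflects the point above: short glosses alone are rarely enough, and more data sources are needed for reliable results.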

GrantD71
Galigator
  • It is possible to extract meanings of words from unstructured text by searching for [phrases that indicate word definitions](https://www.google.com/search?q=%22the+word+*+means%22). – Anderson Green Jul 13 '22 at 00:39
1

If you're working on English NLP in Python, you can try the WordNet API in NLTK:

from nltk.corpus import wordnet as wn  # needs the WordNet data: nltk.download('wordnet')

query = 'violence'
for ss in wn.synsets(query):
    # Sense id: zero-padded 8-digit offset, a dash, then the part of speech
    print(query, str(ss.offset()).zfill(8) + '-' + ss.pos(), ss.definition())

If you're working on other human languages, you can take a look at the open wordnets available from http://casta-net.jp/~kuribayashi/multi/

NOTE: str(ss.offset()).zfill(8) + '-' + ss.pos() is used because it is the unique id for each sense of a specific word, and this id is consistent across the open wordnets for every language. The first 8 digits give the id and the character after the dash is the part of speech of the sense.

alvas
1

Check this out: Twitter Filtering Demo from Idilia. It does exactly what you want by first analyzing a piece of text to discover the meaning of its words and then filtering the texts that contain the sense that you are looking for. It's available as an API.

Disclaimer: I work for Idilia.

PBelzile
0

You can extract every context in which "violence" occurs (a context can be a whole document, or a window of, say, 50 words), convert each context into features (using, say, a bag of words), and then cluster these feature vectors. As clustering is unsupervised, you won't have names for the clusters, but you can label each one with a typical context.

Then you need to see which cluster the "violence" in the query belongs to, either based on the other words in the query, which act as its context, or by asking the user explicitly (Do you mean violence as in "...." or as in "...."?).
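The steps above can be sketched roughly as follows, using only the standard library: bag-of-words vectors, a k-means-style loop with cosine similarity, and assignment of a query to the nearest cluster. The four contexts are invented for illustration, and the seed indices, function names and iteration count are assumptions. A real system would run a proper clustering library over many more contexts.

```python
import math
import re
from collections import Counter


def bag_of_words(text, target="violence"):
    """Word counts for one context, leaving out the ambiguous word itself."""
    found = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in found if t != target)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest(vec, centroids):
    """Index of the centroid most similar to vec."""
    return max(range(len(centroids)), key=lambda i: cosine(vec, centroids[i]))


def kmeans(vectors, seeds, iterations=10):
    """k-means-style clustering; seeds are indices of the initial centroids."""
    centroids = [Counter(vectors[i]) for i in seeds]
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for vec in vectors:
            groups[nearest(vec, centroids)].append(vec)
        # Summed counts stand in for the mean: cosine ignores vector length.
        centroids = [sum(g, Counter()) if g else c
                     for g, c in zip(groups, centroids)]
    return centroids


# Invented example contexts (an assumption, not real tweets).
contexts = [
    "new album violence by nothingface is out, heavy metal riffs",
    "listening to the violence album, best metal record this year",
    "police report domestic violence and assault cases rising",
    "victims of violence and assault need police protection",
]
vectors = [bag_of_words(c) for c in contexts]
centroids = kmeans(vectors, seeds=[0, 2])
labels = [nearest(v, centroids) for v in vectors]
print(labels)

# The other words in the query act as its context.
print(nearest(bag_of_words("violence album review"), centroids))
```

The cluster indices carry no meaning by themselves. As noted above, each cluster still has to be labelled by showing a typical context from it.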

Jirka
0

This will be very difficult, because the proper-noun uses of the word 'Violence' are extremely infrequent as a proportion of all words, and their frequency distribution is likely to be highly skewed in some way. We run into these problems almost any time we want to do some form of Named Entity Disambiguation.

No tool I'm aware of will do this for you, so you will be building your own classifier. Using Wikipedia as a training resource as Mr K suggested is probably your best bet.
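To give an idea of what such a classifier could look like, here is a minimal sketch of a multinomial Naive Bayes model with add-one smoothing, written with only the standard library. Naive Bayes is just one possible choice. The training sentences are invented stand-ins for text that would come from Wikipedia, and the class name and labels are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def words(text):
    """Lowercase word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> training texts
        self.vocab = set()

    def train(self, text, label):
        self.doc_counts[label] += 1
        for w in words(text):
            self.word_counts[label][w] += 1
            self.vocab.add(w)

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        known = [w for w in words(text) if w in self.vocab]
        scores = {}
        for label in self.doc_counts:
            total_words = sum(self.word_counts[label].values())
            # The log prior reflects how rare a sense is in the training data.
            score = math.log(self.doc_counts[label] / total_docs)
            for w in known:
                count = self.word_counts[label][w] + 1
                score += math.log(count / (total_words + len(self.vocab)))
            scores[label] = score
        return max(scores, key=scores.get)


# Invented training snippets standing in for Wikipedia-derived text.
clf = NaiveBayes()
clf.train("Violence is the third album by the alternative metal group "
          "Nothingface", "album")
clf.train("The album was released on a record label and features twelve "
          "tracks", "album")
clf.train("Violence is the use of physical force to injure or harm people",
          "common noun")
clf.train("Domestic violence and assault are crimes reported to the police",
          "common noun")

print(clf.predict("just bought the Nothingface album Violence, great tracks"))
print(clf.predict("violence and assault reported to police"))
```

With realistic data the album sense would be far rarer than the common noun, so the prior term would weigh heavily against it. That is the skew problem described above.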

GrantD71