
I'm looking for some advice about natural language processing. I want to do some research, but I'm not sure what exactly I'm researching. That sounds awkward, but imagine I have a text about an animal. It contains sentences like

"Dogs live at the northpole. They are about 1-3m long."

And so on. It's not only about dogs; I have a bunch of texts describing animals in words. I'm looking for something that analyses the text, recognizes "keywords" like "live" or "long", collects the associated data, and produces something like a data sheet for the animal, similar to the spec sheet you get when you buy a new printer.

So I'm not necessarily looking for a tool (though I wouldn't mind one); I need advice on which terms I could research. It's pretty tough to start searching when you're new to NLP. Thanks in advance!

iamgr007
epix
  • Try the sentences you are interested in with the Stanford parser. It will give you an idea of which constituents the information you want occurs in. Then all you need to do is find the animals, parse the sentences, and look at those constituents. Perhaps you can even do this without going to the trouble of machine learning(ifying) your task, assuming that you are not a computer scientist and it would be more of a hassle than a help. Try things on the online Stanford parser: http://nlp.stanford.edu:8080/parser/index.jsp – user3639557 Oct 06 '16 at 00:16

1 Answer


Once you know which animal the text is talking about (which can be done by training a model to find animals in the given text), all you need to do is use coreference resolution to find out what is said about the animal. I don't understand exactly what research you want to do, but this is what I would do.

I'd use OpenNLP to train a model for animals, then use coreference to find the features of each animal and put them in a table.

some support: here & here
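To make the idea concrete, here is a minimal sketch of the pipeline in plain Python. It is not OpenNLP: the animal list `ANIMALS` stands in for a trained model, the "coreference" step is just the naive assumption that a sentence without an animal name refers to the most recently mentioned animal, and the `PATTERNS` table and field names are made up for illustration.

```python
import re

# Hypothetical stand-in for a trained animal-name model.
ANIMALS = {"dogs", "cats", "penguins"}

# Hypothetical keyword patterns: data-sheet field -> regex with a "value" group.
PATTERNS = {
    "habitat": re.compile(r"\blives? (?:at|in|on) (?P<value>[^.]+)", re.I),
    "length": re.compile(
        r"\b(?P<value>\d+(?:\.\d+)?(?:\s*-\s*\d+(?:\.\d+)?)?\s*(?:cm|m))\s+long\b",
        re.I,
    ),
}


def build_data_sheets(text):
    """Return {animal: {field: value}} extracted from free text."""
    sheets = {}
    current = None  # most recently mentioned animal (naive coreference)
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        words = [w.lower() for w in re.findall(r"[A-Za-z]+", sentence)]
        mentioned = [w for w in words if w in ANIMALS]
        if mentioned:
            current = mentioned[0]
        if current is None:
            continue  # nothing to attach the facts to yet
        for field, pattern in PATTERNS.items():
            match = pattern.search(sentence)
            if match:
                sheets.setdefault(current, {})[field] = match.group("value").strip()
    return sheets


sheets = build_data_sheets("Dogs live at the northpole. They are about 1-3m long.")
print(sheets)
```

For the example sentences from the question, this yields a "habitat" and a "length" entry for "dogs". A real coreference resolver and a trained name finder would replace the two shortcuts, but the overall shape (find the animal, resolve references, match properties, fill a table) stays the same.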

iamgr007
  • Thanks for your advice (and for clearing up my question)! That was very helpful and exactly what I was looking for. – epix Oct 12 '16 at 12:17
  • I'm glad that helped you! – iamgr007 Oct 12 '16 at 12:24
  • OK, by now I've figured out that it's not that simple to even recognize the Latin name of an animal in a document. Using a name finder NER model doesn't look like the right way unless you have massive sample data to train your model, probably... – epix Oct 19 '16 at 11:45
  • Yeah, you'll need a tagged dataset of around 15000 lines with a lot of animal names and contexts to get very good results, but I've got good results with around 2000-3000 lines. – iamgr007 Oct 19 '16 at 13:41
  • I've now successfully parsed a lot of animals, including descriptions, from the internet and tagged them. When I try to train a new model, it shows this: `Number of Outcomes: 1` and `Model not compatible with name finder!` I think I followed all the hints I found, like the extra spaces around the tagged word (`" doggy cat "`) and no . right after the tag. Still no success at this step. Is there any chance you could provide me with a tagged training set so I can compare its structure to mine, @root? Sorry to bother you so much here. – epix Nov 01 '16 at 23:29
  • Try taking only cat or doggy, not both at a time. – iamgr007 Nov 01 '16 at 23:36
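For comparison, here is a small sketch of what OpenNLP name finder training lines look like and how to sanity-check them. The format is one whitespace-tokenized sentence per line, with each name wrapped in `<START:type> ... <END>` and the tags separated from neighbouring tokens by spaces. The helper names `tag_sentence` and `is_well_formed` and the entity type `animal` are made up for this example. A possible (unconfirmed for this thread) cause of `Number of Outcomes: 1` is that the trainer did not recognize any tags in the data, so checking every line this way may help narrow it down.

```python
import re


def tag_sentence(sentence, names, entity_type="animal"):
    """Wrap each whole-token occurrence of a known name in name finder tags."""
    tokens = sentence.split()
    # Match longer (multi-word) names first; skip empty names.
    name_tokens = sorted(
        (n.lower().split() for n in names if n.split()), key=len, reverse=True
    )
    out = []
    i = 0
    while i < len(tokens):
        for name in name_tokens:
            window = [t.lower() for t in tokens[i:i + len(name)]]
            if window == name:
                out.append(f"<START:{entity_type}>")
                out.extend(tokens[i:i + len(name)])
                out.append("<END>")
                i += len(name)
                break
        else:
            out.append(tokens[i])
            i += 1
    return " ".join(out)


def is_well_formed(line):
    """True if tags are standalone tokens and strictly alternate START, END."""
    inside = False
    for token in line.split():
        if re.fullmatch(r"<START(:\w+)?>", token):
            if inside:
                return False  # nested START
            inside = True
        elif token == "<END>":
            if not inside:
                return False  # END without START
            inside = False
        elif "<START" in token or "<END>" in token:
            return False  # tag glued to a word or punctuation
    return not inside


line = tag_sentence("The doggy chased the cat .", ["doggy", "cat"])
print(line)
print(is_well_formed(line))
```

Each animal name gets its own `<START:animal> ... <END>` pair here, which matches the advice above about not tagging "doggy" and "cat" together as one name.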