
Is there a way to linguistically parse English text? I mean, to get something like this:

I{I,pronoun} am{to be, verb, Present Simple} late{late, adverb}.

Or even better with dependencies, like:

I -> am -> (what?) -> late.

Preferably in Java, but the language doesn't matter much.

Denis Kulagin
  • Most proper parsers produce trees, like (S ((I pron subject) (am V-cop predicate)) (late adj predicative)), though there are other formalizations of dependencies, language models, etc. But this topic is far too wide for a StackOverflow question. – tripleee Nov 14 '14 at 12:10
  • Stanford dependency parser – alvas Nov 14 '14 at 22:19

2 Answers


The NLTK package (a Python library) is meant to do what you want: http://www.nltk.org/

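import nltk

# One-time model downloads (resource names vary by NLTK version):
# nltk.download('punkt'); nltk.download('averaged_perceptron_tagger')

sentence = "I'm late."
words = nltk.word_tokenize(sentence)  # split the sentence into tokens
tagged = nltk.pos_tag(words)          # attach Penn Treebank part-of-speech tags
print(tagged)
# [('I', 'PRP'), ("'m", 'VBP'), ('late', 'JJ'), ('.', '.')]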
GAM PUB

Many linguistic dictionaries are available on the internet.

You can download one of them, parse it, and use it for your needs.
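
A minimal sketch of the dictionary approach, assuming a lexicon that maps each word form to a lemma and a part of speech. The `LEXICON` entries and the `annotate` helper are hypothetical and stand in for a real downloaded dictionary; the output imitates the format from the question.

```python
import re

# Hypothetical toy lexicon: word form -> (lemma, part of speech).
# A real dictionary file would be parsed into a mapping like this.
LEXICON = {
    "i": ("I", "pronoun"),
    "am": ("to be", "verb"),
    "late": ("late", "adjective"),
}

def annotate(sentence):
    """Return each word followed by {lemma, part of speech}."""
    parts = []
    for word in re.findall(r"[A-Za-z']+", sentence):
        # Words missing from the lexicon keep their surface form as the lemma.
        lemma, pos = LEXICON.get(word.lower(), (word, "unknown"))
        parts.append(f"{word}{{{lemma}, {pos}}}")
    return " ".join(parts)

print(annotate("I am late."))
```

A plain lookup like this cannot choose between several parts of speech for the same word form, and it produces no dependencies, which is where a tagger or parser comes in.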

You should also account for mistakes and other irregularities that can occur in the text; for this, consider natural language processing, look here

Maksym