
Is there an available tool or library (preferably an established commercial product or a solid open-source project) that can extract structured data from plain text? The text usually contains boolean or mathematical operators such as AND, OR, BETWEEN, etc.

I like AWS Comprehend but I'm not sure it can be used for this task easily.

vehicle with 2 to 5 wheels
=>
SUBJECT: vehicle
EXPRESSION:
  SUBJECT: wheels
  OPERAND: BETWEEN
    NUMBER: 2
    NUMBER: 5
John Rotenstein
Dan L.
  • Maybe a bit burdensome for this purpose, but if your text has the same structure every time, you could always use regular expressions (regex). Almost every programming language supports regex, and it is a common way to extract pieces of text from data. Side note: I found regex somewhat hard to learn, but there must be some courses around. – Kennos Mar 06 '20 at 11:39
  • Regex sounds too low-level and doesn't scale. I'd prefer something NLP-based that allows much more flexibility. – Dan L. Mar 06 '20 at 12:19
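To make the regex suggestion (and its limitation) concrete, here is a minimal sketch that assumes the text always follows the exact shape "<subject> with <low> to <high> <attribute>". The pattern, the function name `parse_range_phrase`, and the `NUMBERS` list (standing in for the two `NUMBER` entries in the question) are hypothetical; any rephrasing such as "between 2 and 5 wheels" simply returns `None`, which is the scalability concern raised above.

```python
import re

# Hypothetical pattern: matches only "<subject> with <low> to <high> <attribute>".
RANGE_PATTERN = re.compile(
    r"^(?P<subject>\w+)\s+with\s+(?P<low>\d+)\s+to\s+(?P<high>\d+)\s+(?P<attribute>\w+)$"
)


def parse_range_phrase(text):
    """Return the nested SUBJECT/EXPRESSION structure, or None if the text doesn't match."""
    match = RANGE_PATTERN.match(text.strip())
    if match is None:
        return None
    return {
        "SUBJECT": match["subject"],
        "EXPRESSION": {
            "SUBJECT": match["attribute"],
            "OPERAND": "BETWEEN",
            # Both bounds are captured as digits only, so int() is safe here.
            "NUMBERS": [int(match["low"]), int(match["high"])],
        },
    }


print(parse_range_phrase("vehicle with 2 to 5 wheels"))
```

Every new phrasing needs another hand-written pattern, which is why an NLP-based approach is attractive for anything beyond a fixed template.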

1 Answer


Comprehend does not natively support converting text to a structured format. However, you can use the Syntax API to get the part of speech of each word and build a rule-based structure from there.

https://docs.aws.amazon.com/comprehend/latest/dg/how-syntax.html
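A minimal sketch of calling the Syntax API from Python with boto3, assuming boto3 is installed and AWS credentials and a region are configured. `pos_tags` is a hypothetical helper name; `detect_syntax` returns a `SyntaxTokens` list whose entries carry the token `Text` and a `PartOfSpeech` object with a `Tag` and a `Score`.

```python
def pos_tags(text, client=None, language_code="en"):
    """Return (token, part-of-speech tag) pairs from Amazon Comprehend's DetectSyntax API.

    If no client is passed in, a boto3 Comprehend client is created, which
    requires AWS credentials and a configured region.
    """
    if client is None:
        # Imported lazily so the function can be exercised with a stub client.
        import boto3

        client = boto3.client("comprehend")
    response = client.detect_syntax(Text=text, LanguageCode=language_code)
    return [
        (token["Text"], token["PartOfSpeech"]["Tag"])
        for token in response["SyntaxTokens"]
    ]
```

Usage, with credentials in place: `pos_tags("vehicle with 2 to 5 wheels")`. Accepting an optional `client` keeps the helper testable without calling AWS.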

For the example above, "vehicle" and "wheels" will be detected as nouns, "2" and "5" as numerals, and "to" and "with" as adpositions.
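A minimal sketch of such a rule, assuming the (token, tag) pairs have already been obtained from the Syntax API and use Comprehend's tag names (NOUN, ADP, NUM). The tag sequence, the check that the middle adposition is "to", and the name `apply_range_rule` are all hypothetical; a real rule set would need many such patterns. Numbers are kept as the raw token text, so a word like "two" tagged as NUM would pass through unconverted.

```python
# Hypothetical rule: NOUN ADP NUM ADP(to) NUM NOUN -> subject with a BETWEEN range.
RANGE_RULE = ["NOUN", "ADP", "NUM", "ADP", "NUM", "NOUN"]


def apply_range_rule(tagged_tokens):
    """Map (token, tag) pairs to the SUBJECT/EXPRESSION structure, or return None.

    Scans for the first window whose tags match RANGE_RULE and whose
    second adposition is "to".
    """
    tags = [tag for _, tag in tagged_tokens]
    width = len(RANGE_RULE)
    for start in range(len(tags) - width + 1):
        window = tagged_tokens[start:start + width]
        if tags[start:start + width] == RANGE_RULE and window[3][0].lower() == "to":
            subject, _, low, _, high, attribute = (token for token, _ in window)
            return {
                "SUBJECT": subject,
                "EXPRESSION": {
                    "SUBJECT": attribute,
                    "OPERAND": "BETWEEN",
                    "NUMBERS": [low, high],  # raw token text, not converted
                },
            }
    return None


# Tags as described above for "vehicle with 2 to 5 wheels".
example = [
    ("vehicle", "NOUN"), ("with", "ADP"), ("2", "NUM"),
    ("to", "ADP"), ("5", "NUM"), ("wheels", "NOUN"),
]
print(apply_range_rule(example))
```

Matching on tags rather than exact words tolerates different subjects and attributes ("car with 3 to 4 doors"), but each new sentence shape still needs its own rule.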

abhinavatAWS