
Does anybody know of a Java implementation of Constraint Grammar for natural language processing? I know of the VISL CG3 implementation, which is written in C++. I could interface with it from Java, but a native Java implementation would be easier, since it will be integrated into legacy Java code. It will be used in an open source Portuguese grammar checker and must be compatible with the LGPL license.

unhammer
wcolen

3 Answers


Have a look at JAPE (Java Annotation Patterns Engine): regular expressions over annotations. It is a formalism based on CPSL (Common Pattern Specification Language) from the old TIPSTER project.

It's not truly context-dependent (as Constraint Grammar is), but it's possible to do context-dependent things with it. It is free and open source, and it comes with a lot of Java examples.

XTDL from the SProUT project is also worth a look. I'm not sure whether it is free.

andrey
  • I am reading the JAPE documentation. I need to check whether it can be used on the output of a parser (a tree structure). Some rules would check relations between parent and child nodes. Also, I couldn't find a standalone version of JAPE. The JAR should be small, and gate-core.jar exceeds 4.8 MB. – wcolen Dec 22 '11 at 15:55
  • JAPE doesn't work with tree structures. Instead of building complex trees, it does repeated shallow processing and takes advantage of its simplicity. It takes annotations (Java class gate.Annotation) as input terminals, matches them against regexp-like patterns (the left-hand sides of the rules) and, if a match is found, produces output annotations. The job is done in phases, where each phase (a set of rules) processes annotations made by previous phases to produce (or delete) other annotations. The first annotations are created by the Tokenizer from the document text. – andrey Dec 22 '11 at 20:28
  • When processing is finished, the document contains the annotations you created. They can be exported to XML or processed by your custom Java code (in fact, that will be one more ProcessingResource). – andrey Dec 22 '11 at 20:29
  • About the big JAR: GATE is to NLP what Eclipse is to Java (but 5 MB instead of 150 MB). It's not just a library but a development environment. However, you can run GATE without the GUI and process a batch of documents. The architecture is based on three base elements: LanguageResource (documents and other data, e.g. corpus, ontology), ProcessingResource (workers) and Application (combinations of Language and Processing Resources). – andrey Dec 22 '11 at 20:29
  • JAPE can't be extracted into a separate compact library because it has too many dependencies on other parts of the system (in the same way, it's not possible to extract the Eclipse debugger from Eclipse). In addition to these 4.8 MB, you can find many megabytes of plugins; some of them may be very useful. – andrey Dec 22 '11 at 20:29
  • Perhaps RapidMiner http://www.rapidminer.com (possibly considered a competitor of GATE) has a Constraint Grammar formalism, but I don't know this tool. – andrey Dec 22 '11 at 20:37
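
To make the phase idea from the comments above more concrete, here is a minimal, self-contained sketch in plain Java. It does not use GATE or JAPE: the class `PhaseSketch`, the `Ann` type and the three phase methods are hypothetical names invented for illustration, and the "Det Word" pattern is a toy rule. It only shows the general shape of shallow, phased processing, where each phase reads the annotations of the previous one and emits new ones.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is NOT the GATE/JAPE API.
class PhaseSketch {

    /** A typed span of text; a stand-in for an annotation, not gate.Annotation. */
    static final class Ann {
        final String type;
        final int start;
        final int end;
        final String text;

        Ann(String type, int start, int end, String text) {
            this.type = type;
            this.start = start;
            this.end = end;
            this.text = text;
        }
    }

    /** Phase 1: split the document on whitespace into "Token" annotations. */
    static List<Ann> tokenize(String doc) {
        List<Ann> out = new ArrayList<>();
        int i = 0;
        while (i < doc.length()) {
            if (Character.isWhitespace(doc.charAt(i))) {
                i++;
                continue;
            }
            int j = i;
            while (j < doc.length() && !Character.isWhitespace(doc.charAt(j))) {
                j++;
            }
            out.add(new Ann("Token", i, j, doc.substring(i, j)));
            i = j;
        }
        return out;
    }

    /** Phase 2: relabel each token as "Det" (toy determiner list) or "Word". */
    static List<Ann> tagDeterminers(List<Ann> tokens) {
        List<Ann> out = new ArrayList<>();
        for (Ann t : tokens) {
            String lower = t.text.toLowerCase();
            boolean det = lower.equals("the") || lower.equals("a") || lower.equals("an");
            out.add(new Ann(det ? "Det" : "Word", t.start, t.end, t.text));
        }
        return out;
    }

    /** Phase 3: match the type sequence "Det Word" and emit a "NounChunk" over both. */
    static List<Ann> chunk(List<Ann> tagged, String doc) {
        List<Ann> out = new ArrayList<>();
        for (int i = 0; i + 1 < tagged.size(); i++) {
            Ann a = tagged.get(i);
            Ann b = tagged.get(i + 1);
            if (a.type.equals("Det") && b.type.equals("Word")) {
                out.add(new Ann("NounChunk", a.start, b.end, doc.substring(a.start, b.end)));
                i++; // skip the annotation consumed by this match
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String doc = "The parser checks a sentence";
        // Each phase consumes the output of the previous phase.
        List<Ann> chunks = chunk(tagDeterminers(tokenize(doc)), doc);
        for (Ann c : chunks) {
            System.out.println(c.type + " [" + c.start + "," + c.end + ") " + c.text);
        }
    }
}
```

Note that the pattern in the last phase is written over annotation types rather than over characters, which is the sense in which this style is "regular expressions over annotations". A real JAPE grammar would express such rules declaratively instead of in hand-written loops.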

I'm not sure whether you are looking for regex-style matching over semantic graphs and tree structures. If so, you can check out Tregex and Semgrex, which match over Stanford constituent trees and dependency graphs, respectively.

Kenston Choi
  • Thank you, Kenston. Very nice project. Unfortunately, it is GPL and I can't use it in an LGPL open source grammar checker. – wcolen Dec 23 '11 at 10:50

I haven't tried Graph-Expression, but its site states that it provides a language for "structure of match -it is possible to build syntax tree based on match". I think it is comparable to JAPE (as the site puts it: "fast - it works faster then Jape transducer (gate.ac.uk) closest project to this one"). I also assume it can handle graphs, something JAPE may not be good at.

Kenston Choi