Questions tagged [uima]

UIMA (Unstructured Information Management Architecture) is an architecture for creating scalable applications that analyze and extract information from unstructured data sources such as text, audio, and video. UIMA is specified in an OASIS standard. Apache UIMA is an open-source Java framework implementing the UIMA architecture, based on code open-sourced by IBM. UIMA was a central part of IBM's Jeopardy!-playing Watson computer. UIMA applications typically use natural language processing (NLP) techniques to perform analysis.

UIMA defines applications as Collection Processing Engines (CPEs). Each CPE includes a Collection Reader (CR), one or more Analysis Engines (AEs), and optionally a CAS Consumer.

A Collection is a repository of data to be analyzed, and it may take a number of forms, including RDBMS tables, a schema-less database, or a set of files on a filesystem. The first component in a CPE is the Collection Reader, which reads pieces of data from the Collection and packages each piece in a data structure called the Common Analysis Structure (CAS).

The CR passes CAS objects on to the first Analysis Engine in the pipeline. Each AE analyzes the information artifact packaged in a CAS, constructs annotations from the results of the analysis (e.g. parts of speech for words or phrases), and adds these annotations to the CAS before passing it on downstream. At the end of the pipeline, a CAS Consumer does something useful with the annotations, such as writing them to a database or to files, or adding them to a semantic search index. As of UIMA version 2, the Apache UIMA documentation recommends using Analysis Engines instead of CAS Consumers, because AEs provide all of the functionality required for consuming CAS objects.
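The reader, engine, and consumer roles described above can be sketched as a toy model. This is plain Python, not the Apache UIMA Java API: `Cas`, `Annotation`, `collection_reader`, `token_annotator`, `capitalized_annotator`, `run_pipeline`, and `collect_capitalized` are all hypothetical names chosen for illustration, and the "analysis" is deliberately trivial.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    type: str    # e.g. "Token"
    begin: int   # start offset into the document text
    end: int     # end offset (exclusive)


@dataclass
class Cas:
    """Toy stand-in for a CAS: the artifact plus the annotations added so far."""
    text: str
    annotations: list = field(default_factory=list)

    def covered_text(self, ann):
        return self.text[ann.begin:ann.end]


def collection_reader(collection):
    """Wrap each item of the collection in its own CAS."""
    for document in collection:
        yield Cas(text=document)


def token_annotator(cas):
    """Engine 1: add a Token annotation for each whitespace-separated word."""
    position = 0
    for word in cas.text.split():
        begin = cas.text.index(word, position)
        end = begin + len(word)
        cas.annotations.append(Annotation("Token", begin, end))
        position = end


def capitalized_annotator(cas):
    """Engine 2: build on the Token annotations left by engine 1."""
    for ann in list(cas.annotations):
        if ann.type == "Token" and cas.covered_text(ann)[0].isupper():
            cas.annotations.append(Annotation("Capitalized", ann.begin, ann.end))


def run_pipeline(collection, engines, consumer):
    """Pass each CAS through every engine in order, then hand it to the consumer."""
    for cas in collection_reader(collection):
        for engine in engines:
            engine(cas)
        consumer(cas)


def collect_capitalized(sink):
    """Build a consumer that appends the text of Capitalized annotations to sink."""
    def consumer(cas):
        sink.extend(
            cas.covered_text(a) for a in cas.annotations if a.type == "Capitalized"
        )
    return consumer
```

The second engine only works because the first one has already written its annotations into the same CAS, which is the point of passing a shared structure down the pipeline.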

Each UIMA component has an XML descriptor that defines its behavior and parameters. The descriptor for a Collection Processing Engine refers to the descriptors of its components and can override their settings.

UIMA supports conditional flow control, such that an annotation made in a CAS can determine which branch of a pipeline it takes downstream.
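As a rough illustration of annotation-driven routing, the sketch below picks the next processing step from an annotation made earlier in the pipeline. It is a toy model in plain Python rather than UIMA's own flow control mechanism; the dict-based CAS, the function names, and the keyword-based language guess are all assumptions made for the example.

```python
def detect_language(cas):
    """Add a crude Language annotation (illustrative heuristic only)."""
    words = set(cas["text"].lower().split())
    language = "de" if words & {"der", "und"} else "en"
    cas["annotations"].append(("Language", language))


def english_branch(cas):
    cas["annotations"].append(("ProcessedBy", "english_branch"))


def german_branch(cas):
    cas["annotations"].append(("ProcessedBy", "german_branch"))


# Hypothetical routing table: annotation value -> downstream engine.
BRANCHES = {"en": english_branch, "de": german_branch}


def next_step(cas):
    """Flow decision: choose the downstream engine from the Language annotation."""
    for kind, value in cas["annotations"]:
        if kind == "Language":
            return BRANCHES[value]
    return None  # no Language annotation, so no branch is taken


def process(text):
    cas = {"text": text, "annotations": []}
    detect_language(cas)
    step = next_step(cas)
    if step is not None:
        step(cas)
    return cas
```

Keeping the routing decision in one function, separate from the engines, means the engines themselves do not need to know which branch they belong to.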

UIMA Asynchronous Scaleout is an add-on that enables a UIMA application to run many instances of an Analysis Engine to support higher throughput.
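The idea of running several instances of one engine side by side can be sketched with a thread pool. This is only a toy model of the scale-out idea in plain Python, not how UIMA Asynchronous Scaleout is implemented or configured; `TokenCounter` and `scale_out` are hypothetical names.

```python
import queue
from concurrent.futures import ThreadPoolExecutor


class TokenCounter:
    """Toy engine; each instance remembers its own id."""

    def __init__(self, instance_id):
        self.instance_id = instance_id

    def process(self, text):
        return {
            "text": text,
            "tokens": len(text.split()),
            "instance": self.instance_id,
        }


def scale_out(documents, instances=3):
    """Process documents concurrently using a fixed pool of engine instances."""
    pool = queue.Queue()
    for i in range(instances):
        pool.put(TokenCounter(i))

    def handle(text):
        engine = pool.get()       # borrow an idle instance
        try:
            return engine.process(text)
        finally:
            pool.put(engine)      # return it for the next document

    with ThreadPoolExecutor(max_workers=instances) as executor:
        # executor.map returns results in input order
        return list(executor.map(handle, documents))
```

Each document is handled by whichever instance is idle, so throughput grows with the number of instances as long as the work per document dominates the coordination cost.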

418 questions
3
votes
0 answers

integrating independent ontology into UIMA based cTAKES

Okay, so basically, I have a cTAKES pipeline that makes use of the UMLSlookupannotator to perform NER (named entity recognition). Additionally I have a .owl ontology that I have made using protege. Essentially, what I am trying to do is to extend…
jdv12
  • 171
  • 4
  • 17
3
votes
1 answer

uima ruta Score Condition

I tried a Script to mark the Journal using Score Condition. W{REGEXP("Journal",true)->MARK(ONLY_Journal)}; W{REGEXP("Retraction|Retracted")->MARK(RETRACT)}; W{REGEXP("Suppl")->MARK(SUPPLY)}; NUM {->MARK(VOLUMEISSUE,1,6)}LParen NUM …
3
votes
1 answer

How/are you supposed to use the DKPro libraries with UIMA Ruta?

I have studied the default UIMA Ruta Workbench Eclipse project enough to significantly understand its moving parts - for instance, why the input/ and output/ folders behave as they do, how to accomplish the project using the jcasgen and other Maven…
tacos_tacos_tacos
  • 10,277
  • 11
  • 73
  • 126
3
votes
1 answer

what is the NLTK equivalent of the UIMA CAS (common annotation structure)?

In UIMA, the CAS (common annotating structure) plays a major role in structuring an NLP application. It allows the metadata that one component adds to be passed to the next component. For example, sentence boundaries from a sentence tokenizer can be…
Renaud
  • 16,073
  • 6
  • 81
  • 79
3
votes
1 answer

How to convert custom annotations to UIMA CAS structures and serialize them to XMI

I am having a problem converting custom annotated documents to UIMA CASes and then serializing them to XMI in order to view the annotations through the UIMA annotation viewer GUI. I am using uimaFIT to construct my components due to the fact that it…
3
votes
1 answer

Running java program by considering import dependencies

I have java file at location. /root/Desktop/software/UIMA/yagogit/yodaqa/src/main/java/cz/brmlab/yodaqa/analysis/question/FocusGenerator.java This file is part of entire project - FocusGenerator.java it is importing couple of classes from UIMA and…
puncrazy
  • 349
  • 2
  • 14
3
votes
1 answer

Using Apache UIMA ConceptMapper in a "proof-of-concept mode"

I'm trying to use UIMA ConceptMapper to extract some key concepts and other interesting metadata from text documents. Due to the time constraints of the project and the fact that I'm not sure if UIMA ConceptMapper will work in this scenario, does…
Uzumaki Naruto
  • 547
  • 5
  • 18
3
votes
1 answer

Setting feature value to the count of containing annotation in UIMA Ruta

I've got a RUTA script where all the sentences have been annotated with a Sentence annotation and various words and phrases have been annotated with their own specific annotations. That all works as expected. Each one of those annotations has a…
Nick Collier
  • 1,786
  • 9
  • 10
3
votes
1 answer

How should I use UIMA Ruta to match the all words between line break?

Thank for any strong hands! I have some text like the following aaaaa aaaa aaaaa aaaaaa bbbbb bbbbb bbbb bbbbbb cccccc ccccc ccccc cccccc I want to use Ruta to create annotation that matches all strings between line break. I want my annotation to…
Cheung Brian
  • 715
  • 4
  • 11
  • 29
3
votes
1 answer

UIMA RUTA - how to do find & replace using regular expression and groups

RUTA newbie here. I'm processing a document using RUTA and have a lot of normalization to do before I can start annotating. I'm trying to find the best way to do a Find and Replace of sequence of characters using regular expressions and groups on…
3
votes
1 answer

NoSuchMethodError when running UIMA Ruta script from UIMAFIT SimplePipeline

I am trying to run an existing UIMA Ruta analysis engine from a UIMAFIT simple pipeline using the following code: File specFile = new File("MyEngine.xml"); XMLInputSource in = new XMLInputSource(specFile); ResourceSpecifier specifier =…
3
votes
1 answer

cleartk dependency not found when calling StanfordCoreNLPAnnotator from UIMA RUTA

I am trying to call ClearTK's StanfordCoreNLPAnnotator from within UIMA RUTA, but cannot get it to work. I am using eclipse with a maven-enabled RUTA project in which I also have Java code for auxiliary tasks. I have imported…
3
votes
0 answers

How to process CASes produced by CAS Multiplier concurrently

I am implementing UIMA pipeline with CASMultiplier and UIMA AS. I have a Segmenter Analysis Engine (A CASMultiplier) and a Analysis Engine (Annotator A). I created a Aggregate Analysis Engine of the Segmenter and Annotator A, and then I create a…
trangmx
  • 81
  • 1
  • 4
3
votes
1 answer

Is it possible to create a hierarchy of annotations with UIMA?

I would like to be able to get a common feature from different annotation types. Is it possible to create sub-classes of the annotations and then get them by the super-class? This is the way I am doing it at the moment but I would like to be able…
robingrindrod
  • 474
  • 4
  • 18
3
votes
1 answer

How start with UIMA and simple NLP tasks?

I've recently found out about UIMA (http://uima.apache.org/). It looks promising for simple NLP tasks, such as tokenizing, sentence splitting, part-of-speech tagging etc. I've managed to get my hands on an already configured minimal java sample that…
citronas
  • 19,035
  • 27
  • 96
  • 164