Multilingual Language Analysis and Named Entity Detection

The TeraXML Language Analyzer analyzes text to identify and categorize its important words and phrases, so that users can easily discover key information and concepts hidden in large collections of unstructured data.

Identifying the important words and phrases in a body of text helps establish its meaning. This is critical to software systems that process text from sources such as documents, email, TV, radio or Web sites.

TeraXML Language Analyzer performs the following functions:

Tokenization: The process of identifying distinct words found in text.
Part-of-Speech Tagging: Classifying the words of a sentence into grammatical categories, such as nouns, verbs and adjectives.
Sentence Boundary Detection: Detecting the start and end of each sentence.
Base Noun Phrase Detection: Finding and separating base noun phrases (groups of words that each function as a single noun).
Named Entity Detection: Locating proper names and classifying them as person names, organization names, geographical names or dates.
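
To make the first steps concrete, the following Python sketch shows a deliberately naive form of sentence boundary detection and tokenization based on regular expressions. It is an illustration only and is not the method used by the TeraXML Language Analyzer; the function names split_sentences and tokenize are assumptions introduced for this example.

```python
import re


def split_sentences(text):
    """Naive sentence boundary detection: split after '.', '!' or '?'
    when followed by whitespace. Abbreviations such as 'Dr.' would
    break this rule; real analyzers handle such cases."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence):
    """Naive tokenization: a token is a run of word characters
    or a single punctuation mark."""
    return re.findall(r"\w+|[^\w\s]", sentence)


text = "The analyzer reads text. It finds words and sentences!"
sentences = split_sentences(text)
tokens = [tokenize(s) for s in sentences]

for sentence, sentence_tokens in zip(sentences, tokens):
    print(sentence, "->", sentence_tokens)
```

Whitespace and punctuation rules like these work only for languages that separate words with spaces. Languages such as Chinese and Japanese need a different segmentation strategy.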

"Named entities" are proper nouns and similar expressions, such as person names, geographical names, organization names, dates, addresses or numeric expressions. The purpose of named entity recognition is to locate phrases of these types and associate each one with a category.
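
The idea of locating phrases and associating them with a category can be sketched in Python as follows. This is a toy example that uses hand-made lookup lists and one date pattern; it does not reflect how the TeraXML Language Analyzer detects entities. The names GAZETTEER, DATE_PATTERN and find_entities, as well as the sample names in the lists, are hypothetical.

```python
import re

# Hypothetical lookup lists for illustration; a production system
# would rely on trained models rather than fixed lists.
GAZETTEER = {
    "PERSON": ["Jane Smith"],
    "ORGANIZATION": ["Acme Corp"],
    "LOCATION": ["Paris"],
}

MONTHS = (
    "January|February|March|April|May|June|July|"
    "August|September|October|November|December"
)
# Matches dates written like "5 June 2001".
DATE_PATTERN = re.compile(r"\b\d{1,2} (?:" + MONTHS + r") \d{4}\b")


def find_entities(text):
    """Return (start_offset, phrase, category) tuples sorted by position."""
    found = []
    for category, names in GAZETTEER.items():
        for name in names:
            for match in re.finditer(re.escape(name), text):
                found.append((match.start(), match.group(), category))
    for match in DATE_PATTERN.finditer(text):
        found.append((match.start(), match.group(), "DATE"))
    return sorted(found)


text = "Jane Smith joined Acme Corp in Paris on 5 June 2001."
entities = find_entities(text)

for start, phrase, category in entities:
    print(f"{start:3d}  {category:12s}  {phrase}")
```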

Benefits

Targeted full-text queries
Conceptual document maps
Visualization of relationships between entities
Link discovery

The TeraXML Language Analyzer offers an XML messaging API, which makes it easy to deploy distributed applications in a platform-independent manner. It is flexible and extensible, and it can be trained to detect special patterns and custom entities.
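
As a rough picture of what XML messaging can look like from a client's side, the Python sketch below builds a request document and reads entities from a response document using only the standard library. The actual TeraXML message schema is not described here, so every element and attribute name (analyzeRequest, analyzeResponse, text, entity, language, type) and the sample response are invented for illustration.

```python
import xml.etree.ElementTree as ET


def build_request(text, language="en"):
    """Build a request message. The element and attribute names are
    hypothetical and do not come from the TeraXML documentation."""
    root = ET.Element("analyzeRequest", {"language": language})
    ET.SubElement(root, "text").text = text
    return ET.tostring(root, encoding="unicode")


def parse_entities(response_xml):
    """Extract (category, phrase) pairs from a hypothetical response."""
    root = ET.fromstring(response_xml)
    return [(e.get("type"), e.text) for e in root.iter("entity")]


request_xml = build_request("Jane Smith joined Acme Corp.")

# Hand-written sample response, standing in for a real server reply.
sample_response = (
    "<analyzeResponse>"
    '<entity type="PERSON">Jane Smith</entity>'
    '<entity type="ORGANIZATION">Acme Corp</entity>'
    "</analyzeResponse>"
)
entities = parse_entities(sample_response)

print(request_xml)
print(entities)
```

Because both sides exchange only XML text, the client and the analyzer can run on different platforms and be written in different languages.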

The TeraXML Language Analyzer is available for English, Arabic, French, Italian, German, Spanish, Chinese, Japanese and Korean.