TERMite
A text analysis engine that extracts and organizes data from scientific documents using named entity recognition.
Overview
TERMite is a powerful text analysis engine designed to unlock vital information from scientific texts through named entity recognition and extraction. It efficiently tags, annotates, and organizes unstructured content, transforming it into rich, machine-readable data.
With TERMite, you can rapidly process millions of documents without pre-indexing or complex setup. It processes text at up to one million words per second and can handle large-scale document processing on systems such as Hadoop.
TERMite accurately tags and links scientific terms within unstructured text using SciBite’s VOCabs, which include over 20 million synonyms across more than 80 scientific topics. This enables scientists and researchers to scan a wide range of documents, including publications, patents, and reports, to uncover crucial information.
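To make the idea of tagging text against a synonym vocabulary concrete, here is a minimal Python sketch of dictionary-based entity tagging. It is not TERMite's implementation or API: the `TOY_VOCAB` entries, the entity IDs, and the `tag` function are all hypothetical, invented purely for illustration, and a real engine would add disambiguation and far larger vocabularies.

```python
import re
from typing import NamedTuple


class Hit(NamedTuple):
    start: int       # character offset where the match begins
    end: int         # character offset just past the match
    text: str        # surface form as it appears in the document
    entity_id: str   # identifier shared by all synonyms of one entity
    vocab: str       # name of the vocabulary the entity belongs to


# Hypothetical toy vocabulary: lowercase synonym -> (entity id, vocabulary name).
# The IDs and names are made up; they are not real VOCab identifiers.
TOY_VOCAB = {
    "breast cancer": ("D001", "DISEASE"),
    "breast carcinoma": ("D001", "DISEASE"),
    "brca1": ("G001", "GENE"),
    "tamoxifen": ("C001", "DRUG"),
}


def build_pattern(vocab):
    # Try longer synonyms first so multi-word terms win over shorter ones.
    terms = sorted(vocab, key=len, reverse=True)
    alternatives = "|".join(re.escape(t) for t in terms)
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)


def tag(text, vocab=TOY_VOCAB):
    """Return a Hit for every synonym found in the text, in document order."""
    pattern = build_pattern(vocab)
    hits = []
    for m in pattern.finditer(text):
        entity_id, vocab_name = vocab[m.group(0).lower()]
        hits.append(Hit(m.start(), m.end(), m.group(0), entity_id, vocab_name))
    return hits


if __name__ == "__main__":
    sample = "BRCA1 mutations raise breast carcinoma risk; Tamoxifen is used in breast cancer."
    for hit in tag(sample):
        print(hit)
```

In this sketch, "breast carcinoma" and "breast cancer" resolve to the same hypothetical entity ID, which is the basic mechanism that lets synonym-rich vocabularies link different surface forms to one concept.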
Key Features
- Mines millions of documents to identify critical mentions and relationships.
- Enhances internal search tools by accurately finding key entities, improving performance and productivity.
- Serves a range of roles, enriching text in both content production and IT systems.
- Integrates seamlessly into existing workflows, supporting both one-off and routine analyses.
The latest version, TERMite 6.6.3, introduces new vocabularies and updates to existing ones, enhancing semantic annotation, term disambiguation, and data integration workflows. Notable updates include the MONDO disease ontology, EMTREE_PERSON for named groups of persons, and UNIPROTMOUSE for mouse proteins.
TERMite also includes bug fixes and security enhancements that help maintain platform stability. The tool supports up to 23 machine learning NER models and offers one-click integration with CENtree for ontology editing.


