Automatically identify the structure of a document’s contents
The Document Structure Analysis API identifies the main structural components of a document or email, extracting titles, section headings, subject, recipient, sender, and more to generate an outline resembling a “table of contents” of the document or message. Use it to get an overview of the component structure of a document.
MeaningCloud’s Document Structure Analysis API
Unfortunately, not all documents come with a built-in table of contents. Many documents and other content (such as emails) are presented as a sequence of words that must be read from beginning to end to get an idea of their structure. MeaningCloud’s Document Structure Analysis API automatically extracts that structure from both documents (title, section headings, and subsections) and emails (recipient, sender, subject).
In this way we can achieve a structural understanding of the content, identifying the components of the document and their titles as they appear in the original.
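To make the idea of a "table of contents" outline concrete, here is a minimal sketch in Python of how extracted components could be represented and displayed. The `Heading` class, its `level` and `text` fields, and the `render_outline` function are hypothetical names chosen for illustration; they are not the API's actual response format, which is described in the MeaningCloud documentation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Heading:
    """Hypothetical container for one extracted heading (not the API's schema)."""
    level: int  # 1 = section, 2 = subsection, and so on
    text: str   # heading text as it appears in the original document


def render_outline(title: str, headings: List[Heading], indent: str = "  ") -> str:
    """Build an indented outline: the title first, then one line per heading."""
    lines = [title]
    for heading in headings:
        # Deeper headings get proportionally more indentation.
        lines.append(indent * heading.level + heading.text)
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up sample values standing in for components extracted from a document.
    sample = [
        Heading(1, "Introduction"),
        Heading(2, "Background"),
        Heading(1, "Results"),
    ]
    print(render_outline("Annual Report", sample))
```

Keeping the nesting level on each heading, rather than building a tree, keeps the sketch short; an application that needs to navigate sections would likely convert this flat list into a nested structure.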
Document structure analysis applications
Automatically identifying the parts of a document provides you with a structural view that can be very useful in a variety of applications.
Knowledge management
When an organization’s knowledge is stored in thousands of documents, identifying the components that make up each one lets you leverage them better.
Content publishing
Complementing content with a description of its structure makes it easier to use and more valuable.
Communication surveillance
Being able to automatically analyze the structure of a collection of emails allows for detection of suspicious patterns in compliance applications.
Highlights of the Document Structure Analysis API
The Document Structure Analysis API is powerful, versatile, and useful in a wide range of scenarios.
Multilingual
It works regardless of the language the text is written in.
Powerful
It leverages both document markup and language markers.
For documents and emails
It identifies parts of documents and email components.
Flexible and easy to integrate
It supports various formats, and its standard interface allows for easy integration with any application.