Our APIs use the latest version of the Apache Tika toolkit to detect and extract metadata and structured text from submitted content. A comprehensive list of supported formats can be found on this page. The summary below covers the main formats supported by MeaningCloud's APIs. It applies to both the URLs and the files that can be analyzed.
The following table shows how each type of format is handled and the specific formats supported within each type:
Type | Behavior | Specific formats supported |
---|---|---|
Markup language formats | Everything contained between markup tags will be analyzed. The tags will be ignored. | |
General text formats | The whole content of the files will be analyzed. | |
Documents | The whole content of the files will be analyzed. | |
Media file formats | Only the metadata associated with the file will be analyzed. | |
Other formats | Any email extension that complies with the Mbox format. |
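
To make the markup-language behavior concrete, here is a minimal sketch of what "analyze the text between tags and ignore the tags" means. It uses Python's standard `html.parser` module purely as an illustration. It is not the Apache Tika extraction that the APIs run, and the names `TextOnlyParser` and `text_between_tags` are hypothetical helpers introduced for this example.

```python
from html.parser import HTMLParser


class TextOnlyParser(HTMLParser):
    """Hypothetical helper: keeps text nodes and discards the tags around them."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called only for text found between tags; the tags themselves
        # are never passed to this method.
        text = data.strip()
        if text:
            self.chunks.append(text)


def text_between_tags(markup: str) -> str:
    """Return the text content of a markup string, with tags removed."""
    parser = TextOnlyParser()
    parser.feed(markup)
    parser.close()
    return " ".join(parser.chunks)


sample = "<html><body><h1>Quarterly report</h1><p>Sales grew in <b>Europe</b></p></body></html>"
print(text_between_tags(sample))  # Quarterly report Sales grew in Europe
```

The actual extraction is likely to differ in details such as whitespace handling, so treat this only as a picture of which part of a markup document reaches the analysis.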