A classification model comprises the list of categories as well as the resources required to classify documents into those categories. For instance, a model may classify movie synopses by genre, with categories such as thriller, horror or romance. Formally, each category is identified by a code and a label, which is a short description of the purpose of the category.
The classification process is based on a hybrid algorithm that combines statistical methods with linguistic rules to maximize classification accuracy while keeping control over the results. Thus, each category includes training documents, rules, or both, which are used to classify documents.
Each category contains additional fields to provide training text and to define four sets of manual rules, such as relevant or irrelevant terms, that determine the behaviour of the classification model.
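As an illustration only, the category structure described above could be represented as follows in Python. This is a minimal sketch, not the format MeaningCloud uses internally: the class and field names (Category, ClassificationModel, training_text, rules), the code "T01" and the example terms are all hypothetical. This section names only two of the four rule sets (relevant and irrelevant terms), so the sketch does not guess the other two.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Category:
    """One category of a classification model (hypothetical layout)."""
    code: str                 # identifier of the category
    label: str                # short description of its purpose
    training_text: str = ""   # sample text for the statistical part
    # Manual rule sets keyed by name; only the two sets named in the
    # text are used here, the remaining two are left unspecified.
    rules: Dict[str, List[str]] = field(default_factory=dict)


@dataclass
class ClassificationModel:
    """A named list of categories (hypothetical layout)."""
    name: str
    categories: List[Category] = field(default_factory=list)

    def find(self, code: str) -> Optional[Category]:
        """Return the category with the given code, or None if absent."""
        for category in self.categories:
            if category.code == code:
                return category
        return None


# Example: a tiny movie-genre model with a single category.
thriller = Category(
    code="T01",
    label="thriller",
    training_text="A detective races to stop a killer before midnight.",
    rules={
        "relevant_terms": ["detective", "murder"],
        "irrelevant_terms": ["wedding"],
    },
)
genre_model = ClassificationModel(name="MovieGenres", categories=[thriller])
```

Keeping the rules in a dictionary keyed by name means further rule sets can be added without changing the class definition.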
MeaningCloud provides some models for common use cases. The supported models section gives more details about these models, the categories defined for each of them and the languages they are available in.
These are the exact values you have to enter as the model parameter when using the API, one for each available model:
IPTC_en
: English IPTC model.
IPTC_es
: Spanish IPTC model.
IPTC_fr
: French IPTC model.
IPTC_it
: Italian IPTC model.
IPTC_pt
: Portuguese IPTC model.
IPTC_ca
: Catalan IPTC model.
EUROVOC_es_ca
: EuroVoc, the EU's multilingual thesaurus (Spanish/Catalan).
BusinessRep_es
: Business Reputation (Spanish).
SocialMedia_en
: English Social Media model.
SocialMedia_es
: Spanish Social Media model.

In addition, you can define your own classification models. They provide an output similar to that of our own IPTC categorization model. The relevance of a document with respect to each subject is judged, and the output is attached to the response for further processing. This method allows you to decide the type of classification (binary, multiclass, multilabel, single label) that best suits your application.
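The following Python sketch shows how a model value from the list above might be passed as the model parameter, and how the relevance values in a response could be post-processed into a single-label or multilabel decision. It rests on several assumptions: the parameter names key and txt, the shape of the category list (dictionaries with code and relevance fields), and the threshold of 50 are hypothetical and are not confirmed by this section. Only the model parameter and the model values come from the text. No request is actually sent.

```python
from urllib.parse import urlencode

# Model values copied from the list above.
PREDEFINED_MODELS = {
    "IPTC_en", "IPTC_es", "IPTC_fr", "IPTC_it", "IPTC_pt", "IPTC_ca",
    "EUROVOC_es_ca", "BusinessRep_es", "SocialMedia_en", "SocialMedia_es",
}


def build_request_body(text, model, key):
    """Return a URL-encoded form body for a classification request.

    `key` and `txt` are assumed parameter names; only `model` is named
    in this section.
    """
    if model not in PREDEFINED_MODELS:
        raise ValueError(f"unknown model: {model!r}")
    return urlencode({"key": key, "txt": text, "model": model})


def select_categories(categories, mode="multilabel", threshold=50.0):
    """Choose category codes from a list of {"code", "relevance"} dicts.

    mode="single":     keep only the most relevant category.
    mode="multilabel": keep every category at or above `threshold`.
    The input shape and the default threshold are assumptions.
    """
    ranked = sorted(
        categories, key=lambda c: float(c["relevance"]), reverse=True
    )
    if mode == "single":
        return [c["code"] for c in ranked[:1]]
    if mode == "multilabel":
        return [
            c["code"] for c in ranked if float(c["relevance"]) >= threshold
        ]
    raise ValueError(f"unknown mode: {mode!r}")
```

The validation step covers only the predefined models. A user-defined model would have its own name, so that name would need to be added to the set (or the check relaxed) before building the request.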