Multiwords are combinations of words that are always grouped together when they appear in a specific order. They are defined in individual entries of the model, but they affect every text analyzed.
For that reason, it is useful to have an overview of all the multiwords defined in the model, so that you can quickly identify the ones that may be causing a conflict or unexpected behavior in the analysis.
The main table lists all the multiwords defined in the entries and subentries. You can browse them by name, as entered by the user, and go directly to the entry where each one is defined, whether in the entry definition itself or in one of its subentries. The table can be sorted alphabetically by any of its columns, and it also provides a dynamic text filter.
Multiwords define a more restrictive scenario than single words do, which means that when several scenarios are possible, the more restrictive one is chosen, as its occurrence is rarer.
In other words, once a multiword is defined in the model, it will always be grouped for sentiment analysis purposes whenever it appears in a text.
Let's see an example:
| Case | Entry definition | Entry sentiment behavior | Analyzed text | Analysis result |
|---|---|---|---|---|
| 1 | global crisis before end | POSITIVE | The global crisis is ending | POSITIVE |
| | crisis | NEGATIVE | | |
| 2 | global crisis before end | POSITIVE | There's a huge crisis | NEGATIVE |
| | crisis | NEGATIVE | | |
| 3 | global crisis before end | POSITIVE | There's a huge global crisis | NONE |
| | crisis | NEGATIVE | | |
In the last case (case 3) you can see that, although the multiword defined in the first entry is detected, its full context is not, so the text is not assigned the sentiment behavior defined for that entry. Likewise, because the multiword is detected and grouped, the single word defined in the second entry is not detected, so the text does not get its polarity either. Note that when two patterns could match, the one that appears first in the text is chosen.
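The three cases can be reproduced with a small sketch. This is only an illustration and not the engine's actual implementation: the function names are hypothetical, tokenization is a plain whitespace split, and "before end" is assumed to mean that the multiword is followed later in the text by a form of "end".

```python
def group(tokens, multiword):
    """Merge every in-order occurrence of `multiword` into a single token."""
    n = len(multiword)
    out, i = [], 0
    while i < len(tokens):
        if tuple(tokens[i:i + n]) == multiword:
            out.append(" ".join(multiword))
            i += n
        else:
            out.append(tokens[i])
            i += 1
    return out


def polarity(text):
    """Toy model of the two entries in the table (hypothetical helper)."""
    tokens = group(text.lower().split(), ("global", "crisis"))
    # Entry 1: "global crisis before end" -> POSITIVE.
    # Assumption: the context holds if a form of "end" follows the multiword.
    if "global crisis" in tokens:
        after = tokens[tokens.index("global crisis") + 1:]
        if any(t.startswith("end") for t in after):
            return "POSITIVE"
    # Entry 2: the single word "crisis" -> NEGATIVE.
    # Once grouped, "global crisis" is one token, so "crisis" alone is absent.
    if "crisis" in tokens:
        return "NEGATIVE"
    return "NONE"


for text in ("The global crisis is ending",
             "There's a huge crisis",
             "There's a huge global crisis"):
    print(text, "->", polarity(text))
```

In the third text the grouped token "global crisis" hides the single word "crisis", which is why neither entry applies.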
When more than one grouping is possible, the system chooses the longest one. For example, in the text "the huge global crisis", with "global crisis" and "huge global crisis" defined as multiwords, the second one is chosen.
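The longest-grouping rule can be sketched as a left-to-right scan that, at each position, tries the longest defined multiword first. The function name `group_longest` is hypothetical, and the way overlapping candidates starting at different positions are resolved (the earliest one wins) is an assumption of this sketch rather than documented behavior.

```python
def group_longest(tokens, multiwords):
    """Group multiwords left to right, preferring the longest match at each position."""
    # Try longer multiwords first so they win over shorter ones they contain.
    ordered = sorted(multiwords, key=len, reverse=True)
    out, i = [], 0
    while i < len(tokens):
        for mw in ordered:
            if tuple(tokens[i:i + len(mw)]) == mw:
                out.append(" ".join(mw))
                i += len(mw)
                break
        else:
            # No multiword starts here: keep the single token.
            out.append(tokens[i])
            i += 1
    return out


multiwords = [("global", "crisis"), ("huge", "global", "crisis")]
print(group_longest("the huge global crisis".split(), multiwords))
```

With only "global crisis" defined, the same text would instead be grouped as "the", "huge", "global crisis".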