In this section, we will introduce the specific language to create rules, which will be divided into seven sections:
After we've seen all the possible operators that can be used with the elements we can define, we will see some examples that will illustrate how they can be combined.
Please note that rules always apply within the context defined for the model. This context is determined by the split sentences parameter in the model settings. If split sentences is enabled, the engine will try to match each rule against each sentence in the text.
There are three types of elements:
The following table shows a few examples of each element:
Element | Rule | Example |
---|---|---|
Simple term | tree | "Trees have a trunk, with supporting branches and leaves in most species." |
Simple term | butterfly | "Adult butterflies have large and brightly coloured wings." |
Simple term | play | "My son usually plays football on Sunday." "I've never seen such a good play." |
Parsing multiword | El_Salvador | "El Salvador is the smallest and the most densely populated country in Central America." |
Parsing multiword | scuba_diver | "Scuba diving equipment allows you to visit the underwater world." "The scuba diver spent the whole day practicing the new technique." |
Parsing multiword | vice_president | "The vice president is called the deputy president" |
Literal expression (multiword) | "sense of direction" | "He has no sense, he gave you the wrong direction." "He has no sense of direction." |
Literal expression (multiword) | "tech|technical|product support|assistance|service" | "The service support was fantastic." "The technical support was fantastic." "The product offers no support." "They offer great product support." |
Literal expression (multiword) | "I get lost|confused" | "I often get lost." "You get confused easily." "I get confused easily." |
When defining literal expressions (multiwords), you can use a pipe, |, to include multiple simple terms as alternatives in the same position, as seen in the last two rows.
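To make the pipe notation concrete, here is a minimal Python sketch that expands a literal expression into every phrase it appears to cover. It assumes, based only on the examples in the table, that each space-separated position holds one or more alternatives joined by a pipe. The function name `expand_literal` is hypothetical and is not part of the product.

```python
from itertools import product


def expand_literal(expression: str) -> list[str]:
    """Expand a pipe-separated literal expression into the phrases it covers.

    Assumption: every space-separated position is a list of alternatives
    separated by "|", as the table examples suggest.
    """
    # One list of alternatives per position in the expression.
    slots = [slot.split("|") for slot in expression.split()]
    # Combine one alternative from each position, keeping the word order.
    return [" ".join(words) for words in product(*slots)]


print(expand_literal("I get lost|confused"))
# ['I get lost', 'I get confused']
```

Under this reading, "tech|technical|product support|assistance|service" would stand for nine two-word phrases, such as "technical support" and "product service".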
A lemma is the canonical form, dictionary form, or citation form of a set of words. To specify this feature in the model, use "L@" right before the term (L@term). The rule will then take all the forms of that lemma into account. At the same time, homonymous words that come from different lemmas, and therefore have different meanings, will not be considered (e.g. 'heading' includes both the gerund of the verb 'to head' and the noun; L@heading includes the noun in its singular and plural forms, but not the verbal form).
Forms are different morphological variations of a given lemma. To specify this feature in the model, use "F@" right before the term (F@term). If a form is used, the rule will only work for that specific form.
Element | Example |
---|---|
play | "I usually play football on Sunday." "I will play football next weekend." "He is playing football right now." "I've never seen such a good play." |
F@play | "I usually play football on Sunday." "I will play football next weekend." "He is playing football right now." "I've never seen such a good play." |
double_tap | "She gave me a double tap." "I double tapped my new friend on Instagram." |
F@double_tap | "She gave me a double tap." "I double tapped my new friend on Instagram." |
"Good|bad|great promotion|F@deals" | "I always get good deals." "That was a great deal" |
L@heading | "Please check the headings before publishing." "She's heading home right now." |
Models assume lemma as the default unless a specific form (F@) is specified. Parsing multiwords can also be used with the form and lemma features, as seen in the previous examples.
There are different grammar features that can be specified according to part of speech. They help disambiguate the context of the rule we are defining according to the grammatical function that a specific word or group of words carries out in the text. To specify the grammar aspect, add @ and the letter that represents the aspect right after the term.
These are the possible features available:
Aspect | Description | Rule | Example |
---|---|---|---|
@V | Verb | smash@V | "He heard a smash of glass." "The thief smashed the window." |
@V1 | First person of the verb | eat@V1 | "James will eat as much as possible." "I eat around noon." |
@V2 | Second person of the verb | eat@V2 | "They eat together on Tuesdays." "You are eating too much." |
@V3 | Third person of the verb | eat@V3 | "You can play after you eat." "His father is eating by himself." |
@V- | Non-personal forms of the verb | eat@V- | "You can play after you eat." "Eating is necessary." |
@N | Noun | advert@N | "I have already adverted to the solar revolution." "The adverts you see are brand new." |
@A | Adjective | mean@A | "That doesn't mean the same thing." "He's a very mean person." |
@E | Adverb | "worked hard@E" | "He worked hard jobs all his life." "He worked hard all his life." |
@T | Article | A@T | "The next chord is A." "A guitar is an instrument." |
@D | Demonstrative | "@D person|woman|man" | "The woman dropped the bag." "That man dropped the bag." |
@M | Numeral | "@M euros|dollars" | "He has some euros left." "He owes me 10 euros." |
@P | Personal pronoun | "@P can_withdraw|may_withdraw" | "He never will withdraw his support." "She can withdraw her consent." |
@S | Possessive | "@S {family}" | "The family is coming." "His family is coming." |
@Y | Preposition | "@Y people" | "The people he's sharing room with are strangers." "He feels comfortable among people of the same tastes." |
@Q | Quantifier | "@Q car|bus|motorbike" | "Those cars are red." "Many cars are red." |
All of these features can also be used by themselves to refer to any word that works as the chosen part of speech in the text. This is an example of how this works:
"You @E run" |
"You have run through that park." "You always run there." "You rarely run around the block." |
The negation feature, + or -, identifies whether a term is affected by negation or not.
Rule | Example |
---|---|
+enthusiastic | "I am enthusiastic about going to the bowling alley tonight." "I am not enthusiastic about going to the bowling alley tonight." |
-enthusiastic | "I am enthusiastic about going to the bowling alley tonight." "I am not enthusiastic about going to the bowling alley tonight." |
Semantic information can be used to define rules, enabling you to group words by their meaning. Currently only the semantic information defined in MeaningCloud's default resources is supported. These are the possible values:
S@[ontology type] searches the internal semantic attribute sementity.
T@[theme type] searches the internal semantic attribute semtheme.
G@[semgeo] searches the internal semantic attribute semgeo.
Rule | Description | Example |
---|---|---|
"S@Top>Product>Food" | Comprises all the words tagged as Top>Product>Food or any of its descendants. | "I buy a car." "I buy an orange." |
"T@Top>Sport" | Comprises all the words tagged as Top>Sport or any of its descendants. | "Writing is fun." "Karate is fun." |
"G@America" | Comprises all the words tagged as having America as their geographic information. | "He was born in Spain." "He recently visited Peru." |
For the ontology entity type, unless otherwise defined, both entities and concepts are taken into account. To consider just one of the two, add an additional tag to the end of the ontology type: @class for concepts and @instance for entities.
Sometimes polysemy (words having more than one meaning) works against us. For those cases, we have the _multiX parameter, which specifies whether a term has more than one meaning or not. There are three variants:
_multiS implies that the term has more than one sense.
_multiT implies that it has more than one theme.
_multiG implies that it has more than one piece of geographic information.
Semantic aspects are explained in more detail here.
It is possible to use any ontology type you have previously defined in your user dictionaries just by adding them here.
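The phrase "or any of its descendants" used for the S@ and T@ rules can be read as a prefix test on the ">"-separated path. The following Python sketch shows that reading only. The function `in_subtree` and the tags `Top>Product>Food>Fruit` and `Top>Product>Foodstuff` are hypothetical and do not come from MeaningCloud's resources.

```python
def in_subtree(tag: str, rule_path: str) -> bool:
    """Return True when `tag` is `rule_path` itself or one of its descendants.

    Assumption: ontology and theme types are ">"-separated paths, so a
    descendant is any tag that extends the rule path with more levels.
    """
    return tag == rule_path or tag.startswith(rule_path + ">")


# Hypothetical tags, used only to illustrate the comparison.
print(in_subtree("Top>Product>Food>Fruit", "Top>Product>Food"))   # True
print(in_subtree("Top>Product>Vehicle", "Top>Product>Food"))      # False
print(in_subtree("Top>Product>Foodstuff", "Top>Product>Food"))    # False
```

Appending ">" before the prefix test keeps a sibling type that merely shares the same leading characters from being treated as a descendant.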
Logical operators are used to associate the appearances of different terms using Boolean logic. There are three operators available: AND, OR, and AND NOT. Brackets should be used to define their precedence in a rule. Please note that these operators apply only within the pre-established context.
Operator | Definition | Rule | Example |
---|---|---|---|
AND | All terms have to appear and order does not matter. | network AND customer|client | "The network is down." "The client's network is down." |
OR | One term or the other. It can be expressed by using pipe ("|") or the operator "OR". | apartment OR flat | "That's my house." "That's my apartment." |
AND NOT | Exclude terms. | food AND home AND NOT restaurant | "We brought food home from the restaurant." "We don't have any food home." |
The operator OR must be used when the rule includes literal expressions. Example:
Brackets can be used to indicate operator precedence. Example:
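As a rough illustration of how the logical operators combine, the Python sketch below evaluates two rules from the operator table, plus one bracketed rule, against a single context. It is not the engine's implementation: it matches exact lower-cased words with no lemmatization, and all function names are hypothetical. The bracketed rule `(apartment OR flat) AND NOT house` is also made up for this example.

```python
import re


def words_in(context: str) -> set[str]:
    """Lower-case the context and return its alphabetic word tokens."""
    return set(re.findall(r"[a-z]+", context.lower()))


def has(alternatives: str, words: set[str]) -> bool:
    """OR via pipes: true if any pipe-separated alternative is present."""
    return any(term in words for term in alternatives.split("|"))


def network_and_client(context: str) -> bool:
    # network AND customer|client
    words = words_in(context)
    return has("network", words) and has("customer|client", words)


def food_home_not_restaurant(context: str) -> bool:
    # food AND home AND NOT restaurant
    words = words_in(context)
    return has("food", words) and has("home", words) and not has("restaurant", words)


def apartment_or_flat_not_house(context: str) -> bool:
    # (apartment OR flat) AND NOT house  -- hypothetical rule; the Python
    # parentheses play the role of the brackets in the rule.
    words = words_in(context)
    return (has("apartment", words) or has("flat", words)) and not has("house", words)


print(network_and_client("The client's network is down."))                    # True
print(food_home_not_restaurant("We brought food home from the restaurant."))  # False
```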
Context operators define the context where the rules apply.
The :: operator omits the following term from its preceding broader range. In other words, this operator excludes some terms from a larger list.
Rule | Description | Example |
---|---|---|
"S@Top>LivingThing>Animal::cat|dog|hamster" | It will detect all the words that fall into the "Animal" category except those that are specified afterwards. | "Her turtle's name is Vaca." "Her cat's name is Tortuga" |
The WORDS operator delimits the number of words that the context must have in order for the rule to apply.
Rule | Description | Example |
---|---|---|
"woman AND WORDS<7" | The rule will apply if the context has fewer than seven words. | "Pretty woman walking down the street." "Pretty woman the kind I'd like to meet." |
"woman AND WORDS>7" | The rule will apply if the context has more than seven words. | "Pretty woman I don't believe you, you're not the truth." "Oh, pretty woman." |
"woman AND WORDS=7" | The rule will apply if the context has exactly seven words. | "Pretty woman stop awhile." "Pretty woman talk awhile." "Pretty woman give your smile to me." |
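The WORDS conditions above amount to comparing a word count against a threshold. The Python sketch below shows that comparison, assuming words are simply whitespace-separated. The engine's own tokenizer may count differently, for instance around contractions. The function name `words_condition` is hypothetical.

```python
import operator
import re

# Map each comparison symbol allowed after WORDS to a Python comparison.
COMPARATORS = {"<": operator.lt, ">": operator.gt, "=": operator.eq}


def words_condition(condition: str, context: str) -> bool:
    """Evaluate a condition such as "WORDS<7" against a context."""
    match = re.fullmatch(r"WORDS([<>=])(\d+)", condition)
    if match is None:
        raise ValueError(f"not a WORDS condition: {condition!r}")
    compare = COMPARATORS[match.group(1)]
    # Assumption: a word is any whitespace-separated chunk of the context.
    return compare(len(context.split()), int(match.group(2)))


print(words_condition("WORDS<7", "Pretty woman walking down the street."))  # True
print(words_condition("WORDS=7", "Pretty woman give your smile to me."))    # True
```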
Distance operators allow you to define co-appearances of several words within a specific range. They can include as many terms as you need, but the distance between the first and the last term must be specified. There are two options to consider when using distance operators:
[termA termB]~number: Terms can appear in any order, and the number sets the maximum distance between whichever of the defined terms appear first and last.
[termA termB]-number: Terms must appear in the order dictated by the rule, and the number sets the maximum distance between the first and the last of the defined terms (order matters).
So, let's see how the term count works with some examples:
Rule 1: [call|dial phone|telephone]~3
Rule 2: [automatic|automatically renew|renewal]-1
Multiwords can also be used within square brackets to include a word-to-word expression:
Rule 1: [rate|price|tax "go down"]-3
Note that only the first term of the multiword is counted when calculating the distance. However, the whole multiword must appear.
Rule 2: ["in advance" payment|pay]~3
Rule 3: [I "my money" back]~4
Let's see some examples of these operators in action:
Rule | Text | Detects |
---|---|---|
[zone|space|section|area parking|child|children's|smoking|non-smoking|play|reserved]~5 | The zone has a parking nearby. | |
[zone|space|section|area parking|child|children's|smoking|non-smoking|play|reserved]~5 | The hotel parking is a space nearby. | |
[zone|space|section|area parking|child|children's|smoking|non-smoking|play|reserved]~5 | The assigned parking is around the corner, in a space behind the hotel. | |
[speak fast|rapid|quick|quickly|slow]-2 | He speaks fast. | |
[speak fast|rapid|quick|quickly|slow]-2 | He speaks English very fast. | |
[speak fast|rapid|quick|quickly|slow]-2 | He speaks very slowly. | |
[give|have "what I asked|ordered"]-3 | They don't have what I ordered. | |
[give|have "what I asked|ordered"]-3 | They don't have your purchase or what I ordered. |
If you are not sure exactly how to count steps in your test, you can always use the test parsing console to check how the engine tokenizes the phrase.
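The difference between the two distance operators can be sketched in Python as follows. Two things here are assumptions rather than documented behaviour: distance is taken as the difference between the token positions of the first and last matched terms, and tokens are compared as exact lower-cased words with no lemmatization. The function name `distance_match` is hypothetical, and the test parsing console remains the reference for how the engine really counts.

```python
from itertools import product


def distance_match(slots, tokens, max_distance, ordered):
    """Check a rule like [a|b c|d]~N (ordered=False) or [a|b c|d]-N (ordered=True).

    slots: one set of alternatives per term in the square brackets.
    tokens: the context as a list of lower-cased tokens.
    """
    # Token positions at which each slot is satisfied.
    positions = [[i for i, tok in enumerate(tokens) if tok in slot] for slot in slots]
    for combo in product(*positions):
        if len(set(combo)) != len(combo):
            continue  # a single token cannot fill two slots
        if ordered and list(combo) != sorted(combo):
            continue  # "-" requires the terms in the order given by the rule
        # Assumed distance: gap between the first and last matched positions.
        if max(combo) - min(combo) <= max_distance:
            return True
    return False


call_phone = [{"call", "dial"}, {"phone", "telephone"}]
tokens = "phone me or call".split()
print(distance_match(call_phone, tokens, 3, ordered=False))  # True
print(distance_match(call_phone, tokens, 3, ordered=True))   # False
```

The same context satisfies the "~" variant but not the "-" variant, because "phone" comes before "call" in the text.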
Regular expressions are allowed. However, you will have to escape certain characters such as +, ( ), *, and % by using "\".
Below you can find some useful regular expressions:
Regex | Description | Rule | Detects |
---|---|---|---|
.*_term | Matches the chosen term inside a parsing multiword. | .*_president | The former president will visit us tomorrow. |
[0-9.,]* | Matches any number, including decimal and whole numbers. A space after the number may appear or not. | "[0-9.,]* ?€" | The bill was 30€. The bill was 30.5€. The bill was 30,5€. The bill was 30,5 €. |
[list of letters] | Matches any of the letters inside the square brackets. | apologi[sz]e | Please, apologize Please, apologise |
? | The letter may appear or not. | behaviou?r | Good behavior Bad behaviour |
Question marks can be used with semantic or grammatical aspects between brackets, for example, (@N)? meaning "any noun may appear or not". By contrast, question marks cannot be combined with brackets to include or exclude a letter set: "beer batter(ed)?"
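The patterns in the table can be tried out with any regular expression engine before they are added to a rule. The sketch below uses Python's `re` module as a stand-in. The text does not say which regex flavour the rule engine uses, so treat the results as indicative only. The helper `matches_all` is hypothetical.

```python
import re


def matches_all(pattern: str, candidates: list[str]) -> bool:
    """Return True when the pattern matches every candidate in full."""
    return all(re.fullmatch(pattern, text) is not None for text in candidates)


# Optional letter: the "u" may appear or not.
print(matches_all(r"behaviou?r", ["behavior", "behaviour"]))          # True
# Letter set: either "s" or "z" is accepted.
print(matches_all(r"apologi[sz]e", ["apologise", "apologize"]))       # True
# Number followed by an optional space and the euro sign.
print(matches_all(r"[0-9.,]* ?€", ["30€", "30.5€", "30,5€", "30,5 €"]))  # True
# Any parsing multiword ending in "_president".
print(matches_all(r".*_president", ["vice_president"]))               # True
```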
There are two external elements that can be used in the rules:
{my_macro_label} AND words |
#. This is especially useful for dealing with co-appearances of several categories and for disambiguating depending on the context.
#catA AND context |
Now that we've seen all the different operators and expressions that can be used in a rule, let's see some examples of how they can be combined.
Rule | Text | Detected? |
---|---|---|
[difficult|easy to get to]~4 | It's very difficult to find the station and get to the platform. | |
[difficult|easy to get to]~4 | It's easy to get to the airport. | |
[zone|section|area parking|child|children's|play]~5 | The bar had a zone where children could play. | |
[zone|section|area parking|child|children's|play]~5 | You can play in the designated zone. | |
[speak|talk fast|rapid|quick|quickly|slow]-2 | He speaks fast. | |
[speak|talk fast|rapid|quick|quickly|slow]-2 | He speaks English really fast. | |
[speak|talk fast|rapid|quick|quickly|slow]-2 | That rapid talk is hard to follow. | |
instruction|menu|configuration|process AND easy|difficult | Laptop configuration was quite easy. | |
"impossible to_get to" OR "impossible 2 get 2" | It's impossible to get to the station. | |
"impossible to_get to" OR "impossible 2 get 2" | It's impossible 2 get 2 the station. | |
[obtain|achieve|reach|get "[0-9.,]* euros|dollars"]~4 AND NOT discount | I got a discount | |
[obtain|achieve|reach|get "[0-9.,]* euros|dollars"]~4 AND NOT discount | I obtained 20 euros | |
[play chess \?]-4 | Does he play chess? | |
S@Top>Location>GeoPoliticalEntity>City::_multiS | He lives in London | |
S@Top>Location>GeoPoliticalEntity>City::_multiS | That London boy is a great writer |
Punctuation signs that can be used in regular expressions or to reference other elements must be escaped when used in rules, for example, [play football \?]-4 or [\#blacklivesmatter T@Top>Society>Politics]~10.
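When rules are generated programmatically, the escaping described above can be applied with a small helper. The Python sketch below is hypothetical. Its character set contains only the characters this page names (+, (, ), *, %, ? and #), which is an assumption and may not be the complete list the engine requires.

```python
# Characters this page says must be escaped. Assumed, possibly incomplete.
SPECIAL_CHARACTERS = set("+()*%?#")


def escape_term(term: str) -> str:
    """Prefix every special character in a literal term with a backslash."""
    return "".join("\\" + ch if ch in SPECIAL_CHARACTERS else ch for ch in term)


print(escape_term("#blacklivesmatter"))  # \#blacklivesmatter
print(escape_term("?"))                  # \?
```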