At MeaningCloud we aim to provide the most advanced text analytics product with the broadest language coverage in the market. That’s why, before 2019 ended, we worked on launching several new language packs that extend the coverage of our standard pack (English, Spanish, French, Italian, Portuguese and Catalan) and our Nordic pack (Swedish, Danish, Norwegian and Finnish).
The third pack we launched is the Arabic pack. Arabic, the fifth most spoken language in the world, is the official language in twenty countries and co-official in six others. It is the first language of 280 million speakers, and the second language of another 250 million. Moreover, for religious reasons, several million Muslims living in other countries have knowledge of Arabic.
Its most distinctive characteristic is its writing system: the script runs from right to left and joins letters together, so each letter can take up to four forms depending on its position in the word. It is also worth noting that Arabic has no capital letters, despite an attempt to introduce them in the 1920s. Since common nouns can sometimes be confused with proper nouns, the latter are usually enclosed in parentheses or quotation marks.
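To make the "up to four forms" point concrete, here is a minimal sketch that looks up the four positional forms Unicode defines for a single letter, BEH (ب). It uses only Python's standard `unicodedata` module; the names `BEH_FORMS`, `describe_forms` and `base_letter` are illustrative and are not part of any MeaningCloud API. In everyday text only the base letter is stored and the font picks the right shape, so these presentation-form code points serve here purely as an illustration.

```python
import unicodedata

# Illustrative mapping (not a MeaningCloud API): the four positional forms
# of the letter BEH, taken from the Unicode "Arabic Presentation Forms-B" block.
BEH_FORMS = {
    "isolated": "\uFE8F",
    "final": "\uFE90",
    "initial": "\uFE91",
    "medial": "\uFE92",
}


def describe_forms(forms):
    """Return the official Unicode name of each positional form."""
    return {position: unicodedata.name(ch) for position, ch in forms.items()}


def base_letter(ch):
    """Fold a presentation form back to the base letter that text normally stores."""
    return unicodedata.normalize("NFKC", ch)


if __name__ == "__main__":
    for position, name in describe_forms(BEH_FORMS).items():
        print(f"{position:>8}: {BEH_FORMS[position]}  {name}")
    # All four shapes are the same underlying letter, U+0628.
    print({hex(ord(base_letter(ch))) for ch in BEH_FORMS.values()})
    # "AL" is the Unicode bidirectional class for right-to-left Arabic letters.
    print(unicodedata.bidirectional("\u0628"))
```

Folding every form back to one base letter is also why text analytics tools typically work on the base code points rather than on the rendered shapes.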
MeaningCloud now provides Arabic coverage for the following features:
- Topics Extraction: covers the detection of entities and, partially, expressions of time.
- Text Clustering: full coverage.
This coverage will be extended in successive product releases, depending on market demand. You can find detailed information on our new language coverage page.
So, what are these text analytics tasks and what are they used for?