Unstructured content is growing. Turn it into actionable insights
The "natural" raw material of communication between people is not structured data but unstructured content in the form of free-format text, images, audio and video. It is commonly accepted that 80% of business-relevant information is unstructured, mainly text, and this unstructured content is growing much faster than structured data.
Despite its huge potential as a source of valuable insights, open text in particular is rarely analyzed or used in decision-making: manually reading it and extracting insights is tedious and expensive at best, and impossible at worst given the sheer volume of information. Text analytics technologies overcome this challenge by automatically processing and analyzing textual content, transforming this "raw" data into structured, usable information that yields valuable insights.
What is Text Analytics?
Why is text analytics more important than ever?
Where can it be employed?
- Organizations of all kinds need to understand the people they interact with. In businesses, this is known as Voice of the Customer or Customer Experience Management. The massive, automatic processing of the unstructured information contained in surveys, contact center records, and comments on social media provides a 360-degree view of their customers. In the case of public administrations (such as city councils) and other political organizations, this scenario is called Voice of the Citizen or Voice of the Voter.
- An application area that partly overlaps with the previous one is media monitoring and analysis, covering social media in particular but also traditional media, since the information analyzed may be generated both by (potential) clients and by reporters, analysts, and influencers.
- Additionally, when we analyze an organization's internal community, we are talking about Voice of the Employee applications geared toward Talent Management.
- In scientific research, text analytics is used to mine large volumes of articles and other documents, identify relationships, and facilitate information retrieval.
- Media and publishers use it to make the most of their archives, produce higher-quality content more quickly, engage the audience through personalized content and monetize their production through targeted advertising and new business models.
- In the fields of justice and crime prevention, in Compliance and eDiscovery applications, it is used to process documents and communications automatically in order to uncover clues of potentially criminal behavior, e.g. insider trading or fraud.
- Organizations in the areas of health, law, etc. use it to automatically code and analyze records for better categorization, mapping, and exploitation.
Typical tasks of text analytics
- Part-of-speech tagging (or PoS tagging) identifies the structure of a text and assigns a grammatical category to each word according to its context.
- Clustering allows you to discover the relevant topics and the relationships within a collection of documents by grouping them into sets according to similarity. It is especially useful in exploratory applications, in which the aim is to uncover topics that were not predefined, similarities, or duplicates.
- Classification, or categorization, consists of assigning a text to one or more categories from a predefined taxonomy, taking the text's global content into account. In general, it requires a previously configured and trained classification model built in line with the selected taxonomy. Classification is used to identify the themes discussed in the text as a whole.
- Information extraction identifies entities (names of people, places, companies and brands), abstract concepts, and other specific elements: amounts, relationships, etc. It is used to detect mentions and identify the most meaningful elements of a text.
- Sentiment analysis detects the polarity (positive, negative, neutral, or no polarity) contained in a document. This polarity can be found as a subjective opinion or the expression of an objective fact. In addition to the global polarity at document level, it is possible to carry out a more granular analysis and identify the polarity associated with different aspects or attributes mentioned in the same document.
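The tasks above can each be illustrated with a toy sketch. For PoS tagging, a minimal lookup-plus-suffix tagger conveys the idea; the tiny lexicon and suffix rules here are illustrative assumptions (production taggers are statistical models trained on annotated corpora).

```python
# Toy PoS tagger: dictionary lookup with crude suffix-based fallbacks.
# The lexicon and rules are illustrative assumptions, not a real tagset model.
LEXICON = {"the": "DET", "a": "DET", "dog": "NOUN", "cat": "NOUN",
           "barks": "VERB", "runs": "VERB"}

def pos_tag(sentence):
    """Assign a grammatical category to each word of a sentence."""
    tags = []
    for word in sentence.lower().split():
        if word in LEXICON:
            tags.append((word, LEXICON[word]))
        elif word.endswith("ly"):
            tags.append((word, "ADV"))    # common adverb suffix
        elif word.endswith("s"):
            tags.append((word, "VERB"))   # crude, context-free guess
        else:
            tags.append((word, "NOUN"))   # default open-class guess
    return tags
```

Real taggers disambiguate using the surrounding words ("runs" as noun vs. verb), which is exactly what this context-free sketch cannot do.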
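Clustering by similarity can be sketched as greedy grouping over bag-of-words vectors; the single-pass strategy and the 0.5 threshold are simplifying assumptions (real systems use algorithms such as k-means over richer representations).

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def cluster(docs, threshold=0.5):
    """Greedy single-pass clustering: a document joins the first cluster
    whose seed document is similar enough, else it starts a new cluster."""
    bags = [Counter(d.lower().split()) for d in docs]
    clusters = []  # each cluster is a list of document indices
    for i, bag in enumerate(bags):
        for c in clusters:
            if cosine(bags[c[0]], bag) >= threshold:
                c.append(i)
                break
        else:
            clusters.append([i])
    return clusters
```

Nothing here is predefined: the groups emerge from the documents themselves, which is what makes clustering suited to exploratory analysis.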
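Classification against a taxonomy can be reduced to its simplest form with keyword rules; the two-category taxonomy and its keyword lists below are hypothetical, standing in for the trained model a real classifier would use.

```python
# Hypothetical two-node taxonomy with illustrative keyword lists.
TAXONOMY = {
    "Billing": {"invoice", "charge", "refund", "payment"},
    "Delivery": {"shipping", "package", "courier", "delayed"},
}

def classify(text, taxonomy=TAXONOMY):
    """Assign every category whose keywords appear in the text."""
    words = set(text.lower().split())
    return sorted(cat for cat, kws in taxonomy.items() if words & kws)
```

Note that a text can receive more than one category, matching the multi-label behavior described above.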
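Information extraction can be approximated with surface patterns; the regular expressions below are illustrative only (production extractors combine trained models, gazetteers, and linguistic analysis).

```python
import re

def extract_entities(text):
    """Toy information-extraction pass: pulls out monetary amounts and
    capitalized name sequences. The patterns are illustrative, not robust."""
    # Currency symbol followed by digits, e.g. "$1,200".
    amounts = re.findall(r"[$€£]\s?\d[\d,.]*", text)
    # Naive named-entity guess: runs of two or more capitalized words.
    names = re.findall(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)+", text)
    return {"amounts": amounts, "names": names}
```

Such patterns over-generate (any capitalized phrase looks like a name), which is why real systems disambiguate entity mentions against context and knowledge bases.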
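Finally, document-level polarity detection can be sketched with a sentiment lexicon; the word lists and the single-word negation flip are illustrative assumptions, far simpler than the aspect-level analysis described above.

```python
# Minimal lexicon-based sentiment scorer with illustrative word lists.
POSITIVE = {"great", "excellent", "good", "fast", "friendly"}
NEGATIVE = {"bad", "slow", "terrible", "poor", "rude"}

def polarity(text):
    """Return 'P', 'N', or 'NEU' for a document by counting lexicon hits,
    flipping the sign of a word preceded by a negation."""
    words = text.lower().replace(".", " ").replace(",", " ").split()
    score = 0
    for i, w in enumerate(words):
        hit = (w in POSITIVE) - (w in NEGATIVE)  # +1, -1, or 0
        if i > 0 and words[i - 1] in {"not", "never"}:
            hit = -hit                           # crude negation handling
        score += hit
    return "P" if score > 0 else "N" if score < 0 else "NEU"
```

A per-sentence or per-aspect variant of the same scoring is what enables the more granular, attribute-level analysis mentioned above.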
What determines the quality of text analytics?
Advantages of automating text analytics
- Volume, scalability - Manual processing does not scale properly when the volume of texts to analyze increases: its unit costs rise with volume. This is unacceptable in a world where the amount of unstructured content grows at an exponential rate. By contrast, automated tools can process virtually unlimited volumes at increasingly lower costs.
- Homogeneity, standardization - Human analysts are also subject to errors due to the ambiguity of language. Moreover, these errors and the applied criteria depend on the individual (and even on their situation), producing inconsistencies that are difficult to prevent. Although the accuracy of automatic analytics might initially be lower, its bias is homogeneous and therefore easier to counteract. Furthermore, an automatic tool always applies consistent criteria and procedures, providing more homogeneous results.
- Availability - Automatic tools are always available, which makes the presence of specific individuals at specific times unnecessary.
- Low latency - Automatic procedures respond in milliseconds (even with high volumes), which enables decision-making and action in near real-time.
- Quality - With proper adaptation to the application environment, automatic tools can achieve precision and recall comparable to human processing.
What is the relationship between text analytics and cognitive computing?
What features should a good text analytics solution have?
- Completeness: It must feature a wide range of functions to implement text analytics tasks.
- Integrability: It should be easy to integrate into systems, applications, and current user processes. This can be achieved through open interfaces and a repertoire of SDKs and plug-ins that are compatible with different languages and systems.
- Customization: It should be easy to adapt to the application domain in order to optimize accuracy.
- Low risks and costs: It must build on tried-and-true technologies, it should not require large investments or commitments, and it must be affordable.