Category Archives: Language Technology

Posts about language technology.

#ILovePolitics: Popularity analysis in the news

If you love politics, regardless of your party or political orientation, you know that election periods are exciting, and that good information is a must to get the most out of them. This is why you follow the news, watch or listen to political analysis programs on TV or radio, read surveys, or compare points of view from one side and the other.

American politics in a nutshell

With this in mind, we are publishing a series of tutorials showing how to use MeaningCloud to extract interesting political insights and build your own political intel reports. MeaningCloud provides useful capabilities for extracting meaning from multilingual content in a simple and efficient way. Combining API calls with open-source libraries in your favorite programming language is so easy and powerful that it is sure to awaken the Political Data Scientist hidden inside you. Be warned!

Our research objective is to analyze mentions of people, places, and entities in general in the Politics section of different news media. We will try to carry out an analysis that can answer the following questions:

  • Which names are the most popular?
  • Does their popularity depend on the political orientation of the newspaper?
  • Is it somehow correlated with popularity surveys or voting intention polls?
  • Do these trends change over time?
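As a taste of what such an analysis involves, here is a minimal Python sketch (the tutorial itself uses PHP) of the popularity count we are after. The outlets and the entity mentions are made up for illustration; in practice they would come from an entity-extraction step over real news articles.

```python
from collections import Counter

# Hypothetical, already-extracted entity mentions from the Politics
# section of two fictional news outlets.
mentions = {
    "outlet_a": ["Hillary Clinton", "Donald Trump", "Donald Trump", "Bernie Sanders"],
    "outlet_b": ["Donald Trump", "Donald Trump", "Hillary Clinton", "Ted Cruz"],
}

def most_popular(entities, top_n=3):
    """Return the top_n most frequently mentioned names."""
    return Counter(entities).most_common(top_n)

# Overall popularity vs. popularity per outlet
overall = most_popular([e for ents in mentions.values() for e in ents])
per_outlet = {outlet: most_popular(ents, 1) for outlet, ents in mentions.items()}
```

Comparing `per_outlet` across media with different political orientations is exactly the kind of question the series will dig into.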

Before we begin

This is a technical tutorial in which we will do some coding. However, we will guide you through the whole process, so everyone can follow the explanations and understand the purpose of the tutorial.

For the sake of generality and better understanding, we will focus on U.S. Politics in English, but obviously you can easily adapt the same analysis for your own country or (MeaningCloud supported) language.

And last but not least, this tutorial uses PHP as the programming language for the code examples. However, any non-rookie programmer should be able to translate the scripts into the language of their choice.

Continue reading


New insights in your contents with the new release of MeaningCloud

We have just published a new release of MeaningCloud with some new features that will change the way you do text analytics. As a complement to the most common analytical techniques (which extract information or classify a text according to predefined dictionaries and categories), we have included unsupervised learning techniques that make it possible to explore a series of documents and discover unexpected insights (subjects, relationships) in them.

In this new release of MeaningCloud we have published a Text Clustering API that makes it possible to discover the implicit structure and the meaningful subjects embedded in your documents, social conversations, etc. This API takes a set of texts and distributes them into groups (clusters) according to the similarity between the contents of the documents. The aim is for each cluster to contain documents that are very similar to each other and, at the same time, highly different from the ones in other clusters.

Clustering is a technology traditionally used in the analysis of structured data. What is so special about our API is that its pipelines are optimized for analyzing unstructured text.
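To illustrate the idea behind clustering, here is a toy Python sketch (not MeaningCloud's actual pipeline) that greedily groups short texts by word-overlap (Jaccard) similarity: similar texts land in the same cluster, dissimilar ones start new clusters.

```python
def jaccard(a, b):
    """Word-overlap similarity between two texts, from 0 to 1."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

def cluster(texts, threshold=0.3):
    """Greedily assign each text to the first cluster whose seed
    document is similar enough; otherwise start a new cluster."""
    clusters = []
    for text in texts:
        for group in clusters:
            if jaccard(text, group[0]) >= threshold:
                group.append(text)
                break
        else:
            clusters.append([text])
    return clusters

docs = [
    "senate passes budget bill",
    "budget bill passes in the senate",
    "local team wins championship game",
]
groups = cluster(docs)  # two clusters: the budget stories, and the sports one
```

Real text-clustering pipelines add proper tokenization, term weighting and more robust similarity measures, but the grouping principle is the same.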

Text Clustering API

Continue reading


Could Antidepressants Be the Cause of Birth Defects?

We agree that it is not typical at all for an information technology company to talk about antidepressants and pregnancy on its own blog. But here at MeaningCloud we have realized that health issues have a great impact on social networks, and that companies in that industry, including pharmas, should try to understand the conversation that arises around them. How? Through text analysis technology, as discussed below.

Looking at the data collected by our prototype for monitoring health issues in social media, we were surprised by the sudden increase in mentions of the term 'pregnancy' on July 10. To understand the reason for this spike, we analyzed the tweets related to pregnancy and childbearing. It turned out that on that same day the press had covered a study published by the British Medical Journal on the harmful effects that antidepressants can have on the fetus.
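Spikes like this one can be flagged automatically with a simple statistical rule. This is an illustrative Python sketch over made-up daily counts, not our actual monitoring code:

```python
from statistics import mean, stdev

# Made-up daily counts of tweets mentioning 'pregnancy'
daily_counts = [12, 15, 11, 14, 13, 12, 95, 40, 16]

def spikes(counts, z=2.0):
    """Return the day indexes whose count exceeds mean + z * stdev."""
    m, s = mean(counts), stdev(counts)
    return [i for i, c in enumerate(counts) if c > m + z * s]

# Day 6 (the 95-tweet day) stands out well above the baseline
anomalous_days = spikes(daily_counts)
```

Once a spike is detected, the interesting work begins: analyzing the texts behind it to find out what caused it.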
Continue reading


Is Cognitive Computing too Cool to Be True?

According to IBM, “Cognitive Computing systems learn and interact naturally with people to extend what either humans or machines could do on their own. They help human experts make better decisions by penetrating the complexity of Big Data.” Dharmendra Modha, Manager of Cognitive Computing at IBM Research, talks about cognitive computing as an algorithm being able to solve a vast array of problems.

With this definition in mind, it seems that this algorithm requires a way to interact with humans in order to learn and to think as they do. Nice, great words! Yet it is the same well-known goal of Artificial Intelligence (AI), a more common name that almost everybody has heard of. Why change it? Well, when a company is investing at least $1 billion in something, it must be cool and fancy enough to draw people's attention, and AI is quite old-fashioned. Nevertheless, machines still cannot think! And I believe it will take some time.

How does Cognitive Computing work? According to the given definition, to enable human-machine interaction, some kind of voice and image processing solution must be integrated. I am not an expert on image processing, but voice recognition systems, dialog management models and Natural Language Processing techniques have been studied for a while. Even Question Answering methods (i.e. the ability of a software system to return the exact answer to a question, instead of a set of documents as traditional search engines do) have been deeply studied. We ourselves have been doing (and still do) research on this topic since 2007, which resulted in the development of virtual assistants, a combination of dialogue management and question answering techniques. Do you remember Ikea's assistant, Anna? In spite of the fame she gained at the time, she is no longer in service. Perhaps, for users, that kind of interaction through a website was not effective enough. On the other hand, virtual assistants like Siri, backed by an enormous company like Apple, are gaining attention. There are other virtual assistants for environments other than iOS, but they are far less known, perhaps because the companies behind them are much smaller than Apple.

Several aspects of the thinking capabilities required by the mentioned algorithm have to do with the concept of Machine Learning. There are plenty of well-known algorithms able to generate models from a set of examples or even from raw data (in the case of unsupervised processes). This enables a machine to learn how to classify things or to group items together, like a baby piling up colored geometric pieces. So, by combining Machine Learning and NLP models, it is possible for a machine to understand a text. This process is what we call Structuring Unstructured Data (much less fancy than Cognitive Computing). That is, making your information actionable. We have been working on this for several years, but now it is called cognitive computing.
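As a concrete example of a model generated from a set of examples, here is a minimal multinomial Naive Bayes text classifier in Python. It is a toy illustration of the learning-from-examples idea, not MeaningCloud's implementation; the training sentences and labels are made up.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Minimal multinomial Naive Bayes classifier with add-one smoothing."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)   # per-class word frequencies
        self.class_counts = Counter(labels)       # class frequencies (priors)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        def log_score(label):
            counts = self.word_counts[label]
            total = sum(counts.values())
            score = math.log(self.class_counts[label] / sum(self.class_counts.values()))
            for word in text.lower().split():
                # add-one smoothing so unseen words don't zero out the class
                score += math.log((counts[word] + 1) / (total + len(self.vocab)))
            return score
        return max(self.class_counts, key=log_score)

# Tiny made-up training set
texts = [
    "the match ended in a draw",
    "striker scores winning goal",
    "parliament votes on new law",
    "minister announces tax reform",
]
labels = ["sports", "sports", "politics", "politics"]
clf = NaiveBayes().fit(texts, labels)
```

With only four examples the model already routes new sentences to the right topic, which is the whole point: the machine learns the categories from data instead of being hand-programmed with them.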

So, as you might imagine, Cognitive Computing techniques are not different from the ones we have already developed; a lot of researchers and companies have been combining them. And, if you think about it, does it really matter if a machine thinks or not? The relevant added value of this technology is helping humans to do their job with all the relevant information at hand, at the right moment, so they can make thoughtful and reasonable decisions. This is our goal at MeaningCloud.


Exploring Social Media for Healthcare Data

People enjoy sharing information through social media, including healthcare data. Yes, it is true! And it constitutes the starting point of the research work titled 'Exploring Spanish health social media for detecting drug effects', which follows social media conversations to identify how people talk about their relation with drug consumption. This makes it possible to identify previously unknown adverse effects of these drugs. Although there is a protocol for reporting an identified drug adverse effect to the authorities, only 5-20% of them are reported. Besides, conversations around drugs, symptoms, conditions and diseases can be analyzed to learn more about them. For example, it is possible to see how some people search for specific drugs using social media, while others sell them, perhaps illegally. Many others talk about mixing alcohol with drugs or other illegal substances. Of course, one cannot believe everything that appears on the Internet (but that is another issue); still, it can suggest hypotheses for further research.

Some researchers from the Advanced Databases Group at Carlos III University of Madrid have carried out the aforementioned study, designing hybrid models to capture the knowledge needed to identify adverse effects. The Natural Language Processing platform supporting the implementation of the analysis process based on those models is MeaningCloud. The customization capabilities provided by the platform were decisive for including specific vocabulary and medical domain knowledge. As we know, the names of drugs and symptoms can be complex and, in some cases, difficult to write properly. The algorithm's results are promising, with a 10% increase in recall compared to other known algorithms. You can find further details in the scientific paper published in the BMC Medical Informatics and Decision Making journal.

These developments were part of the TrendMiner project, and are now available in the prototype website TrendMiner Health Analytics Dashboard, which shows people's comments about antidepressants gathered from social media. The console displays the mentions of antidepressants and related symptoms and, by clicking on any of them, their evolution over time. Moreover, the source texts analyzed to compute those mentions are shown at the bottom, with labels highlighting the names of drugs, symptoms or diseases, and any relations among them. Such relations might indicate whether a drug is indicated for a symptom, or whether a disease is an adverse effect of the mentioned drug. The prototype also allows searching by ATC code (Anatomical Therapeutic Chemical Classification System) and the corresponding level in this classification scheme. So, if you mark the 'By Active Substance' selector, you are searching for any drug containing the active substance of the product you entered in the search box. Furthermore, the predictive search functionality makes it easier to find the right expression for a drug or disease.

Health and pharma companies can exploit their unstructured information

There are new kinds of data that are specific to the healthcare and pharmaceutical industries (such as electronic health records) as well as data science tools that allow us to extract valuable knowledge from that data.

 

With MeaningCloud, it is possible to identify the costs of medical treatments, their efficiency (cost, benefits, and risks), references to drugs, side effects, or long-term results. That is why our text analytics solution for the healthcare and pharma domains has so much potential.


Migrate from Textalytics: Spellchecker and Language Analysis API

We have just published in MeaningCloud updated versions of the two functionalities that were still pending migration from Textalytics.

  • Automatic text proofreading checks spelling, grammar and style in your texts in several languages: Spanish, English, French and Italian.
  • Full language analysis, including lemmatization, part-of-speech tagging and syntactic analysis, also for several languages. For this API, besides English, Spanish, French and Italian, Portuguese and Catalan are also available.

Textalytics users can access MeaningCloud with the same email and password they already had. If you do not remember your password, you can reset it and generate a new one.

Developers using Textalytics’ Spell, Grammar and Style Proofreading API or Lemmatization, POS and Parsing API

If you are a user of the following functionalities and want to migrate to MeaningCloud, you can do it already. You only have to:

  1. Update the access point, since the request and response formats do not change. Both HTTP and HTTPS endpoints are available.

     Spell, Grammar and Style Proofreading
       Textalytics:  https://textalytics.com/core/stilus-1.2
       MeaningCloud: http://api.meaningcloud.com/stilus-1.2
     Lemmatization, POS and Parsing
       Textalytics:  http://textalytics.com/core/parser-1.2
       MeaningCloud: http://api.meaningcloud.com/parser-1.2
  2. Check your license key in MeaningCloud and make sure that you use the correct (and only) license as the value of the parameter ‘license key’ on all requests. You can copy your license key either from the Licenses section in the Account menu, or from the developers home.
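Since only the endpoint changes, the migration can be as small as swapping one constant in your client code. This Python sketch illustrates the idea; the request parameter names (`key`, `lang`, `txt`) are illustrative here, so check the API documentation for the exact ones.

```python
from urllib.parse import urlencode

# Only the endpoint changes; request and response formats stay the same.
OLD_ENDPOINT = "https://textalytics.com/core/stilus-1.2"
NEW_ENDPOINT = "http://api.meaningcloud.com/stilus-1.2"

def build_request(endpoint, license_key, text, lang="en"):
    """Assemble the target URL and POST body for a proofreading call.
    The parameter names below are illustrative, not the API's official ones."""
    body = urlencode({"key": license_key, "lang": lang, "txt": text})
    return endpoint, body

# Migrating is then just a matter of pointing at the new endpoint:
endpoint, body = build_request(NEW_ENDPOINT, "YOUR_LICENSE_KEY", "Helo world")
```

The same one-line swap applies to the parser endpoint.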

As always, if you have doubts or run into any problem, do not hesitate to write to us at support@meaningcloud.com. In any case, in order to ensure a smooth transition for client applications, all the Textalytics API endpoints will remain operational until June 1st, 2015.


Text Analytics market 2015: Seth Grimes interviews MeaningCloud’s CEO

Seth Grimes is one of the leading industry analysts covering the text analytics sector. As part of his annual year-past/look-ahead report on this technology and market developments, Seth polled a group of industry executives, asking their thoughts on the state of the market and prospects for the year ahead.

José Carlos González, CEO of Daedalus / MeaningCloud, was one of the selected executives. In the interview, Seth and José Carlos discuss industry perspectives, technology advances and the “breadth vs depth” dilemma faced by many text analytics vendors.

This is an excerpt from the interview:

Roads to Text Analytics Commercialization: Q&A with José Carlos González, Daedalus

What should we expect from your company and from the industry in 2015?

Voice of the Customer (VoC) analytics — and in general, all the movement around customer experience — will continue being the most important driver for the text analytics market.

The challenge for the years to come will be to provide high-value, actionable insights to our clients. These insights should be integrated with CRM systems and treated along with structured information, in order to fully exploit the value of the data about clients in the hands of companies. Privacy concerns and the difficulty of linking social identities to real persons or companies will still be a barrier to more exploitable results.

———

Interested? Read the rest of the interview (featuring market developments and company and product strategies) on Seth Grimes' blog.


Emergency Management through Real-Time Analysis of Social Media

Serving citizens without paying attention to social media?

Emergency call app

The traditional access channels to public emergency services (typically the phone number 112 in Europe) should be extended with the real-time analysis of social media (web 2.0 channels). This observation is the starting point of one of the lines that the Telefónica Group (a leading global provider of integrated systems for emergency management) has been working on, with a view to its integration into its SENECA platform.

Social dashboard for emergency management

At Daedalus (now MeaningCloud) we have been working for Telefónica on the development of a social dashboard that analyzes and organizes the information shared on social networks (initially Twitter) before, during and after an incident of interest to emergency services. From a functional point of view, this entails:

  • Collecting the interactions (tweets) related to incidents in a given geographical area
  • Classifying them according to the type of incident (gatherings, accidents, natural disasters…)
  • Identifying the phase in the life cycle of the incident (alert or pre-incident, incident or post-incident)
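A heavily simplified sketch of the incident-type classification step, using keyword rules in Python. The real system relies on MeaningCloud's taxonomy-based classification; the categories and trigger keywords below are made up for illustration.

```python
# Made-up taxonomy: incident types mapped to trigger keywords
INCIDENT_TYPES = {
    "gathering": {"demonstration", "concert", "crowd", "march"},
    "accident": {"crash", "collision", "accident"},
    "natural_disaster": {"flood", "earthquake", "storm"},
}

def classify_incident(tweet):
    """Return the incident types whose keywords appear in the tweet."""
    words = set(tweet.lower().split())
    return sorted(t for t, kws in INCIDENT_TYPES.items() if words & kws)
```

A tweet can match several types at once, which mirrors real incidents (a crash during a demonstration is both an accident and a gathering).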

Benefits for organizations that manage emergencies

Love Parade Duisburg

Anticipate incidents

Anticipation of events which, due to their unpredictability or unknown magnitude, should be the object of further attention by the emergency services. This scenario includes gatherings of people that are organized, spread or simply commented on through social networks (attendance at leisure or sports events, demonstrations, etc.). Predicting the dimensions and scope of these events is fundamental for planning the operations of the different authorities. We recall in this respect the disorders resulting from a birthday party announced on Facebook in the Dutch town of Haren in 2012, or the Love Parade tragedy in Duisburg.

Flood in Elizondo, Navarre, 2014

Enrich the available information

Social networks enable the instant sharing of images and videos that are often sources of information of the utmost importance for assessing the conditions of an emergency scenario before the assistance services arrive. User-generated content can be incorporated into an incident's record in real time, to help clarify its magnitude, its exact location or an unknown perspective on the event.

Text Analytics technology

For the analysis of social content, MeaningCloud's semantic text analytics (text mining) technology is employed. Its cloud services are used to:

  • Identify the language of the message
  • Classify the message according to a taxonomy (ontology) developed for this scenario (accidents of various kinds, assaults, natural disasters, gatherings, etc.)
  • Extract the mentioned entities (names of people, organizations, places) and the message’s relevant concepts
  • Identify the author or transmitter of each tweet
  • Extract the geographic location of the transmitter and the incident
  • Extract the time of the message and the incident
  • Classify the impact of the message
  • Extract audiovisual (pictures and videos) and reference (links to web pages, attached documents…) material mentioned in the tweet for documenting the incident
  • Automatically group the messages relating to the same incident within an open record
  • Extract tag clouds related to incidents

Twalert Console

A multidimensional social perspective

Text analytics components are integrated into a web application that constitutes a complete social dashboard offering three perspectives:

  • Geographical perspective: maps showing the location of the messages' transmitters, with the possibility of zooming in on specific areas.
  • Temporal perspective: a timeline with the evolution of the impact of an incident on social networks, incorporating sentiment analysis.
  • Record perspective: gathering all the information about an incident.


Telefónica and Daedalus (now MeaningCloud) at LT-Accelerate

Telefónica and Daedalus (now MeaningCloud) will jointly present these solutions at the LT-Accelerate conference (organized by LT-Innovate and Seth Grimes), to be held in Brussels on December 4 and 5, 2014. We invite you to join us and visit our stand, as we are a sponsor of this event. We will tell you how we use language processing technologies for the benefit of our customers in this and other industries.

 

Register at LT-Accelerate. It is the ideal forum in Europe for the users and customers (current or potential) of text analysis technologies.

Jose C. Gonzalez (@jc_gonzalez)

[Translation from Spanish by Luca de Filippis]


The Role of Text Mining in the Insurance Industry

What can insurance companies do to exploit all their unstructured information?

A typical big data scenario

Insurance companies collect huge volumes of text on a daily basis through multiple channels (their agents, customer care centers, email, social networks, the web in general). The information collected includes policies, expert and health reports, claims and complaints, survey results, relevant interactions between customers and non-customers on social networks, etc. It is impossible to manually handle, classify, interpret or extract the essential information from all that material.

The insurance industry is among those that can benefit most from the application of technologies for the intelligent analysis of free text (known as Text Analytics, Text Mining or Natural Language Processing).

Insurance companies also have to cope with the challenge of combining the results of the analysis of this textual content with structured data (stored in conventional databases) to improve decision-making. In this sense, industry analysts consider essential the combined use of technologies based on Artificial Intelligence (intelligent systems), Machine Learning (data mining) and Natural Language Processing (both statistical and symbolic or semantic).

Most promising areas of text analytics in the Insurance Sector

Fraud detection

According to a report released by Accenture in 2013, insurance companies in Europe are estimated to lose between 8 and 12 billion euros per year to fraudulent claims, and the trend is increasing. Additionally, the industry estimates that between 5% and 10% of the compensations paid by companies in the previous year were for fraudulent claims that could not be detected for lack of predictive analytic tools.

According to the specialized publication Health Data Management, Medicare's fraud prevention system in the United States, based on predictive algorithms that analyze patterns in providers' billing, saved more than 200 million dollars in rejected payments in 2013.

Continue reading


The Analysis of Customer Experience, Touchstone in the Evolution of the Market of Language Technologies

The LT-Innovate 2014 Conference has just been held in Brussels. LT-Innovate is a forum and association of European companies in the language technology sector. To get an idea of the significance of this market, suffice it to say that in Europe some 450 companies (mainly innovative SMEs) are part of it, responsible for 0.12% of European GDP. Daedalus is one of the fifteen European companies (and the only one from Spain) that have been formal members of LT-Innovate Ltd. since its formation as an association, headquartered in the United Kingdom, in 2012.

LT-Innovate Innovation Manifesto 2014

In this 2014 edition, the document "LT-Innovate Innovation Manifesto: Unleashing the Promise of the Language Technology Industry for a Language-Neutral Digital Single Market" was published. I had the honor of taking part in the round table that opened the conference. The main subject of my speech was the qualitative change experienced in recent times by the role of our technologies in the markets in which we operate. For years we have been incorporating our systems to solve the specific problems of our more or less visionary or innovative customers in very limited areas. This situation has now changed completely: language technologies play a central role in a growing number of businesses.

Language Technologies in the Media Sector

In a recent post, I referred to this same issue with regard to the media sector. Where before we would deliver a solution to automate the annotation of archived content, now we deploy solutions that affect most aspects of the publishing business: we semantically tag news pieces to improve the search experience on any channel (web, mobile, tablets), to recommend related or additional content according to the interest profile of a specific reader, to facilitate findability and indexing by search engines (SEO, Search Engine Optimization), to place advertising related to the news context or the reader's intention, to help monetize content in new ways, etc.

Continue reading