We recently participated in the Big Data Spain conference with a talk entitled “Real time semantic search engine for social TV streams”. This talk describes our ongoing experiments on social media analytics and combines our most recent developments on using semantic analysis on social networks and dealing with real-time streams of data.
Social TV, which exploded with the use of social networks while watching TV programs is a growing and exciting phenomenon. Twitter reported that more than a third of their firehose in the primetime is discussing TV (at least in the UK) while Facebook claimed 5 times more comments behind his private wall. Recently Facebook also started to offer hashtags and the Keywords Insight API for selected partners as a mean to offer aggregated statistics on Social TV conversations inside the wall.
As more users have turned into social networks to comment with friends and other viewers, broadcasters have looked into ways to be part of the conversation. They use official hashtags, let actors and anchors to tweet live and even start to offer companion apps with social share functionalities.
While the concept of socializing around TV is not new, the possibility to measure and distill the information around these interactions opens up brand new possibilities for users, broadcasters and brands alike. Interest of users already fueled Social TV as it fulfills their need to start conversations with friends, other viewers and the aired program. Chatter around TV programs may help to recommend other programs or to serve contextually relevant information about actors, characters or whatever appears in TV. Moreover, better ways to access and organize public conversations will drive new users into a TV program and engage current ones.
On the other hand, understanding the global conversation about a program is definitely useful to acquire insights for broadcasters and brands. Broadcasters and TV producers may measure their viewers preferences and reactions or their competence and acquire complementary information beyond plain audience numbers. Brands are also interested in finding the most appropriate programs to reach their target users as well as understand the impact and acceptance of their ads. Finally, new TV and ad formats are already being created based on interaction and participation, which again bolster engagement.
In our talk, we describe a system that combines natural language processing components from our MeaningCloud APIs and a scalable semi-structured database/search engine, SenseiDB, to provide semantic and faceted search, real-time analytics and support visualizations for this kind of applications.
Using MeaningCloud APIs we are able to include interesting features for Social TV like analyzing the sentiment around an entity (a program, actor or sportsperson). Besides, entity recognition and topic extraction allow us to produce trending topics for a program that correlate well with whatever happens on-screen. They work as an effective form to organize the conversation in real-time when combined with the online facets provided by SenseiDB. Other functionalities like language recognition and text classification help us to clean the noisy streams of comments.
SenseiDB is the second pillar of our system. A semi-structured distributed database that helps us to ingest streams and made them available for search in real-time with low query and indexing times. It includes a large number of facet types that enable us to use navigation using a range of semantic information. With the help of histogram and range facets it could even be overused for simple analytics tasks. It is well rounded with a simple and elegant query language, BQL, which help us to boost the development of visualizations on top.
If you find it interesting, check out our presentation for more detail or even the video of the event.