Text Classification in Excel: getting started

As you probably already know, Excel spreadsheets are one of the most extended ways of working with big collections of data. They are powerful and easy to combine and integrate with a myriad of other tools. Through our Excel Add-in, we enable you to add MeaningCloud’s analysis capabilities to your work pipeline. The process is very simple as you do not need to write any code.

In this tutorial, we are going to show you how to use our Excel Add-in to perform text classification. We are going to do so by analyzing restaurant reviews we’ve extracted from Yelp. If you have already read some of our previous tutorials, this first part may sound familiar.

To get started, you need to register in MeaningCloud (if you haven’t already), and download and install the Excel add-in on your computer. Here you can read a detailed step by step guide to the process.

Once you’ve installed it, a new tab called MeaningCloud will appear when you open Excel. If you click on it, you will see the following buttons:

excel add-in ribbon

To start using the add-in, you need to copy your license key and paste it into the corresponding field in the Settings menu. You are required to do this only the first time you use the add-in, so if you have already used it, you can skip this step.

Once the license key is saved, you are ready to start analyzing!

As mentioned above, we are going to analyze restaurant reviews extracted from Yelp, more specifically reviews on restaurants in London. You can download the spreadsheet we are going to use here and follow the tutorial as you read.

You will see that after opening the spreadsheet with your data in Excel, you will be able to analyze them in just a few clicks.

Step 1: open the Text Classification interface

To do this, just click on the “Text Classification” button, the first one on the left. It will open the Text Classification interface, which you can see on the image on the right.

The interface has two different sections: “Input“, which allows us to select the data to analyze and “Analysis settings“, to configure specific parameters for the analysis.

Step 2: select the data to analyze

In the spreadsheet we are using, there are two columns with data:

  • Column A, with the names of the reviewed restaurants.
  • Column B, with the texts of the reviews.

The data we want to analyze is in column B, and goes from row 2 to row 76. The interface permits you to select the data in different ways: you can write the range manually, use Excel’s range selector, or pre-select the range before opening the analysis interface, so use whichever suits you best.

text-classification

The results will be output in a new sheet, so it is important not to lose each review’s reference to its correspondent restaurant. To do this, we will use the second field in the “Input” section: the IDs.

tc-excel-selected-input

Click on the checkbox to enable it and then select a cell of the column you want to move in association with the texts we are going to analyze. In this case, it can be any cell of column A.

The image on the left shows how the “Input” section looks after selecting the data as described.

Step 3: configure the analysis

The second section, “Analysis Settings” features two configurable fields: the language in which we are going to analyze the text and the classification model we are going to use.

In our case, we want to analyze texts in English, so that’s the value we will set in the “language” field, but which classification model should we use?
For English there are three models available:

  • IAB, a standard for the advertisement industry.
  • IPTC, a standard for coding the metadata of news articles.
  • SocialMedia, a simple taxonomy to classify social media content.
tc-excel-analysis-settings

If we take a look at the categories of each one of these models, we can see that both IPTC and SocialMedia do not have specific categories for food or restaurants. The closest ones in both cases are the categories related to “lifestyle and leisure“, but this does not give us much additional information for our scenario.

On the other hand, the IAB model has a Tier 1 category called Food & Drink, which includes some subcategories related precisely to the domain we want to analyze. It seems clear that this is the best model for our analysis out of the three provided.

Once we have set all the desired values, the only thing left to do is to click on the button “Analyze” to start the analysis.

When you do this, a progress bar will appear to show you the progress of the analysis. In the background, you will be able to see how a new sheet with the results is created and how the values are written as they are received from the API.

When the process finishes, your excel spreadsheet will have a new sheet called Text Classification with the result of the classification according to the selected model. You can read more about the output and how to configure it in the text classification in excel documentation. For this tutorial, we’ve used the default configuration, which shows in the output all the possible fields and selects just one category per text analyzed.

Step 4: analyze the results

Now, with this data, you can carry out any analysis you want.

For instance, we have decided to create a pivot chart (you can do it by clicking the Insert ribbon).

In this chart (we have selected a bar chart), if we select the fields ID and Label in the “PivotTable Field List” menu and then we drag the Label field to the Values areas, we will immediately obtain a graphic that shows us the distribution of the different categories detected for the restaurant they refer to.

After changing the default colors to visualize better the data (the darkest bar marks the most frequent category for each restaurant), we obtain the chart on the right.

We can see that for some restaurants we do obtain relevant information about their type (“Rocca” is clearly an Italian restaurant while “Molé Taco Bar” is a Mexican one).

You can download the spreadsheet with the results, the analysis and the pivot chart here.

But what can you do if the predefined classification models provided are not relevant to what you want to classify or do not provide enough detail? Stay tuned for our next tutorial, in which we will show you how to define with MeaningCloud your own classification model and use it in Excel.

And, of course, if you have any questions, we’ll be happy to answer them at support@meaningcloud.com


Leave a Reply

Your email address will not be published. Required fields are marked *

*
*