When different people voice different issues, they use different words and sentiments. Current analytics typically identifies just the keywords used, but this runs the risk of missing the entire context behind the communication. Often a customer will merely imply a sentiment or intention instead of explicitly expressing it with specific keywords. For example, a customer in a restaurant might say ‘by the time my meal arrived, the food was cold’. The keyword flagged would be ‘cold food’, when in fact the main issue was the slow service. Keywords have other limitations too, such as sarcasm, context, comparatives and local dialect or slang. The overarching message can often be missed, and so the alternative is to analyse text data using ‘concepts’ instead of ‘keywords’.
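The restaurant example can be reproduced with a very small keyword matcher. This is only an illustrative sketch: the taxonomy, its trigger words and the function name `keyword_labels` are assumptions made up for the example, not taken from any particular product.

```python
# Hypothetical keyword taxonomy: label -> trigger words (illustrative only).
KEYWORD_TAXONOMY = {
    "cold food": ["cold"],
    "slow service": ["slow", "waited", "wait"],
}


def keyword_labels(text: str) -> list[str]:
    """Return every label whose trigger words appear in the text."""
    cleaned = text.lower().replace(",", " ").replace(".", " ")
    words = set(cleaned.split())
    return [
        label
        for label, triggers in KEYWORD_TAXONOMY.items()
        if any(trigger in words for trigger in triggers)
    ]


review = "By the time my meal arrived, the food was cold"
labels = keyword_labels(review)
print(labels)  # only the literal keyword match is reported
```

The matcher flags ‘cold food’ because the word ‘cold’ is present, and it has no way of inferring that the wait for the meal was the underlying complaint.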

Machine learning can help by providing the missing link between the keywords used, thus giving a much clearer picture of the full concept behind a piece of customer feedback. However, it’s the latest AI technology that is taking this a step further and making the classification of concepts more accessible for all.

Using Concepts for Sentiment Analysis

When using concepts instead of keywords, each concept has an associated sentiment, graded or otherwise. You can then do what you like with this, e.g. tot them all up or filter for certain topics. For example, if you have the sentence: “the food was great, and I loved the selection of drinks, but the service was terrible and I’m never going back” you have two markers of positive, one of negative and a clear negative intent. Sentiment can be presented as an overall picture across customer interactions. In the example given above, the overall sentiment may be positive, or neutral, but the actionable insight is apparent in the negative feedback and intent. Overall sentiment scores can give a skewed view of how customers really feel about a service, and real change can only be effected by focusing on negative sentiment and insight. Using concepts instead of keywords shows which negative areas most need improvement, without letting positive or neutral sentiment cloud the picture.
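As a rough sketch of what ‘totting up’ versus filtering looks like, the example sentence can be represented as a list of tagged concepts. The `ConceptTag` structure, the concept names and the +1/-1 scale are assumptions chosen for illustration, and the tags are written by hand here rather than produced by a model.

```python
from dataclasses import dataclass


@dataclass
class ConceptTag:
    concept: str
    sentiment: int          # assumed scale: +1 positive, 0 neutral, -1 negative
    is_intent: bool = False


# Hand-written tags for: "the food was great, and I loved the selection of
# drinks, but the service was terrible and I'm never going back"
tags = [
    ConceptTag("food quality", +1),
    ConceptTag("drinks selection", +1),
    ConceptTag("service", -1),
    ConceptTag("will not return", -1, is_intent=True),
]

# Totting up the topic sentiments gives a mildly positive overall score.
overall = sum(tag.sentiment for tag in tags if not tag.is_intent)

# Filtering for negatives surfaces the actionable part of the feedback.
negatives = [tag.concept for tag in tags if tag.sentiment < 0]

print(overall)
print(negatives)
```

The overall score hides the problem, while the filtered list points straight at the service complaint and the intent not to return.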

By comparison, finding the sentiment for the same sentence using just keywords isn’t straightforward. Essentially it is driven by a black-box algorithm which uses various techniques to match opinion terms (scored from -1 to 1) to nearby keyword terms and then combines the scores in some way within each sentence. In addition, the way people really talk, with sarcasm, negatives and context, presents a lot of challenges that many people who use keywords overlook. Sarcasm often gives a sentiment score completely opposite to the one the customer intended.

These limitations have generated a demand from businesses and analysts for accurate, actionable concepts i.e. those which capture the topic and intrinsically the emotion attached.

Machine learning alone is not enough

Machine learning is helping to solve many of the issues associated with the use of keywords in sentiment analysis, and it does this by classifying by ‘concept’ instead of by ‘keywords’ and rules. It focuses on the concept of the issue with the emotion attached and can classify more objectively. For example, if someone contacts a bank and says that they’ve had money taken from their account, then the fact or concept that they have been subject to fraud is captured no matter what dialect, terminology or tone they use. The concept itself will be scored as a negative sentiment and, more importantly, the bank now has actionable insight and can deal with the customer appropriately. The sentiment can also be graded by concept, so rather than having a black-box score, it can be categorised into grades such as “very unhappy” or “unhappy”. The customer might also state their intent, e.g. to leave the bank or not to recommend it. These intents can also be captured and can be very useful markers in terms of prioritising how the bank chooses to deal with them. Furthermore, the bank doesn’t have to physically read the review again. The concepts have been identified and any input or action required is complete.
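One way to picture how graded sentiment and captured intents could drive prioritisation is a simple sort over tagged cases. Everything in this sketch is hypothetical: the priority weights, the case records and the ‘fees’ concept are invented for illustration, and a real bank would set its own rules.

```python
# Assumed priority weights (higher = more urgent); not from the source.
GRADE_PRIORITY = {"very unhappy": 2, "unhappy": 1}
INTENT_PRIORITY = {"leave the bank": 2, "not recommend": 1}

# Hypothetical cases, already tagged with concept, grade and intent.
cases = [
    {"id": "A", "concept": "fraud", "grade": "unhappy", "intent": None},
    {"id": "B", "concept": "fraud", "grade": "very unhappy", "intent": "leave the bank"},
    {"id": "C", "concept": "fees", "grade": "unhappy", "intent": "not recommend"},
]


def priority(case: dict) -> tuple[int, int]:
    """Rank by stated intent first, then by sentiment grade."""
    return (
        INTENT_PRIORITY.get(case["intent"], 0),
        GRADE_PRIORITY.get(case["grade"], 0),
    )


queue = sorted(cases, key=priority, reverse=True)
order = [case["id"] for case in queue]
print(order)
```

The customer who is very unhappy and intends to leave is handled first, without anyone having to re-read the original messages.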

However, whilst machine learning has the power to classify by concept, setting up traditional machine learning models and running them is difficult.

Firstly, you need to ‘predict’ on the same kind of data you ‘train’ on. The phrases, words and even characters used differ depending on the platform you are analysing, e.g. the language used on Twitter is not the same as that used in written complaints.

You also need to build the training set, tune it and constantly update it and this needs to be done by Data Scientists.

Finally, if your machine learning model doesn’t lead to actions, or is not fit for purpose because of the volume of false positives or negatives then it can fail to be operationalised. Unfortunately, it is all too common that there is a disconnect between data science and how it is judged by others in a business.

AI is making concept classification easier and faster

The latest in AI text analytics is taking the advancements made by machine learning and classifying by concepts instead of keywords one step further.

It automatically and accurately ‘tags’ feedback using actionable concepts along with the intent of the customer through a combination of automation and ‘human-in-the-loop’ technology.

Human-in-the-Loop involves somebody validating and training the model when it needs it, typically on records that have high uncertainty. The AI sends an alert to a human asking them to validate something it might not be sure about, and then trains and updates itself with the input. Importantly, the validation doesn’t need to come from a Data Scientist; it can come from a business user or anybody who is familiar with the operation.
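The routing step can be sketched as a confidence threshold. This is a minimal illustration under stated assumptions: the source does not say how uncertainty is measured, so the confidence scores, the 0.7 threshold, the function name `route_predictions` and the sample records are all hypothetical.

```python
def route_predictions(predictions, threshold=0.7):
    """Split (record_id, label, confidence) tuples into two lists:
    labels accepted automatically and labels sent for human validation."""
    auto_accepted, needs_review = [], []
    for record_id, label, confidence in predictions:
        if confidence >= threshold:
            auto_accepted.append((record_id, label))
        else:
            needs_review.append((record_id, label))
    return auto_accepted, needs_review


# Hypothetical model output: (record id, predicted concept, confidence).
predictions = [
    (1, "slow service", 0.93),
    (2, "fraud", 0.41),
    (3, "billing error", 0.78),
]
auto_accepted, needs_review = route_predictions(predictions)

# A business user confirms or corrects the uncertain records; the validated
# labels then become new training examples for the next model update.
human_validated = {2: "fraud"}
new_training_examples = [
    (record_id, human_validated[record_id])
    for record_id, _ in needs_review
    if record_id in human_validated
]

print(auto_accepted)
print(new_training_examples)
```

Only the low-confidence record reaches a person, which is why the validation workload stays small.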

The impact of this is huge. Because of the automation, far fewer people are needed to run the model and they do not need to be data scientists, and the validation that the AI invites can be done offline.

Complex models that would normally take a data scientist weeks to build, and that carry an ongoing overhead to support and curate, can now be generated and tuned by a non-data scientist in less than a day.

Performance using concepts

An example of the latest AI, namely PrediCX software from Warwick Analytics, was used to research publicly available customer data on Trip Advisor and Twitter, comparing the outputs for using both concepts and keywords.

The precision rate for labelling keywords was 58% compared to 76% for concepts. Precision here refers to how often the identified topic, or sentiment, was correct – a true positive. For example, if a classifier identified 10 mentions of ‘bad service’, but 3 of them actually mentioned good service – your precision is 70%.

The recall rate for keywords was unknown as there were 46% of records which were completely unlabelled. For concepts the recall rate was clear and it was 51%. There were no records unlabelled. Recall here refers to what percentage of the relevant data in a set were labelled. To refer to the example above, if the dataset mentioned ‘bad service’ 10 times in total, and the classifier picked up 7 mentions of ‘bad service’, then the recall is 70%.
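The two worked ‘bad service’ examples follow the standard definitions of precision and recall, shown here as a short sketch. The function and variable names are chosen for illustration.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Share of the labels the classifier applied that were correct."""
    return true_positives / (true_positives + false_positives)


def recall(true_positives: int, false_negatives: int) -> float:
    """Share of the genuinely relevant records that the classifier found."""
    return true_positives / (true_positives + false_negatives)


# Precision example: 10 records flagged as 'bad service', 3 of them wrong.
p = precision(true_positives=7, false_positives=3)

# Recall example: 10 real mentions of 'bad service', 7 of them picked up.
r = recall(true_positives=7, false_negatives=3)

print(f"precision = {p:.0%}, recall = {r:.0%}")
```

Recall needs the false negatives, which means knowing about the relevant records the classifier failed to label.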

It’s clear that measuring the recall rate is a problem when using keywords.

When it came to the intents of the customer, the keyword precision was 68% although it was 54% if you took into account the difference between recommendation and saying they would return. Intents here refer to either emotional intents (disappointed, repeat problem, etc) or sales intents (would recommend, would not return, etc). There were also a lot of inconsistencies e.g. where both ‘recommending’ and ‘quitting’ were predicted together as well as duplication by 14% i.e. double counting. When estimating the recall, the keywords picked up intents in 23% of records, albeit with the accuracy noted above, so in fact around 13% to 16%.

However, when using concepts, the intents were picked up 28% of the time. The precision was 88% and recall 45%.

With keywords, the software labelled 23% of records as “recommending” but less than 1% as “quitting”. With concepts, it labelled 23% of records with positive intents and 5% with negative intents. This meant that concepts were significantly better (1000%) at picking up negative intents, and arguably these carry the most insight.

When it comes to Twitter, the average number of labels per record was 2.3 with keywords but only 1.05 with concepts. This is important as a Tweet usually has one intent and so keywords become less useful for picking this up as much of the verbiage will be context.

We can see therefore that performance for keywords is driven by how well defined the taxonomy is. Even so, subtle nuances can return inaccurate classification. Performance isn’t explicit and has to be manually measured. The performance for concepts relies on an algorithm and some training data applying to the domain concerned. It can be tuned and may or may not require intervention from a data scientist.

It’s also worth asking what a ‘good’ performance is as a benchmark. We can do this by comparing against what humans could manually achieve. This can range anywhere from 70% to 90% precision but with much lower recall (it’s easier to miss things than to correctly or incorrectly judge something). Another benchmark is to have a reliable self-measurement metric, noting that some labels will have better performance than others, most likely the larger classes and/or those with less variable description possibilities.

Conclusion

There is a range of options for analysing text. Keyword analysis solutions are widely available and ready to deploy, and are certainly fit to serve basic text analysis needs. Keywords can serve to highlight, on a less detailed level, areas of a business that need attention. However, for more detailed, actionable insight, classifying by concepts, particularly with the latest in AI text analytics, is far and away the better option. Accuracy is not only explicit but can be improved with very little work.

Machine learning is helping to make classification by concepts more widespread, but there are still barriers to its adoption, mainly the resources required to build and maintain the models. AI alleviates many of these barriers through automation and by using human intervention only when required, so using concepts instead of keywords is likely to become more accessible for all.