What emotion was intended, then?

Studies have shown that even humans frequently disagree with one another about whether a sentence expresses a sentiment and, if it does, which sentiment it is. Ideally, we want a system whose judgement agrees as closely as possible with the average human opinion. This has been the foremost design consideration in building our emotion recognition software.

A second objective when designing our automated classification system was consistency. Even when the AI disagrees with most humans about a particular sentence, we should at least be able to trace its reasoning and understand where its judgement comes from (this is the fundamental advantage of knowledge-based AI over black-box approaches such as neural networks or pure machine learning).

Our system marks a sentence as being related to a certain emotion if one of two conditions is met:

  • The emotion is explicitly described as such; in other words, ‘anger’ itself may be the topic of conversation
  • The speaker behind a sentence is revealing emotion, i.e. ‘anger’ is found in the choice of words

Sentences whose subject is often associated with a particular emotion are not automatically assigned that emotion. For example, the word ‘terrorism’ does not by itself express fear, although many sentences about terrorism do express fear, and we want to detect those cases.
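The two conditions above, and the exception for topic words, could be sketched roughly as follows. This is a minimal illustration only: the word lists and the names `EXPLICIT`, `EXPRESSIVE` and `detect_emotions` are hypothetical, and the real lexicon is considerably richer than a pair of word sets.

```python
import re

# Hypothetical toy lexicon, invented for this illustration.
# EXPLICIT: words that name the emotion itself (condition 1).
# EXPRESSIVE: words through which a speaker reveals the emotion (condition 2).
EXPLICIT = {"anger": {"anger", "angry"}, "fear": {"fear", "afraid"}}
EXPRESSIVE = {"anger": {"outrageous", "disgusting"}, "fear": {"terrified", "scary"}}
# Topic words such as "terrorism" are deliberately absent from both tables:
# a topic that is often associated with an emotion is not evidence of it.


def tokenize(sentence):
    """Lower-case the sentence and split it into simple word tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def detect_emotions(sentence):
    """Return the set of emotions for which either condition is met."""
    tokens = set(tokenize(sentence))
    found = set()
    for emotion in EXPLICIT:
        if tokens & EXPLICIT[emotion] or tokens & EXPRESSIVE[emotion]:
            found.add(emotion)
    return found
```

With these toy lists, a neutral sentence that merely mentions terrorism yields no emotion, while a sentence in which the speaker says they are terrified of it yields fear.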

For a large part of the sentences in a big data corpus, our method already produces results with high accuracy. To further improve our system, we study boundary cases, where humans often disagree amongst themselves on whether any emotion is present in the text. To complicate matters further and to challenge ourselves, we choose topics where the specific types of emotion revealed are a matter of discussion too.

An image depicting three children, one of whom is bullying another. The third child is standing by and watches.

A case-study of online emotion recognition: #bullying

Let’s study one such problematic occurrence. A Twitter debate on the subject of bullying, under the hashtag #bullying (in Dutch: #pesten), resulted in a sharp rise of the anger curve on our analytical demo chart. Even though bullying often angers people, and could indeed be seen as an expression of anger in many situations, we immediately recognized that this could be the result of an overly simple set of definitions inside our lexicon. The verb ‘bullying’ itself was mapped directly to anger. The emotion recognition tool wasn’t sensitive enough to detect situations like the ‘terrorism’ example described above, where bullying was the topic of discussion, possibly in a more detached, non-emotional fashion. And what about sadness? Isn’t the victim of bullying expected to experience primarily sadness rather than anger, as in the sentence: “He had been the victim of bullying in school.”?

Our dataset consisted of approximately 8,000 captured Dutch sentences from Twitter. This is what the original emotion-distribution chart looked like, before we started fine-tuning and improving the lexicon.

We then started investigating the data corpus by browsing the sentences in the database and looking for patterns, sorting by the number of retweets, the distinct emotions recognised, et cetera. We quickly discovered that the hashtag #pesten had become trending due to the national Dutch anti-bullying day. As a result, many of the tweets about bullying came from schools and other institutions. Those tweets, while clearly denouncing the act of bullying, often did so in a non-emotional and informative style. They are clearly distinct from personal tweets, which more often relayed experiences of bullying.

As we spent time with the data, we systematically constructed a list of (multi-)words and so-called ‘negators’ (rules that create intelligent exceptions to would-be matches). We improved the ability of our matching system to make distinctions in the following regards:

  • Better detect non-emotional language surrounding bullying
  • Better detect descriptions of being the victim of bullying
  • Better detect genuine anger involved in the debate
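To illustrate how multi-word entries and negators can work together, here is a minimal sketch. It assumes a simple phrase-matching scheme; the `Entry` structure, the English example phrases and the `classify` function are all hypothetical stand-ins, not the actual lexicon format (which targets Dutch text).

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    phrase: str           # a single word or a multi-word expression
    emotion: str          # emotion assigned when the phrase matches
    negators: tuple = ()  # phrases that cancel this match when present


# Hypothetical entries, invented for illustration only.
LEXICON = [
    Entry("victim of bullying", "sadness"),
    Entry("was bullied", "sadness"),
    Entry("sick of", "anger"),
    Entry("bullying", "anger",
          negators=("victim of bullying", "anti-bullying", "bullying policy")),
]


def contains(text, phrase):
    """True if the phrase occurs in the text on word boundaries."""
    return re.search(r"\b" + re.escape(phrase) + r"\b", text) is not None


def classify(sentence):
    """Return the emotions of all entries that match and are not negated."""
    text = sentence.lower()
    emotions = set()
    for entry in LEXICON:
        if not contains(text, entry.phrase):
            continue
        if any(contains(text, negator) for negator in entry.negators):
            continue  # a negator overrides the would-be match
        emotions.add(entry.emotion)
    return emotions
```

In this sketch an institutional announcement about an anti-bullying day no longer triggers anger, a sentence about a victim of bullying is assigned sadness, and an outburst such as "I am sick of this bullying" still registers as anger. Because every decision traces back to a named entry or negator, the outcome stays explainable.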

The end result has been a substantial shift in the emotion-distribution chart.

A large part of the sentences formerly classified as expressing anger are now seen to express sadness, or sometimes multiple emotions at the same time. Another substantial part of the sentences in the corpus is no longer perceived to express any of the primary emotions (40% of the sentences that were measured as anger are now seen as having neutral sentiment).

For example, the following tweet expressed anger in the original analysis:

But after our fine-tuning, it will be more correctly interpreted as not expressing emotion per se.

The next sentence expressed anger as well, according to our API.

But now the presence of a victim of bullying is correctly detected, and sadness is assigned to this sentence.

It should be noted that we don’t expect to see such major shifts in analytical results after every fine-tuning process. The big difference here is explained by the fact that we remedied an earlier oversimplification and by the fact that our dataset was very uniform, in the sense that all tweets mentioned the same keyword.


Any fine-tuning of the lexicon to tackle a specific problem also improves, as a side effect, the accuracy of the system as a whole. While investigating what happened with bullying, we added new positive matches applicable to a much wider domain. InterTextueel can assist you in creating a customised lexicon for your own text data. The expected end result is an emotion classifier with near-human accuracy.