Sentiment analysis using WhatsApp emojis
In 2015, 😂 became the Oxford Dictionaries “Word” of the Year. Emojis have become so popular that they are not only an essential part of how we communicate on social media but also capture the sentiment and emotion of the user. This study uses WhatsApp emojis to analyze the sentiment of their user with image processing and sentiment analysis tools in Python.
I started this as a summer project during the quarantine period, and the source code, along with a downloadable stable release (EmojiScore v0.1.0), is now available on GitHub.
https://github.com/Subhasishbasak/emoji_analysis
With the current version you can check your sentiment score and generate some nice performance graphs using just screenshot(s) of your recently used WhatsApp emoji box 😄!
Methodology:
The work is divided into two main stages:
- Processing the screenshots using computer vision tools
- Computing the consolidated sentiment score
The raw screenshots are difficult to work with directly, so they need some pre-processing and cleaning, for which I used the OpenCV package in Python. The task was to first crop the screenshots down to the emoji box and then extract the individual emojis from it.
Image processing: To find a suitable crop for the emoji box (like the one shown in the figure), I used three separate intensity plots, one each for the R, G and B channels, for each screenshot. The same approach of isolating objects by their intensity changes also helped in counting the emojis. To reduce the number of false positives, I used a 100x100 grid of intensity lines to approximate the correct number of emojis in the box.
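As a rough illustration of the intensity-profile idea above, here is a minimal NumPy sketch on a synthetic "screenshot". The function names, the threshold and the synthetic image are assumptions made for illustration, not code from the repository.

```python
import numpy as np

def intensity_profile(image):
    """Mean intensity of every row, separately for each colour channel."""
    # image has shape (height, width, 3); averaging over the width axis
    # leaves one (R, G, B) triple per row.
    return image.mean(axis=1)

def find_jumps(profile, threshold=30.0):
    """Row indices where any channel's mean intensity jumps by more than threshold."""
    # Largest absolute change across the three channels between consecutive rows.
    change = np.abs(np.diff(profile, axis=0)).max(axis=1)
    # +1 so each index points at the first row after the jump.
    return np.flatnonzero(change > threshold) + 1

# Synthetic dark-theme "screenshot": a dark background with a lighter band
# standing in for the emoji box (all values are made up for illustration).
screenshot = np.full((200, 100, 3), 20, dtype=np.uint8)
screenshot[120:180] = 60

rows = find_jumps(intensity_profile(screenshot))
top, bottom = rows[0], rows[-1]
emoji_box = screenshot[top:bottom]
print(rows, emoji_box.shape)
```

On this synthetic image the jumps fall at rows 120 and 180, so the crop is exactly the lighter band. Real screenshots are noisier, so the threshold would need tuning, and the same profile taken along columns could be used to separate the individual emojis.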
The next task was to identify the emojis, which was done with cv2.matchTemplate, OpenCV's implementation of template matching, using the CV_TM_SQDIFF_NORMED method (cv2.TM_SQDIFF_NORMED in the Python bindings). cv2 provides several methods for computing the intensity difference between the image and the template, and I found this one the most accurate. The model worked well, with fewer than 20% incorrect predictions. Its main drawback is that template matching is not robust to geometric distortions of the image (in this case the cropped emojis sometimes suffered a little shear and stretch/compression due to resizing). Another suggested approach was classification with deep Convolutional Neural Nets (CNN), but the main pitfalls were the lack of labels and the small size of the training dataset. Though I am sure there is some other way out 😣 (still working on it).
Computing the Sentiment Score: Next comes the sentiment analysis part, where each emoji is assigned a sentiment score vector consisting of a negative, a neutral and a positive score (normalized so that they sum to 1). In the study I relied on (see references), the sentiment of each emoji is computed from the sentiment of the tweets in which it occurs. That study engaged 83 human annotators to label over 1.6 million tweets in 13 European languages by sentiment polarity. I took the consolidated sentiment score for a particular screenshot to be a weighted average of the sentiment scores of its emojis, where the weights come from an Exponential distribution with a scale somewhere between 0.5–2. This choice of distribution gives more weight to the recently used emojis, and the scale hyper-parameter is tuned to control the variance. The default weights, however, are uniform.
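The weighting scheme can be sketched as follows. This is only an illustration under stated assumptions: the emojis are ordered from most to least recently used, the weights are taken as the Exponential density evaluated at positions 0, 1, 2, ... and then normalized, and the score vectors are made-up values, not figures from the paper. The function names are hypothetical.

```python
import math

def exponential_weights(n, scale=1.0):
    """Normalized weights proportional to the Exponential(scale) density at 0..n-1."""
    # The 1/scale factor of the density cancels during normalization.
    raw = [math.exp(-i / scale) for i in range(n)]
    total = sum(raw)
    return [w / total for w in raw]

def consolidated_score(scores, weights=None):
    """Weighted average of (negative, neutral, positive) score vectors."""
    if weights is None:
        # Default: uniform weights, i.e. a plain average.
        weights = [1.0 / len(scores)] * len(scores)
    return tuple(sum(w * s[k] for w, s in zip(weights, scores)) for k in range(3))

# Made-up score vectors, most recently used emoji first (illustrative only).
scores = [
    (0.1, 0.2, 0.7),
    (0.6, 0.3, 0.1),
    (0.2, 0.5, 0.3),
]

weights = exponential_weights(len(scores), scale=1.0)
uniform = consolidated_score(scores)
recent = consolidated_score(scores, weights)
print(uniform, recent)
```

Since the most recent emoji in this toy input is strongly positive, the exponentially weighted score has a larger positive component than the uniform one. A smaller scale concentrates even more weight on the first few emojis.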
Platform & dependencies:
I worked with Python 3.6.9 and, apart from some essential imports (os, sys and inspect), mainly used the following packages:
NumPy, pandas, Matplotlib, cv2 (OpenCV), pickle, statistics, SciPy
What I learnt:
This summer project was a great opportunity for me to learn a lot of new things. I used the following tools/technologies to implement my approach:
Python, git, GitHub, Jupyter Notebook, vim
- It was a great coding exercise.
- Learnt a bunch of useful git hacks.
- Got a better working knowledge of Python modules/packages and imports.
- Implemented many of the tools/concepts from the course CGE 136 : Computer Vision which I took during Spring 2020 at CMI.
- Got introduced to building Python web APIs for a dynamic webpage using Flask and Django (still a long way to go with this).
- Moreover this project let me track the mood of some of my friends who participated in the study (it was fun 😛).
Future work & references:
- So far there are 3 launcher functions for drawing inferences, but there are many more things that can be done with emojis. Feel free to fork the repository and continue this project. I have also listed several issues on the GitHub repository that can be good starting points for future development.
- For now the code works only for Android emojis, and only in the dark theme (unfortunately yes!!). It can be easily extended to Apple emojis and the light theme following the same approach. I leave this as a coding exercise to the reader (simply because I am too lazy during this quarantine 😜).
- I used the sentiment score for the emojis from this paper. It would be a great exercise to obtain the sentiment scores using a different approach. I would highly appreciate any suggestions on this.
- Lastly I would like to thank my friends who participated in the study and our course instructor Prof. Rukmini Vijaykumar for her guidance & support.
- Please don’t hesitate to reach out if you have any comments/suggestions 😃. You can share your thoughts/views either by creating a pull request on the GitHub repository or by contacting me (homepage).