A Complete Guide to Sentiment Analysis

Akshat Biyani, CareerFoundry Contributor

“That movie was a colossal disaster… I absolutely hated it! Waste of time and money #skipit”

“Have you seen the new season of XYZ? It is so good!”

“You should really check out this new app, it’s awesome! And it makes your life so convenient.”

By reading these comments, can you figure out what the emotions behind them are? They may seem obvious to you because we, as humans, are capable of discerning the complex emotional sentiments behind the text. Not only have we been educated to understand the meanings, connotations, intentions, and grammar behind each of these particular sentences, but we’ve also personally felt many of these emotions before and, from our own experiences, can conjure up the deeper meaning behind these words. Moreover, we’re also extremely familiar with the real-world objects that the text is referring to.

This doesn’t apply to machines, but they do have other ways of determining positive and negative sentiments! How do they do this, exactly? By using sentiment analysis.

In this article, we will discuss how a computer can decipher emotions by using sentiment analysis methods, and what the implications of this can be. The article covers the following sections:

  1. What is sentiment analysis?
  2. How does sentiment analysis work?
  3. Sentiment analysis use cases
  4. Machine learning and sentiment analysis
  5. Advantages of sentiment analysis
  6. Disadvantages of sentiment analysis
  7. Key takeaways and next steps

1. What is sentiment analysis?

With computers getting smarter and smarter, surely they’re able to decipher and discern between the wide range of different human emotions, right? Wrong—while they are intelligent machines, computers can neither see nor feel any emotions, with the only input they receive being in the form of zeros and ones—or what’s more commonly known as binary code.

However, computers excel at the one thing that humans struggle with: processing large amounts of data quickly and effectively. So, theoretically, if we could teach machines how to identify the sentiments behind plain text, we could evaluate the emotional response to a certain product by analyzing hundreds of thousands of reviews or tweets. This would, in turn, provide companies with invaluable feedback and help them tailor their next product to better suit the market’s needs. So, what kind of process is this? Sentiment analysis!

Sentiment analysis, also known as opinion mining, is the process of determining the emotions behind a piece of text. Sentiment analysis aims to categorize the given text as positive, negative, or neutral. Furthermore, it then identifies and quantifies subjective information about those texts with the help of natural language processing, text analysis, computational linguistics, and machine learning.

2. How does sentiment analysis work?

There are two main methods for sentiment analysis: machine learning and lexicon-based. The machine learning method leverages human-labeled data to train the text classifier, making it a supervised learning method. The lexicon-based approach breaks down a sentence into words and scores each word’s semantic orientation based on a dictionary. It then adds up the various scores to arrive at a conclusion.

In this example, we will look at how sentiment analysis works using a simple lexicon-based approach. We’ll take the following comment as our test data:

“That movie was a colossal disaster… I absolutely hated it! Waste of time and money #skipit”

Step 1: Cleaning

The initial step is to remove special characters and numbers from the text. In our example, we’ll remove the ellipsis, the exclamation mark, and the hash sign from the comment above.

That movie was a colossal disaster I absolutely hated it Waste of time and money skipit
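As a minimal sketch of this cleaning step, the snippet below uses Python's built-in `re` module. The function name `clean_text` is made up for illustration, and the sketch assumes English text, so it keeps only the letters A to Z and whitespace (accented letters would be dropped too).

```python
import re

def clean_text(text: str) -> str:
    """Keep only ASCII letters and whitespace, then tidy up the spacing."""
    # Replace every character that is not a letter or whitespace with a space.
    letters_only = re.sub(r"[^A-Za-z\s]", " ", text)
    # Collapse the runs of spaces left behind by the removed characters.
    return re.sub(r"\s+", " ", letters_only).strip()

comment = ("That movie was a colossal disaster… I absolutely hated it! "
           "Waste of time and money #skipit")
cleaned = clean_text(comment)
print(cleaned)
```

Replacing unwanted characters with a space, rather than deleting them, avoids accidentally gluing two words together.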

Step 2: Tokenization

Tokenization is the process of breaking down a text into smaller chunks called tokens, which are either individual words or short sentences. Breaking down a paragraph into sentences is known as sentence tokenization, and breaking down a sentence into words is known as word tokenization.

[ ‘That’, ‘movie’, ‘was’, ‘a’, ‘colossal’, ‘disaster’, ‘I’, ‘absolutely’, ‘hated’, ‘it’, ‘Waste’, ‘of’, ‘time’, ‘and’, ‘money’, ‘skipit’ ]
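Both kinds of tokenization can be sketched in a few lines of plain Python. The helper names `sentence_tokenize` and `word_tokenize` are chosen here for illustration, and the splitting rules are deliberately simple assumptions: sentences end at ".", "!" or "?", and words are separated by whitespace.

```python
import re

def sentence_tokenize(paragraph: str) -> list[str]:
    """Split a paragraph into sentences after '.', '!' or '?'."""
    parts = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    return [part for part in parts if part]

def word_tokenize(sentence: str) -> list[str]:
    """Split an already-cleaned sentence into words on whitespace."""
    return sentence.split()

cleaned = ("That movie was a colossal disaster I absolutely hated it "
           "Waste of time and money skipit")
tokens = word_tokenize(cleaned)
sentences = sentence_tokenize("It is so good! You should watch it.")
print(tokens)
print(sentences)
```

Real tokenizers handle many more cases, such as abbreviations ("Dr."), contractions, and emoji, so treat this as a starting point only.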

Step 3: Part-of-speech (POS) tagging

Part-of-speech tagging is the process of tagging each word with its grammatical group, such as noun, pronoun, verb, adjective, or adverb, depending on its context. This transforms each token into a tuple of the form (word, tag). POS tagging is used to preserve the context of a word.

[ (‘That’, ‘DT’), (‘movie’, ‘NN’), (‘was’, ‘VBD’), (‘a’, ‘DT’),
  (‘colossal’, ‘JJ’), (‘disaster’, ‘NN’), (‘I’, ‘PRP’), (‘absolutely’, ‘RB’),
  (‘hated’, ‘VBD’), (‘it’, ‘PRP’), (‘Waste’, ‘NN’), (‘of’, ‘IN’),
  (‘time’, ‘NN’), (‘and’, ‘CC’), (‘money’, ‘NN’), (‘skipit’, ‘NN’) ]
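To show the (word, tag) structure in code, here is a toy sketch. It is not a real tagger: `TOY_TAGS` is a hand-written lookup table invented for this one comment, and unlike a trained tagger it ignores the surrounding words entirely. Unknown words fall back to a default tag, which is an assumption made to keep the example short.

```python
# Hypothetical lookup table using the same tag names as the example above.
# A real tagger is trained on annotated text and looks at the word's context.
TOY_TAGS = {
    "that": "DT", "a": "DT", "movie": "NN", "was": "VBD",
    "colossal": "JJ", "disaster": "NN", "i": "PRP", "it": "PRP",
    "absolutely": "RB", "hated": "VBD", "waste": "NN", "of": "IN",
    "time": "NN", "and": "CC", "money": "NN",
}

def toy_pos_tag(tokens, default_tag="NN"):
    """Turn each token into a (word, tag) tuple via a dictionary lookup."""
    return [(token, TOY_TAGS.get(token.lower(), default_tag)) for token in tokens]

tokens = ["That", "movie", "was", "a", "colossal", "disaster", "I",
          "absolutely", "hated", "it", "Waste", "of", "time", "and",
          "money", "skipit"]
tagged = toy_pos_tag(tokens)
print(tagged)
```

Because "skipit" is not in the table, it receives the default tag, which happens to match the noun tag shown in the example.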

Step 4: Removing stop words

Stop words are words like ‘have,’ ‘but,’ ‘we,’ ‘he,’ ‘into,’ ‘just,’ and so on. These words carry information of little value and are generally considered noise, so they are removed from the data.

[ ‘movie’, ‘colossal’, ‘disaster’, ‘absolutely’, ‘hated’, ‘Waste’, ‘time’, ‘money’, ‘skipit’ ]
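In code, stop-word removal is a simple filter. The `STOP_WORDS` set below is a tiny illustrative list assembled for this example, not a standard one; real projects usually start from a much longer list supplied by an NLP library.

```python
# Small, hand-picked stop-word set (an assumption for this example only).
STOP_WORDS = {"that", "was", "a", "i", "it", "of", "and",
              "have", "but", "we", "he", "into", "just"}

def remove_stop_words(tokens):
    """Drop every token whose lowercase form is in the stop-word set."""
    return [token for token in tokens if token.lower() not in STOP_WORDS]

tokens = ["That", "movie", "was", "a", "colossal", "disaster", "I",
          "absolutely", "hated", "it", "Waste", "of", "time", "and",
          "money", "skipit"]
filtered = remove_stop_words(tokens)
print(filtered)
```

The comparison is done in lowercase so that "That" and "I" are caught even though the set stores lowercase words.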

Step 5: Stemming

Stemming is a process of linguistic normalization which removes the suffix of each of these words and reduces them to their base word. For example, loved is reduced to love, wasted is reduced to waste. Here, hated is reduced to hate.

[ ‘movie’, ‘colossal’, ‘disaster’, ‘absolutely’, ‘hate’, ‘Waste’, ‘time’, ‘money’, ‘skipit’ ]
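The idea can be sketched with a single, deliberately tiny suffix rule. The function `toy_stem` is hypothetical and only handles words whose base form ends in "e" (hated, loved, wasted); it would get a word like "walked" wrong. Established stemmers such as the Porter algorithm apply many more rules and may produce different stems than the ones shown in this article.

```python
def toy_stem(word: str) -> str:
    """Toy rule: drop the final 'd' of words ending in 'ed' (hated -> hate)."""
    if len(word) > 4 and word.lower().endswith("ed"):
        return word[:-1]
    return word

filtered = ["movie", "colossal", "disaster", "absolutely", "hated",
            "Waste", "time", "money", "skipit"]
stemmed = [toy_stem(word) for word in filtered]
print(stemmed)
```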

Step 6: Final Analysis

In a lexicon-based approach, the remaining words are compared against the sentiment libraries, and the scores obtained for each token are added or averaged. A sentiment library is a list of predefined words and phrases that have been manually scored by humans. For example, ‘worst’ is scored -3, and ‘amazing’ is scored +3.

With a basic dictionary, our example comment will be turned into:

movie = 0, colossal = 0, disaster = -2, absolutely = 0, hate = -2, waste = -1, time = 0, money = 0, skipit = 0

This makes the overall score of the comment -5, classifying the comment as negative.
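The scoring step can be sketched as a dictionary lookup followed by a sum. The `LEXICON` below contains only the three non-zero scores from the example above; every word that is missing from it is assumed to score 0. The function names and the rule that a score of exactly 0 means "neutral" are assumptions made for illustration.

```python
# Scores taken from the worked example; any other word counts as 0.
LEXICON = {"disaster": -2, "hate": -2, "waste": -1}

def score_tokens(tokens, lexicon=LEXICON):
    """Add up the lexicon score of every token (case-insensitive)."""
    return sum(lexicon.get(token.lower(), 0) for token in tokens)

def classify(score):
    """Map a numeric score to a sentiment label."""
    if score < 0:
        return "negative"
    if score > 0:
        return "positive"
    return "neutral"

stemmed = ["movie", "colossal", "disaster", "absolutely", "hate",
           "Waste", "time", "money", "skipit"]
total = score_tokens(stemmed)
print(total, classify(total))  # -5 negative
```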

3. Sentiment analysis use cases

Sentiment analysis is used to swiftly glean insights from enormous amounts of text data, with applications in fields ranging from politics, finance, and retail to hospitality and healthcare. For instance, consider its usefulness in the following scenarios:

  • Brand reputation management: Sentiment analysis allows you to track all the online chatter about your brand and spot potential PR disasters before they become major concerns.
  • Voice of the customer: The “voice of the customer” refers to the feedback and opinions you get from your clients all over the world. You can improve your product and meet your clients’ needs with the help of this feedback and sentiment analysis.
  • Voice of the employee: Employee satisfaction can be measured for your company by analyzing reviews on sites like Glassdoor, allowing you to determine how to improve the work environment you have created.
  • Market research: You can analyze and monitor internet reviews of your products and those of your competitors to see how the public differentiates between them, helping you glean indispensable feedback and refine your products and marketing strategies accordingly. Furthermore, sentiment analysis in market research can help you anticipate future trends and thus gain a first-mover advantage.

Other applications for sentiment analysis could include:

  • Customer support
  • Social media monitoring
  • Voice assistants & chatbots
  • Election polls
  • Customer experience with a product
  • Stock market sentiment and market movement
  • Analyzing movie reviews

4. Machine learning and sentiment analysis

In the machine learning approach, sentiment analysis tasks are typically treated as classification problems. Data analysts use historical text data, manually labeled as positive, negative, or neutral, as the training set. They then perform feature extraction on this labeled dataset and use the resulting features to train the model to recognize the relevant patterns. Finally, they use the trained model to predict the sentiment of a fresh piece of text.

Naive Bayes, logistic regression, support vector machines, and neural networks are some of the classification algorithms commonly used in sentiment analysis tasks. One of the key advantages of the machine learning approach is that, given enough representative labeled data, it can achieve high prediction accuracy.
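To make the train-then-predict workflow concrete, here is a minimal sketch of a Naive Bayes text classifier written with the Python standard library only. The four training sentences and their labels are invented for this example, the features are plain word counts, and add-one smoothing is used so that a word seen in only one class does not zero out the other. A real project would use thousands of labeled examples, a neutral class, and usually an existing machine learning library.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    """Very simple feature extraction: lowercase words split on whitespace."""
    return text.lower().split()

def train_naive_bayes(examples):
    """Count documents per label and words per label from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in examples:
        label_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocabulary.add(word)
    return label_counts, word_counts, vocabulary

def predict(model, text):
    """Return the label with the highest log-probability for the text."""
    label_counts, word_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    best_label, best_log_prob = None, -math.inf
    for label, doc_count in label_counts.items():
        log_prob = math.log(doc_count / total_docs)  # class prior
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocabulary:
                continue  # ignore words never seen during training
            smoothed = word_counts[label][word] + 1  # add-one smoothing
            log_prob += math.log(smoothed / (total_words + len(vocabulary)))
        if log_prob > best_log_prob:
            best_label, best_log_prob = label, log_prob
    return best_label

# Hypothetical, hand-labeled training set (far too small for real use).
training_data = [
    ("i loved this movie it was amazing", "positive"),
    ("what a great and fun film", "positive"),
    ("i hated this movie it was awful", "negative"),
    ("a boring waste of time", "negative"),
]
model = train_naive_bayes(training_data)
print(predict(model, "an amazing and fun movie"))
print(predict(model, "what an awful waste of time"))
```

Working with log-probabilities instead of multiplying raw probabilities avoids numeric underflow on longer texts.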

5. Advantages of sentiment analysis

Because large amounts of data on the internet are entirely unstructured, data analysts need a way to evaluate this data. With regard to sentiment analysis, data analysts want to extract and identify emotions, attitudes, and opinions from their sample sets. Reading and assigning a rating to a large number of reviews, tweets, and comments is not an easy task, but with the help of sentiment analysis, this can be accomplished quickly. Another valuable feature of sentiment analysis is its ability to analyze reactions to events such as new product launches or new policy proposals in real time. Thus, sentiment analysis can be a cost-effective and efficient way to gauge public opinion and manage it accordingly.

6. Disadvantages of sentiment analysis

Sentiment analysis, as fascinating as it is, is not without its flaws. Human language is nuanced and often far from straightforward. Machines might struggle to identify the emotions behind an individual piece of text despite their extensive grasp of past data. Some situations where sentiment analysis might fail are:

  • Sarcasm, jokes, irony. These things generally don’t follow a fixed set of rules, so they might not be correctly classified by sentiment analysis systems.
  • Nuance. Words can have multiple meanings and connotations, which are entirely subject to the context they occur in.
  • Multipolarity. A given text can be positive in some parts and negative in others, which makes a single overall label misleading.
  • Negation detection. Negation can be challenging for the machine because the function and the scope of the word ‘not’ in a sentence are not definite; moreover, suffixes and prefixes such as ‘non-,’ ‘dis-,’ ‘-less’ etc. can change the meaning of a text.
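The negation problem is easy to reproduce with a small sketch. The two-word `LEXICON`, the `NEGATORS` set, and both function names below are assumptions made for illustration. The first scorer ignores negation completely; the second flips the score of the single word that directly follows a negator, which is a crude heuristic rather than a real solution.

```python
LEXICON = {"good": 2, "bad": -2}      # hypothetical two-word lexicon
NEGATORS = {"not", "never", "no"}     # hypothetical negation words

def naive_score(text):
    """Sum lexicon scores and ignore negation entirely."""
    return sum(LEXICON.get(word, 0) for word in text.lower().split())

def score_with_negation(text):
    """Flip the score of the one word that directly follows a negator."""
    score = 0
    negate_next = False
    for word in text.lower().split():
        if word in NEGATORS:
            negate_next = True
            continue
        value = LEXICON.get(word, 0)
        score += -value if negate_next else value
        negate_next = False  # the negation only reaches one word
    return score

print(naive_score("This movie is not good"))               # 2, wrongly positive
print(score_with_negation("This movie is not good"))       # -2
print(score_with_negation("This movie is not very good"))  # 2, still wrong
```

The last line shows why scope matters: as soon as another word sits between "not" and "good", the one-word heuristic fails again.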

7. Key takeaways and next steps

In this article, we examined the science and nuances of sentiment analysis. While sentiment analysis is a method that’s nowhere near perfect, as more data is generated and fed into machines, they will continue to get smarter and improve the accuracy with which they process that data.

All in all, sentiment analysis has a wide range of use cases and is an indispensable tool for companies that hope to leverage the power of data to make optimal decisions.

For those who believe in the power of data science and want to learn more, we recommend taking a free, 5-day introductory course in data analytics.
