Role of Hashtags in Finding a User’s Online Personality


This article looks at how hashtags can be used to find a user’s online personality. We use Twitter as the source of hashtags, although other microblogging platforms would work as well. To illustrate the problem, I scraped around 4,800 tweets containing the hashtag “#modi” from Twitter [1]. We will use those tweets to understand the online personality of Narendra Modi, the Prime Minister of India.


The extensive use of hashtags makes Twitter more expressive and more appealing to its users. This article assumes that a large number of tweets mention “#modi” as a hashtag and that it co-occurs with other hashtags that are trending on Twitter. In other words, the user should be well known and widely discussed across Twitter.


Let’s define a few terms first to avoid confusion.

Hashtag: a word or phrase prefixed with a hash symbol, such as “#hashtag”. It adds context and metadata to a tweet. Hashtags fall into three main types [2]:

  • Topic hashtag: the hashtag contains only a topic, as in “Minimum Support Prices for #farmers is far from minimum”. Here the topic hashtag is “#farmers”.
  • Sentiment hashtag: the hashtag consists only of sentiment words, as in “i #love india” or “#shame on rapist”. Here “#love” and “#shame” are sentiment hashtags.
  • Sentiment-topic hashtag: the topic word and the sentiment word appear together with no spaces between them. For example, “#ILoveModi” splits into [“I”, “Love”, “Modi”], where “Love” is the sentiment and “Modi” is the topic.
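
To make the three types concrete, here is a minimal Python sketch that splits a hashtag into words and labels it. It is only an illustration: the tiny word lists and the function names `segment` and `classify` are my own assumptions, not the lexicons or the method from [2], and a real system would need much larger lexicons.

```python
# Toy lexicons: illustrative assumptions only, not the lexicons used in [2].
SENTIMENT_WORDS = {"love", "shame", "good"}
TOPIC_WORDS = {"modi", "farmers", "gst", "india"}
FILLER_WORDS = {"i", "on"}
VOCAB = SENTIMENT_WORDS | TOPIC_WORDS | FILLER_WORDS


def segment(hashtag):
    """Split a hashtag into known words; return [] if it cannot be fully split."""
    text = hashtag.lstrip("#").lower()
    # best[i] holds one segmentation of text[:i], or None if none was found.
    best = [None] * (len(text) + 1)
    best[0] = []
    for end in range(1, len(text) + 1):
        for start in range(end):
            word = text[start:end]
            if best[start] is not None and word in VOCAB:
                best[end] = best[start] + [word]
                break
    return best[len(text)] or []


def classify(hashtag):
    """Label a hashtag as topic, sentiment, sentiment-topic, or unknown."""
    words = segment(hashtag)
    has_sentiment = any(w in SENTIMENT_WORDS for w in words)
    has_topic = any(w in TOPIC_WORDS for w in words)
    if has_sentiment and has_topic:
        return "sentiment-topic"
    if has_sentiment:
        return "sentiment"
    if has_topic:
        return "topic"
    return "unknown"


for tag in ["#farmers", "#love", "#ILoveModi", "#hi"]:
    print(tag, segment(tag), classify(tag))
```

Because the sketch lower-cases the hashtag before splitting it, it does not rely on capital letters to find word boundaries. A hashtag that cannot be split into known words, such as “#hi” here, is labelled unknown rather than guessed.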

Online personality**: for this article, it is defined by two questions:

  • Which topics are related to the user?
  • What opinions does the public hold about the user?

** This is not a standard definition, and it can be extended.


Here, we take T = “#modi” as the key hashtag for the user whose personality we want to find.

Let’s go through the approach step by step.

  1. Scrape tweets from Twitter using T as the query.
  2. Convert all tweets to lower case.
  3. Remove noisy hashtags, i.e. non-subjective (neutral) hashtags such as “#hi” and “#hello”.
  4. Calculate the co-occurrence of T with other hashtags, i.e. how often each hashtag appears in the same tweet as T.
  5. Using NetworkX (a Python library) and the frequency matrix from step 4, visualize the graph. The width of an edge reflects how often the two hashtags co-occur: the wider the edge, the higher the co-occurrence frequency.
  6. Use word segmentation to recover the literal meaning of each hashtag. For example, “#FarmerDeath” is split into [“Farmer”, “Death”].
  7. Classify the type of each hashtag using lexicons. For more detail, please refer to the paper on identifying hashtag types [3].
  8. After classification, we get:

a) Topics: these show that Modi is closely connected to the following:

  • Karnataka Elections 2018
  • BJP
  • Farmers
  • Mann ki Baat
  • Budget 2018
  • PM
  • Modi for 2019

b) Opinions: these are subjective opinions that are related to Modi in some way:

  • Power, Leadership, Shame
  • #GSTgood: [“GST”, “good”]
  • #ILoveModi: [“I”, “Love”, “Modi”]
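
Steps 2 to 5 of the approach can be sketched in a few lines of Python. This is a rough sketch under my own assumptions, not the exact code behind the results above: the sample tweets are made up, the noise list is a hypothetical stand-in for a real list of neutral hashtags, and the names `cooccurrence` and `build_graph` are chosen for illustration. Scraping (step 1) is left out because it depends on the API or tool you use.

```python
import re
from collections import Counter

KEY = "#modi"
# Hypothetical list of neutral hashtags to drop (step 3).
NOISE = {"#hi", "#hello"}
HASHTAG_RE = re.compile(r"#\w+")


def cooccurrence(tweets, key=KEY, noise=NOISE):
    """Count the tweets in which each hashtag appears together with `key`."""
    counts = Counter()
    for tweet in tweets:
        # Step 2: lower-case the tweet before extracting hashtags.
        tags = set(HASHTAG_RE.findall(tweet.lower()))
        if key not in tags:
            continue
        # Steps 3 and 4: drop noise and the key itself, then count the rest.
        counts.update(tags - noise - {key})
    return counts


def build_graph(counts, key=KEY):
    """Step 5: star graph around `key`, with edge weight = co-occurrence count."""
    # NetworkX is third-party; it is imported here so counting works without it.
    import networkx as nx

    graph = nx.Graph()
    for tag, n in counts.items():
        graph.add_edge(key, tag, weight=n)
    return graph


# Made-up sample tweets for illustration only (not the scraped data).
sample = [
    "#Modi addresses #farmers in Karnataka",
    "#hi #modi #BJP rally today",
    "#modi #bjp #farmers",
    "Unrelated tweet about #cricket",
]
counts = cooccurrence(sample)
print(counts.most_common())
```

To draw the graph, the edge weights can be passed as widths, for example `nx.draw_networkx(graph, width=[d["weight"] for _, _, d in graph.edges(data=True)])`, which needs matplotlib installed. With thousands of tweets the raw counts would probably need scaling down before being used as line widths.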


One limitation is that it is difficult to distinguish two different users based on hashtags alone. Two famous personalities can share part of a name: for example, “#modi” could refer to “Narendra Modi” or “Nirav Modi”.


The increasing use of hashtags on Twitter makes tweets more expressive and opens up many ways to understand them better. Here, we discussed using hashtags to find a user’s online personality by identifying the topics related to that user and the subjective opinions about that user.

If you would like to watch a video on the same topic, please refer to [4].



Originally published on February 20, 2019.

Senior Data Scientist @ Fractal Analytics