Role of HashTag in Finding User’s Online Personality

Problem

This article explores how hashtags can be used to infer a user’s online personality. We use Twitter as the source of hashtags, although other microblogging platforms could be used as well. To illustrate the problem, I scraped around 4,800 tweets containing the hashtag “#modi” from Twitter [1], and we will use these tweets to understand the online personality of Modi (Prime Minister of India).

Assumption

The extensive use of hashtags makes Twitter more expressive and popular. This article assumes that a large number of tweets mention “#modi” as a hashtag, and that it co-occurs with other hashtags that commonly trend on Twitter. In other words, the user should be well known and widely discussed across Twitter.

Terminology

Let’s first define a few terms to avoid confusion.

Hashtag: a word or phrase prefixed with a hash symbol, such as “#hashtag”. Hashtags add context and metadata to tweets. They can be broadly categorized into three types [2]:

  • Topic hashtag: contains only a topic, as in “Minimum Support Prices for #farmers is far from minimum”, where the topic hashtag is “#farmers”.
  • Sentiment hashtag: composed only of sentiment words, as in “i #love india” or “#shame on rapist”, where “#love” and “#shame” are sentiment hashtags.
  • Sentiment-Topic hashtag: the topic word and the sentiment words appear together without separating spaces. For example, “#ILoveModi” splits into [“I”, “Love”, “Modi”], where “Love” is the sentiment and “Modi” is the topic.
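To make the three types concrete, here is a minimal Python sketch that assigns a type to a hashtag that has already been split into words. The lexicons SENTIMENT_WORDS and STOP_WORDS and the function hashtag_type are hypothetical illustrations, not the lexicons used in [2] or [3]; a real classifier would rely on a published sentiment lexicon.

```python
# Hypothetical toy sentiment lexicon, for illustration only
SENTIMENT_WORDS = {"love", "shame", "good", "death", "power", "leadership"}
# Hypothetical function words and greetings that carry neither topic nor sentiment
STOP_WORDS = {"i", "on", "the", "a", "is", "for", "hi", "hello"}


def hashtag_type(words):
    """Classify a segmented hashtag as topic, sentiment, sentiment-topic or neutral."""
    # Drop function words, keeping only words that may carry topic or sentiment
    content = [w.lower() for w in words if w.lower() not in STOP_WORDS]
    has_sentiment = any(w in SENTIMENT_WORDS for w in content)
    # Simplification: any remaining word that is not a sentiment word counts as a topic word
    has_topic = any(w not in SENTIMENT_WORDS for w in content)
    if has_sentiment and has_topic:
        return "sentiment-topic"
    if has_sentiment:
        return "sentiment"
    if has_topic:
        return "topic"
    return "neutral"
```

For example, hashtag_type(["I", "Love", "Modi"]) returns "sentiment-topic": "i" is dropped as a function word, "love" is in the sentiment lexicon and "modi" is not. Treating every non-sentiment word as a topic word keeps the sketch short, but it means any unknown word is labeled a topic.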

Online Personality**: defined here by two questions:

  • Which topics is the user associated with?
  • What opinions does the public hold about the user?

** Not a standard definition. It can be extended.

Approach

Here, we take T = “#modi” as the key hashtag whose personality we want to find.

Let’s explore the approach step by step:

  1. Scrape tweets from Twitter using T as the query.
  2. Convert all tweets to lower case.
  3. Remove noisy hashtags, i.e., non-subjective (neutral) hashtags such as “#hi” and “#hello”.
  4. Calculate the co-occurrence of T with other hashtags, i.e., count how often each other hashtag appears together with T.
  5. Visualize the graph using NetworkX (a Python library) and the frequency matrix from step 4. The width of an edge reflects the co-occurrence frequency: the wider the edge, the more often the two hashtags occur together.
  6. Use word segmentation to recover the literal meaning of a hashtag; for example, “#FarmerDeath” is split into [“Farmer”, “Death”].
  7. Classify the type of each hashtag using lexicons. For more detail, please refer to the paper on identifying hashtag types [3].
  8. After classification, we get:

a) Topics: Modi is closely connected to the following:

  • Karnataka Elections 2018
  • BJP
  • Farmers
  • Mann ki Baat
  • Budget 2018
  • PM
  • Modi for 2019

b) Opinions: subjective opinions related to Modi

  • Power, Leadership, Shame
  • #GSTgood: [“GST”, “good”]
  • #ILoveModi: [“I”, “Love”, “Modi”]
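Steps 2 to 4 can be sketched in a few lines of Python. This is an illustrative sketch, not the article’s original code: the regular expression, the hypothetical NEUTRAL_HASHTAGS set, the function name cooccurrence_with and the choice to count each hashtag at most once per tweet are all assumptions.

```python
import re
from collections import Counter

# Assumption: a hashtag is '#' followed by letters, digits or underscores
HASHTAG_RE = re.compile(r"#\w+")
# Hypothetical list of neutral, non-subjective hashtags to discard (step 3)
NEUTRAL_HASHTAGS = {"#hi", "#hello"}


def cooccurrence_with(target, tweets):
    """Count how many tweets contain both `target` and each other hashtag."""
    target = target.lower()
    counts = Counter()
    for tweet in tweets:
        # Step 2: lower-case the tweet, then extract its hashtags;
        # a set means each hashtag is counted at most once per tweet
        tags = set(HASHTAG_RE.findall(tweet.lower()))
        if target not in tags:
            continue
        # Step 3: drop the target itself and any neutral hashtags
        tags -= {target} | NEUTRAL_HASHTAGS
        # Step 4: each remaining hashtag co-occurs once with the target in this tweet
        counts.update(tags)
    return counts
```

The returned Counter maps each hashtag to its co-occurrence frequency with T, so counts.items() yields (hashtag, frequency) pairs of the kind used to draw the graph.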
The following code draws the co-occurrence graph for “#modi” with NetworkX, using the frequencies found for each hashtag:

import networkx as nx
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches

target = "#modi"
# (hashtag, co-occurrence frequency with the target hashtag #modi)
topics_modi = [('#bjp', 799), ('#farmers', 42), ('#karnatakaelections2018', 91), ('#modifor2019', 48), ('#budget2018', 40), ('#mannkibaat', 23), ('#pm', 89)]
sentiment_topics_modi = [('#GSTgood', 33), ('#FarmerDeath', 42), ('#modirobsindia', 27), ('#ILoveModi', 10)]
sentiments_modi = [('#leadership', 25), ('#power', 16), ('#shame', 16)]

G = nx.Graph()
# The key hashtag is cyan; every other node is coloured by its hashtag type
G.add_node(target, color='cyan')
for hashtags, color in [(topics_modi, 'r'), (sentiment_topics_modi, 'skyblue'), (sentiments_modi, 'orange')]:
    for hashtag, freq in hashtags:
        G.add_node(hashtag, color=color)
        # Edge width is proportional to the co-occurrence frequency
        G.add_edge(target, hashtag, weight=freq / 100)

pos = nx.circular_layout(G)
edges = list(G.edges())
weights = [G[u][v]['weight'] for u, v in edges]
# Read colours from the node attributes so they always line up with G.nodes()
node_color = [G.nodes[n]['color'] for n in G.nodes()]
# Node size scales with the length of the hashtag (times a constant 310)
size_node = [len(n) * 310 for n in G.nodes()]

fig = plt.figure()
nx.draw(G, pos, edgelist=edges, node_color=node_color, width=weights, with_labels=True, node_size=size_node)
# Legend: map each colour to its hashtag type
red_patch = mpatches.Patch(color='r', label='Topic')
blue_patch = mpatches.Patch(color='skyblue', label='Sentiment-Topic')
orange_patch = mpatches.Patch(color='orange', label='Opinion/Sentiment')
cyan_patch = mpatches.Patch(color='cyan', label='Key hashtag')
plt.legend(handles=[red_patch, blue_patch, orange_patch, cyan_patch])
plt.show()
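The article does not name the tool used for word segmentation in step 6. One common approach is dictionary-based segmentation with dynamic programming, sketched below under the assumption of a tiny hypothetical vocabulary VOCAB; the function segment is illustrative and returns the split with the fewest words, or None when the hashtag cannot be covered by the vocabulary.

```python
from functools import lru_cache

# Hypothetical toy vocabulary; a real segmenter would use a large word list
VOCAB = {"i", "love", "modi", "farmer", "farmers", "death", "gst", "good",
         "robs", "india", "mann", "ki", "baat", "budget", "2018"}


def segment(hashtag, vocab=VOCAB):
    """Split a hashtag into the fewest vocabulary words, or return None."""
    text = hashtag.lstrip("#").lower()

    @lru_cache(maxsize=None)
    def best(start):
        # Best segmentation of text[start:] as a tuple of words, or None if impossible
        if start == len(text):
            return ()
        candidates = []
        for end in range(start + 1, len(text) + 1):
            word = text[start:end]
            if word in vocab:
                rest = best(end)
                if rest is not None:
                    candidates.append((word,) + rest)
        # Prefer the split with the fewest words
        return min(candidates, key=len) if candidates else None

    result = best(0)
    return list(result) if result is not None else None
```

Because tweets are lower-cased in step 2, the words come back in lower case: segment("#FarmerDeath") returns ["farmer", "death"], which can then be passed to a lexicon-based classifier for step 7.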

Limitation

It is difficult to distinguish between two different users based on hashtags alone, because two well-known people can share part of a name; for example, “#modi” can refer to either “Narendra Modi” or “Nirav Modi”.

Conclusion

The increasing use of hashtags on Twitter makes tweets more expressive and opens up many ways to understand them better. Here, we discussed how hashtags can be used to find a user’s online personality by identifying the topics related to that user and the subjective opinions about that user.

A video version of this article is available at [4].

References

  1. https://github.com/Jefferson-Henrique/GetOldTweets-python
  2. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.462.3827&rep=rep1&type=pdf
  3. http://www.aclweb.org/anthology/W15-2924
  4. https://www.youtube.com/watch?v=Bm8a06P7LOg

Originally published at medium.com on February 20, 2019.

Senior Data Scientist @ Fractal Analytics

Aakash Goel