I remember coming across a game called ‘Find Satoshi’ sometime around 2008, a Six Degrees of Separation puzzle. Satoshi here (not to be confused with Satoshi Nakamoto, the creator of Bitcoin) is someone who knows you are hunting for him but does not cooperate with the hunt in any way.
Six Degrees of Separation – The Theory, Basics
I am not sure about the 1993 drama/mystery film of the same name, but the idea of Six Degrees of Separation says that all people in the world are six or fewer steps away from each other. Small world, isn’t it?! Wikipedia explains it as a chain of “a friend of a friend” statements that can connect any two people in a maximum of six steps.
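To make the "friend of a friend" chain concrete, here is a minimal sketch that counts the fewest hops between two people with a breadth-first search. The friendship lists and the function name `degrees_of_separation` are made up for illustration; this is not the method used in the actual hunt.

```python
from collections import deque

def degrees_of_separation(friends, start, target):
    """Return the fewest 'friend of a friend' hops from start to target, or None."""
    if start == target:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, hops = queue.popleft()
        for friend in friends.get(person, ()):
            if friend == target:
                return hops + 1
            if friend not in seen:
                seen.add(friend)
                queue.append((friend, hops + 1))
    return None  # no chain of friends connects the two

# Made-up friendship lists (undirected: each link is listed in both directions).
friends = {
    "you": ["ana", "ben"],
    "ana": ["you", "chen"],
    "ben": ["you"],
    "chen": ["ana", "satoshi"],
    "satoshi": ["chen"],
    "stranger": [],
}

print(degrees_of_separation(friends, "you", "satoshi"))   # 3
print(degrees_of_separation(friends, "you", "stranger"))  # None
```

Breadth-first search visits people in order of distance, so the first time it reaches the target it has found the shortest chain.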

Mind Candy, the creators of the game Perplex City, intended to test the theory of Six Degrees of Separation with this puzzle: is it possible to track down someone (one-way) given only his photograph and first name? Well, is it? Indeed it is! Check out this blog for more information on the results.
Over time, things have changed. The networks of people have no doubt become denser, and even with a growing population (7.6 billion as of 2018, according to Worldometers) I am sure the mean number of hops required for a person to reach any other has dropped (thanks to the Internet and technology).
With my growing interest in Data Science and beyond, I gave Network Science another go to gain a better understanding of the Six Degrees of Separation theory. While this article focuses more on the concept and the data playground, I have included visualizations from the Network Science project that should give you a better (and more colorful) picture of the entire story.
Dataset: Twitter Everything, Twitter Hashtags
Ever since I read about Open Source Intelligence (article coming up soon), I have developed an eye for reading things in the form of data. Nope, not 1s and 0s, but information (pure information!). Twitter has always been the best source for quick analysis of real-time or context-related data and topics. The visuals you see are all based on the #GamerGate dataset dated November 2017. In particular, I chose to stick with this old topic, arguably a dying one, to check its distance from the current trending topics purely by means of hashtags.
The Network: Hashtags
Yes, a hashtag-to-hashtag network: a simple undirected network of all hashtags associated with #GamerGate. Extracting the hashtags is not much of a struggle, and I will post an article on it soon (including the code and a step-by-step approach). Looking at the nodes, it is no surprise that #GamerGate has the highest degree, 292, in a network of 847 edges. I then went ahead and extracted a dataset for #FlashbackFriday, a node with degree 2.
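As a rough illustration of how such a network can be put together, the sketch below links every pair of hashtags that appear in the same tweet and then counts each node's degree. The tweets are made up, the helper name `build_hashtag_network` is hypothetical, and lowercasing the tags is my assumption; the real extraction code may well differ.

```python
from collections import defaultdict
from itertools import combinations

def build_hashtag_network(tweets):
    """Map each hashtag to the set of hashtags it co-occurs with (undirected)."""
    neighbors = defaultdict(set)
    for tags in tweets:
        # Assumption: hashtags are case-insensitive, so normalise before linking.
        unique = sorted({tag.lower() for tag in tags})
        for a, b in combinations(unique, 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    return neighbors

# Made-up tweets, each reduced to its list of hashtags.
tweets = [
    ["#GamerGate", "#FridayFeeling"],
    ["#GamerGate", "#gaming", "#ethics"],
    ["#FridayFeeling", "#FlashbackFriday"],
    ["#gamergate", "#FlashbackFriday"],
]

network = build_hashtag_network(tweets)
degree = {tag: len(adjacent) for tag, adjacent in network.items()}
edge_count = sum(degree.values()) // 2  # every edge is counted from both ends
top = max(degree, key=degree.get)

print(top, degree[top], edge_count)  # #gamergate 4 6
```

Storing neighbours in sets keeps the network simple: a pair of hashtags that co-occurs in many tweets still counts as a single edge.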
Network Visualizations and Conclusions
Here are the properties of the network of #FridayFeeling and the other hashtags:

In this hashtag-rich network, #GamerGate comprised approximately 5% of the entire network. While the numbers are not very solid, the idea of Six Degrees of Separation came through well. It can be used to gauge the reach (or call it the spread) of a network built around any topic, and then to estimate how likely a person is to get involved in a discussion not by first choice but through an accidental encounter. The hypothesis is that users are likely to be introduced to topics that are not of primary interest to them, but that can certainly become one. Thus a topic like #GamerGate, even at the end of its presence, can still be revived, though how long it lives on over such a scale-free network is highly uncertain.

Additionally, in the #FlashbackFriday dataset, #FridayFeeling was the strongest node, with the highest degree centrality, and the #FridayFeeling – #GamerGate connection is clearly visible as follows:

Of the nodes common to #FlashbackFriday and #GamerGate, #FridayFeeling was found to have the highest degree. Apart from this, there are 7388 strongly connected nodes, which leaves users exposed to #GamerGate through the other hashtags in the network. The overall degree of #GamerGate is lower than that of #FridayFeeling, but its betweenness centrality is higher. Therefore, even though #GamerGate is not used as frequently, a user is quite likely to come across it if they use any of the other co-listed hashtags in the dataset.
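How can a node have a lower degree and still a higher betweenness centrality? A toy example helps. The sketch below uses a made-up graph (the hashtag names are placeholders, not from the dataset) and a plain-Python version of Brandes' algorithm for unweighted graphs; a library such as NetworkX would normally do this for you.

```python
from collections import deque

def betweenness(adj):
    """Unnormalised betweenness centrality, undirected unweighted graph (Brandes)."""
    score = dict.fromkeys(adj, 0.0)
    for s in adj:
        order = []                      # nodes in the order BFS reaches them
        preds = {v: [] for v in adj}    # predecessors on shortest paths from s
        sigma = dict.fromkeys(adj, 0)   # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):       # accumulate dependencies, farthest first
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return {v: c / 2 for v, c in score.items()}  # each pair was counted twice

# Made-up graph: two tight clusters joined only through "#bridge".
edges = [
    ("#popular", "#x"), ("#popular", "#y"), ("#popular", "#z"),
    ("#x", "#y"), ("#x", "#z"), ("#y", "#z"),
    ("#x", "#bridge"), ("#bridge", "#p"),
    ("#p", "#q"), ("#p", "#r"), ("#q", "#r"),
]
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

bc = betweenness(adj)
print(len(adj["#popular"]), bc["#popular"])  # 3 0.0
print(len(adj["#bridge"]), bc["#bridge"])    # 2 12.0
```

Here "#popular" has more neighbours, but no shortest path needs to pass through it, while every path between the two clusters runs through "#bridge". Degree measures how often a hashtag is paired with others; betweenness measures how often it sits on the route between them.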
Work In Progress
While the above research is limited to one topic, a wider analysis of varied topics is required to obtain mean values for the measures behind the hypothesis discussed above. Ongoing work includes calculating the average number of hops required for a user over the social web, given a topic of interest and a target topic. The idea can be used both to understand a user and to introduce (read: promote) them to a pool of discussion, for example to promote activities that counter harmful behavior over the social web. The likelihood of a user becoming part of any chosen target discussion can further be used to understand the mindset (read: nature) of users, through a detailed analysis of individual users and of like-minded ones forming clusters. The entire pattern can be used to identify the rate and direction of the flow of choices people make over the web.
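The average-hops measure can be sketched in a few lines. This is only a rough outline under my own assumptions: the topic network is made up, the function names are hypothetical, and pairs with no connecting path are simply left out of the mean.

```python
from collections import deque

def hops_from(adj, source):
    """Hop count from source to every topic reachable from it (breadth-first)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

def average_hops(adj):
    """Mean hop count over all connected pairs of distinct topics."""
    total = pairs = 0
    for source in adj:
        for target, d in hops_from(adj, source).items():
            if target != source:
                total += d
                pairs += 1
    return total / pairs if pairs else None

# Made-up chain of topics from a user's interest to a target topic.
adj = {
    "#interest": {"#topicA"},
    "#topicA": {"#interest", "#topicB"},
    "#topicB": {"#topicA", "#target"},
    "#target": {"#topicB"},
}

print(hops_from(adj, "#interest")["#target"])  # 3
print(round(average_hops(adj), 2))             # 1.67
```

The first number is the distance for one chosen pair (topic of interest to target topic); the second is the network-wide mean, which is the figure to compare against the "six" in Six Degrees of Separation.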