
The Structures of Informational Warfare: Bot Network Topologies

Written by Swetabh Changkakoti


"And how did that make you feel?"


This infamous question is, undoubtedly, the go-to for parodying psychologists in pop culture. It's low-effort, implicitly demeaning, and kind of crude when taken out of context, but it's still the hallmark of Rogerian psychotherapy. The reason? It seems to be genuinely effective. In 1964, Joseph Weizenbaum, Rogerian skeptic and one of the fathers of modern AI, set about developing ELIZA [1], a program that reflected the user's statements back as questions a psychologist at the time would ask. In doing so, he hoped to highlight the superficiality of human-machine interactions (and make fun of an all-too-common trope). The actual results of his experiment, however, were surprising, to say the least. Several users, including his own secretary, attributed human-like feelings to the program, some even going as far as being convinced of its intelligence! While ELIZA cemented itself in history for several reasons, a particularly crucial one was its demonstration that even if a bot might not pass an elaborate Turing test, it can still fool humans who interact with it only superficially.


What makes ELIZA ever more relevant today is the industrial cornucopia of surface-level humanity we find in social media. With more than 500 million Tweets every day, it's fair to expect that most interactions on Twitter are limited to short, bite-sized declarations or endorsements (via likes and retweets). Borges's The Library of Babel poses the central question of how we filter valuable information from torrents of questionable data. Today, the same dilemma stands: what, and who, is real in our gargantuan social networks?


The Power in Numbers


Acquiring a Twitter bot is easy. For smaller-scale operations, all you need is a developer account, which doesn't require a lot of effort. Larger networks with more malicious operations, on the other hand, often need to circumvent Twitter's terms and conditions for developers. Unfortunately, this isn't hard either. There are massive online marketplaces for buying and reselling deactivated Twitter accounts, which, naturally, makes the search for bots harder too. Overall, a 2017 study estimated that as many as 15% of all Twitter accounts are bots [2], and since then, the number has only gone up.


While the bots themselves aren't particularly intelligent, the fact that companies lost over 12 billion dollars in five years to phishing and Business Email Compromise scams [3] shows that there's a certain power in numbers. Here are a few facts illustrating just how impactful Twitter bot networks can be:

  • Two-thirds of tweeted links to popular websites are posted by automated accounts– not human beings [4]

  • Nearly half of the accounts tweeting about the coronavirus and about reopening the USA are bots [5]

  • During the 2016 US elections, one-third of pro-Trump tweets and one-fifth of pro-Hillary tweets came from bots.

  • In the wake of the Citizenship Amendment Act discussions in India, a single Twitter user found 160,000 bots created to reinforce certain political parties in the country.

The prevalence and influence of Twitter bots necessitate an appropriate response: identifying and removing malicious bots from the platform. However, as ELIZA showed so effectively, this isn't a simple task.


Blade Runner 2020: Identifying the Replicants


The most important point of discussion when it comes to identifying bots lies in their origin. By nature, bots are automated accounts whose programs are written by humans. This has two consequences: (1) low-effort bots are easy to spot for what they are, and (2) publicly available bot detection methods are prone to hard-coded workarounds by bot creators. Detection features designed around the first point are, in practice, often overcome by the second.


What do simpler Twitter bots generally look like? On an individual level, people tend to identify bots by analyzing specific features of their accounts. For example, bots are more likely to tweet or retweet at higher frequencies than humans; they often have stock, non-human, or stolen profile pictures; their tweets are prone to typos; and they tend to reuse certain phrases or spam certain hashtags, among other things.
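To make that concrete, here is a minimal sketch of what such a surface-level heuristic might look like. The `Account` fields and every threshold are illustrative assumptions made for this article, not values Twitter or any published detector actually uses.

```python
import re
from dataclasses import dataclass

@dataclass
class Account:
    """Toy snapshot of the public features discussed above (illustrative only)."""
    handle: str
    has_profile_picture: bool
    tweets_per_day: float
    recent_tweets: list  # list of recent tweet texts

def bot_suspicion_score(acc: Account) -> float:
    """Return a crude 0-1 suspicion score from surface-level account features.

    Every threshold here is an arbitrary assumption chosen for illustration;
    a real detector would learn such weights from labelled data.
    """
    score = 0.0
    # Handles like 'user84302194': a long run of digits in the username.
    if re.search(r"\d{6,}", acc.handle):
        score += 0.25
    # Default / missing profile picture.
    if not acc.has_profile_picture:
        score += 0.25
    # Posting far more often than a typical human account.
    if acc.tweets_per_day > 50:
        score += 0.25
    # Heavy reuse of identical phrasing across recent tweets.
    if acc.recent_tweets:
        unique_ratio = len(set(acc.recent_tweets)) / len(acc.recent_tweets)
        if unique_ratio < 0.5:
            score += 0.25
    return score

# Example: a suspicious-looking account trips all four heuristics.
suspect = Account("user84302194", False, 180.0, ["Great deal! bit.ly/x"] * 20)
print(bot_suspicion_score(suspect))  # -> 1.0
```

Rules this simple are trivial for a bot author to sidestep, which is exactly the problem discussed below.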


A typical bot account on Twitter. Notice the numerical jumble in the username and lack of profile picture.


While the example here is clearly identifiable as a bot by the features I've listed, there's a problem with using these features alone for identification. The people over at Twitter put it best:


As mentioned, an account with a strange handle is often someone who was automatically recommended that username because their real-name was taken at sign-up. An account with no photo or location may be someone who has personal feelings on online privacy or whose use of Twitter may expose them to risk, as an activist or dissident. Don’t like to add much of a bio or your location to your account? Some of us at Twitter don’t either. Even if all of these public details are put into a machine learning model to try to probabilistically predict if an account is a bot, when they rely on human analysis of public account information, that process contains biases — from the start. [11]


Programmers aware of these biases can work around them to create Twitter bots that are harder to detect, which harms both the platform and the real people using it by manipulating apparent public opinion.


Future-Proofing Identification: Motivating Network Topologies


As a species, we humans have definitely changed over the years. We've experienced changes in how we behave, how we speak, and even, in some ways, what we need to thrive. Over our millennia of evolution, though, there's one thing that hasn't fundamentally changed: how we interact. Social network theory has been a cornerstone of anthropological research since the 1970s [6], and as computing has scaled up, more researchers, like Alvin Wolfe, have come to use social network models in investigating large-scale relationships. Patterns of human interaction vary with scale and context, but they're still consistent for a given situation, and these interaction patterns aren't easy to code into bots.


What makes interaction patterns hard to emulate?


Psychology. Thanks, ELIZA.


Several human behavioral tendencies weave together to facilitate the creation of complex social structures. People don't form social relations at random; rather, the establishment of connections has local biases, like the fact that people with more relations are more likely to gain even more relations (this is called preferential attachment, or the Matthew effect). At first glance, it seems easy to circumvent this by programming a bot to prefer people with more followers or connections, but that doesn't change how the humans on the other end interact! Of course, the complexity of this topic is beyond the scope of one article, but here's a small example:


Assume that a bot follows a person on Twitter. How does the person react?

  1. The person ignores it: This is the more likely option, assuming the human can tell the account is a bot by examining its profile and its mutual connections.

  2. The person follows the bot back: This is not as unlikely as it seems at first glance. Ghosh et al. [7] highlighted the effect of 'social capitalists', i.e. people who tend to follow back any account that follows them, without further inspection. However, upon closer examination, this shows up clearly as an exceedingly dense, large network associated with the bot.

This motivates the use of a given account's network of connections, i.e. its network topology, as a feature in identifying whether or not it is a bot.
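To see why this kind of structure is hard to fake, here is a toy simulation of preferential attachment (my own sketch, not code from any cited work): each new account links to existing accounts with probability proportional to their current number of connections, producing the hub-heavy, long-tailed degree distributions characteristic of real social graphs.

```python
import random
from collections import Counter

def preferential_attachment_graph(n_nodes: int, links_per_node: int = 2, seed: int = 42):
    """Grow a toy undirected graph where new nodes prefer well-connected nodes.

    Returns a dict mapping node -> set of neighbours. This is a simplified
    Barabasi-Albert-style process, used purely to illustrate the Matthew effect.
    """
    rng = random.Random(seed)
    # Start with a small fully connected core so the first picks are well defined.
    core = links_per_node + 1
    neighbours = {i: set(range(core)) - {i} for i in range(core)}
    # 'targets' holds one entry per edge endpoint, so sampling it uniformly is
    # equivalent to sampling nodes proportionally to their current degree.
    targets = [i for i in range(core) for _ in range(core - 1)]

    for new in range(core, n_nodes):
        chosen = set()
        while len(chosen) < links_per_node:
            chosen.add(rng.choice(targets))
        neighbours[new] = set(chosen)
        for old in chosen:
            neighbours[old].add(new)
            targets.extend([new, old])
    return neighbours

graph = preferential_attachment_graph(2000)
degrees = Counter(len(nbrs) for nbrs in graph.values())
print(max(degrees))             # a few hubs reach degrees far above the minimum
print(degrees[2] + degrees[3])  # while most accounts stay near the minimum of 2
```

A bot can be scripted to follow popular accounts, but it cannot script the reciprocal, organically biased attachment that accumulates around genuine users, and that asymmetry is what topology-based detection exploits.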


Exploring Network Topologies


The most common method of exploring the network topology centered at a given user (called the ego of that exploration) is crawling, i.e. examining all of a user's connections and mutual connections to a given degree of separation. A crawl has two defining parameters– its direction (friends or followers) and the number of steps (denoted by K). The ratio of the number of friends to followers is often a useful indicator of the general behavioral pattern of a user, and hence, is of interest to researchers. A ratio of ~1 would describe the user as a "social capitalist", as defined earlier.
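To make the mechanics concrete, here is a minimal sketch of such a crawl. The `fetch_connections` callable is a hypothetical stand-in for whatever supplies a user's friends or followers (an API wrapper or a cached dataset); it is not a real Twitter endpoint, and the rate limits that make large K impractical in practice are ignored here.

```python
from collections import deque

def crawl(ego: str, fetch_connections, direction: str = "followers", k: int = 2):
    """Breadth-first crawl of an ego's network up to K steps of separation.

    `fetch_connections(user, direction)` is a hypothetical callable returning a
    list of account IDs; in practice it would wrap an API or a cached dataset.
    Returns the set of reached accounts and the set of discovered edges.
    """
    edges = set()
    visited = {ego}
    frontier = deque([(ego, 0)])
    while frontier:
        user, depth = frontier.popleft()
        if depth == k:
            continue
        for other in fetch_connections(user, direction):
            edges.add((user, other))
            if other not in visited:
                visited.add(other)
                frontier.append((other, depth + 1))
    return visited, edges

def friends_to_followers_ratio(user: str, fetch_connections) -> float:
    """A ratio close to 1 is one signature of a 'social capitalist' account."""
    friends = fetch_connections(user, "friends")
    followers = fetch_connections(user, "followers")
    return len(friends) / max(len(followers), 1)

# Toy usage with a hard-coded network instead of a live API:
toy = {"ego": ["a", "b"], "a": ["c"], "b": [], "c": []}
fake_fetch = lambda user, direction: toy.get(user, [])
print(crawl("ego", fake_fetch, k=2))  # 4 accounts reached, 3 edges discovered
```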


Most crawls are restricted to K = 1 or 2, which might seem small on paper but starts to look much larger when you consider the 'small world phenomenon'. As social psychologist Stanley Milgram's 1967 experiments suggested, any two people in the world are separated by six or fewer social connections (check out this Veritasium video for a great explanation). A crawl of K = 3 is impractical on Twitter: with a median path length of 4.12 between users [8, p. 594], it could theoretically cover much of the entire network. Cornelissen et al. [9] found that a K = 2 crawl on a mixed dataset of bots and humans (the Varol dataset [2]), when subjected to an appropriate unsupervised clustering algorithm (AGNES paired with the Spearman distance measure), identified bots with 70% accuracy. By showing that a user's social network topology alone can determine, with reasonable accuracy, whether or not that user is a bot, they validated our intuition that network topology is an effective measure.
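To give a flavour of the clustering step, here is a minimal sketch in the spirit of that approach, using SciPy's agglomerative hierarchical clustering (the AGNES family) with a Spearman-correlation-based distance. The synthetic feature matrix and the choice of average linkage are illustrative assumptions, not a reproduction of Cornelissen et al.'s exact pipeline.

```python
import numpy as np
from scipy.stats import spearmanr
from scipy.spatial.distance import squareform
from scipy.cluster.hierarchy import linkage, fcluster

# Stand-ins for per-account topology features from a K = 2 crawl (degree,
# clustering coefficient, friends/followers ratio, ...). The two groups get
# deliberately different feature profiles so this toy example separates cleanly.
rng = np.random.default_rng(0)
human_profile = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
bot_profile = human_profile[::-1]
humans = human_profile + rng.normal(scale=0.5, size=(40, 6))
bots = bot_profile + rng.normal(scale=0.5, size=(40, 6))
features = np.vstack([humans, bots])

# Spearman distance between two accounts: 1 minus the rank correlation of
# their feature vectors (one simple way to turn a correlation into a distance).
corr, _ = spearmanr(features.T)      # 80 x 80 account-by-account correlations
dist = 1.0 - corr
np.fill_diagonal(dist, 0.0)

# AGNES-style agglomerative clustering, then cut the tree into two clusters.
tree = linkage(squareform(dist, checks=False), method="average")
labels = fcluster(tree, t=2, criterion="maxclust")
print(labels[:40])   # cluster assignments for the 'human' rows
print(labels[40:])   # cluster assignments for the 'bot' rows
```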


Of course, 70% accuracy probably isn't good enough for large-scale or commercial deployment, but it does suggest that topology is an extremely useful feature which, when paired with content-based features, can lead to much better predictions.


Information Propagation and Network Structures


We've seen how bot networks can have distinct network topologies, and how difficult human networks are to emulate. Emulation, however, isn't always the goal. Conspicuously automated accounts exist all over Twitter, some benign, others not so much, and how these bots behave varies according to their purpose. A bot meant to reply with suicide hotline numbers to posts indicating suicidal intent, for example, is likely to behave simply, mainly replying to posts and not engaging with other bots. In contrast, when the primary goal is to spread an opinion or spark controversy, bots need to build significant followings of their own, and their behavior is likely to include far more interaction with other accounts.


In late 2019, India was rife with protests in the wake of the extremely polarising Citizenship Amendment Bill. As with any movement in today's age, both government officials and protestors took to Twitter to garner support. However, the fact that the former could get pretty much anything trending was cause for suspicion, and a closer look at this support revealed entire swarms of Twitter bots contributing to these trends. Platform manipulation, though against Twitter's policies, was tangible in the Indian political sphere, and a data analysis by Reddit user u/onosmosis revealed as much for both the ruling party and the opposition [10].


A figure depicting the network associated with Congress's agenda.


A figure depicting the structures of networks associated with BJP's agenda.


The pictures above seem vastly different– and they are– but the fundamental organizing principle behind either network is the same: a group of seeds, or prime sources of trends or information, feeding multiple amplifying groups and accounts that retweet and proliferate this information.
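As a rough illustration of that seed-and-amplifier principle, the sketch below takes a list of (retweeter, original author) pairs and assigns crude roles based on which side of the retweet relationship an account usually sits on. The thresholds and the toy data are invented for illustration and are not drawn from the linked analysis.

```python
from collections import Counter

def classify_roles(retweets, min_activity=5):
    """Split accounts into rough 'seed' and 'amplifier' roles.

    `retweets` is an iterable of (retweeter, original_author) pairs. Seeds are
    accounts that mostly get retweeted; amplifiers mostly do the retweeting.
    The thresholds are illustrative assumptions, not values from the analysis.
    """
    retweeted_count = Counter(author for _, author in retweets)
    retweeting_count = Counter(rt for rt, _ in retweets)
    roles = {}
    for user in set(retweeted_count) | set(retweeting_count):
        received = retweeted_count[user]
        made = retweeting_count[user]
        if received + made < min_activity:
            roles[user] = "low-activity"
        elif received > 3 * made:
            roles[user] = "seed"
        elif made > 3 * received:
            roles[user] = "amplifier"
        else:
            roles[user] = "mixed"
    return roles

# Toy data: one seed account being amplified by a small ring of bots.
toy = [("bot%d" % i, "seed_account") for i in range(6) for _ in range(6)]
print(classify_roles(toy))  # seed_account -> 'seed', bot0..bot5 -> 'amplifier'
```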


The two networks differ most clearly in their topological structure, and those differences are telling.

What's most interesting about this analysis is how the clear difference between the two network topologies correlates with both the intentions and the effects of their respective political origins. BJP's network was far more complex and productive than the INC's, which signified technical prowess as well as (1) an unwillingness to ever pull back the network, reflected in the absence of a single "off switch" given its non-linear structure, and (2) a different attitude towards abuse and platform manipulation, both of which it was far more successful (and intentional) at causing. This example clearly depicts the integral role of graph topology in the effectiveness of a bot network as well.


Conclusion


By nature of its bite-sized information structure, Twitter is thoroughly representative of a general social media dynamic. Trends, memes, and ideas from Twitter spread like wildfire, and in my opinion, so does its composition. Accordingly, the issue of prevalent bots isn't unique to Twitter, but the notions of a profile picture, bio, or retweet might be. The one feature from this example of Twitter bots that we can abstract out and apply to almost any social network is, by nature, that of interactions, and consequently, that of network topologies. Going into the future, it's therefore imperative to be conscious of network topology both as an identifying feature for bots and as a marker of their method of proliferation. In the end, it's eerily comforting to say that our interactions, our relationships to each other, for better or for worse, are what define us, even in a sphere that we usually consider devoid of intimacy.

References

  1. Weizenbaum, J. (1966). ELIZA---a computer program for the study of natural language communication between man and machine. Communications of the ACM, 9(1), 36–45. https://doi.org/10.1145/365153.365168

  2. Varol, O., Ferrara, E., Davis, C., Menczer, F., Flammini, A. (2017). Online human-bot interactions: Detection, estimation, and characterization. In the Eleventh international AAAI conference on web and social media.

  3. Federal Bureau of Investigation. (2018, July 12). Business Email Compromise: The 12 Billion Dollar Scam [Press release]. Retrieved from https://www.ic3.gov/media/2018/180712.aspx

  4. Wojcik, S., Messing, S., Smith, A., Rainie, L., Hitlin, P. (2018, April 9). Twitter Bots: An Analysis of the Links Automated Accounts Share. Pew Research Center. Retrieved from https://www.pewresearch.org/internet/2018/04/09/bots-in-the-twittersphere/

  5. Young, V. (2020, May 27). Nearly Half of the Twitter Accounts Discussing 'Reopening America' May Be Bots - News - Carnegie Mellon University. Retrieved from https://www.cmu.edu/news/stories/archives/2020/may/twitter-bot-campaign.html

  6. Wolfe, A. W. (1978). The rise of network thinking in anthropology. Social Networks, 1(1), 53–64. https://doi.org/10.1016/0378-8733(78)90012-6

  7. Ghosh, S., Viswanath, B., Kooti, F., Sharma, N., Korlam, G., Benevenuto, F., Ganguly, N., Gummadi, K. 2012. Understanding and combating link farming in the twitter social network. In Proceedings of the 21st international conference on World Wide Web - WWW ’12. Lyon, France, 61–70. https://doi.org/10.1145/2187836.2187846

  8. Kwak H., Lee C., Park H., Moon S. 2010. What is Twitter, a Social Network or a News Media?. In Proceedings of the 19th international conference on World Wide Web (IW3C2). Raleigh, North Carolina, USA., 591–600. https://doi.org/10.1145/1772690.1772751

  9. Cornelissen, L. A., Barnett, R. J., Schoonwinkel, P., Eichstadt, B. D., & Magodla, H. B. (2018). A network topology approach to bot classification. In Proceedings of the Annual Conference of the South African Institute of Computer Scientists and Information Technologists (SAICSIT '18). https://doi.org/10.1145/3278681.3278692

  10. u/onosmosis. (2019). Uncovering the Nexus of Congress & BJP IT Cell. Retrieved from https://www.notion.so/UrbanNazi-com-Uncovering-the-Nexus-of-Congress-BJP-IT-Cell-12b271c6b8a6432f8a767cd6ff2ae9a6

  11. Twitter. (2020). Bot or not? The facts about platform manipulation on Twitter [Blog post]. Twitter Blog.
