1 Introduction

Anxiety disorders constitute one of the leading mental health concerns in the United States. The disorder has an estimated prevalence of 1.6% to 5.0% in the general population [1]. Sufferers of this condition are known to experience non-specific persistent fear and worry, and become overly concerned with everyday matters [2]. This condition is known to incur high costs, including reduced productivity, diminished quality of life, and even heightened risk of suicide [3].

Anxiety disorders are closely associated with an individual’s interactions, manifested in the way an individual expresses themselves and interacts with others in their social environment. In fact, they are associated with reduced social engagement and reduced perceived quality of social relations [4]. Conversely, support from friends, family, and loved ones has been shown to buffer the effects of anxiety [5, 6]: the presence of social ties may enhance coping strategies and increase an individual’s sense of control over the situation. Researchers have employed several theories in attempts to explain this role of social integration in maintaining mental health and reducing the risk of conditions like anxiety disorders, including social causation, symbolic interactionist, social exchange, self-esteem, meta-cognitive, and stress-vulnerability theories [7].

However, little has been explored empirically about the association of social network structure and an individual’s interactions with aspects of mental health functioning, such as anxiety. Empirical work examining the relationship of social interactions with anxiety has suffered from significant practical challenges [8]. A threat to validity in existing cross-sectional studies is the potential bias in the retrospective recall of social ties among anxiety-affected individuals. Moreover, while some studies have indeed found positive associations between the structural characteristics of social relations and the availability of instrumental and emotional support [9], how interpersonal dynamics are associated with anxiety experiences remains less understood. Additionally, prior work has largely recognized the value of strong social ties in mental wellbeing [8]; whether similar findings hold true for weak ties is less well known.

In recent years, individuals have begun to appropriate social media platforms like Twitter and online communities like Reddit to self-disclose about their mental illnesses [10], seek support [11], and derive therapeutic benefits [12]. Social media language in particular has also been established to be valuable in understanding and predicting different forms of mental illness like depression [13] and suicide [14].

Nevertheless, research so far has provided limited insights into the role that the social interactions and social networks on these online platforms play in characterizing an individual’s mental health experience, such as anxiety. This is especially a notable gap, given that positive benefits of these platforms have been examined with considerable interest in the past. For instance, social affordances of these platforms have been argued to augment social relationships and support mental health [14,15,16,17]. In a way, these social functions have been touted to augment the benefits of face-to-face interactions in mental health due to the reach, accessibility, ubiquity, and pervasiveness of the platforms [11, 18].

Considering this gap in the literature and given that continued negative social exchanges, presence of unhealthy and unsupportive social relations, and negativity in ties can exacerbate anxiety disorders [19], we seek to answer the following research question in this paper: Can online social network structure and interactions, and social behaviors signal an individual’s risk of anxiety?

To this end, we chose the social media platform Twitter. Our study focuses on a sample of 200 Twitter users who were expert-validated to have self-disclosed about suffering from an anxiety disorder, together with the more than 200 thousand posts they shared on the platform. On their data, we model, using state-of-the-art network science measures [20], a variety of attributes of their online social networks, interactions, and social behaviors using natural language and network analysis approaches. We find that several of these attributes, when incorporated in a supervised learning classifier, successfully help distinguish them from a control group. For instance, anxiety users demonstrate a strong tendency to engage with people with whom they did not have any prior bidirectional interaction; they also connect to a diverse set of smaller, mutually disconnected sub-networks. Our findings provide novel empirical insights into the manner in which online social networks and interactions can be indicative of an individual’s mental health status, specifically anxiety. We discuss the implications of our work for instrumenting online social platforms in ways that yield positive affordances and outcomes for individuals vulnerable to mental illnesses.

2 Data and Methods

2.1 Identifying Anxiety Users

We started by collecting a sample of Twitter users who had self-disclosed about their diagnosis of anxiety on Twitter through a public post. To identify anxiety diagnoses, inspired by prior work [10], we utilized a set of carefully curated search queries: “diagnosed [me]* with anxiety” and “i got/was/am/have been diagnosed with anxiety”.
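As an illustration only, the two query templates can be rendered as regular expressions for screening tweet text. This is a minimal sketch under assumptions: the pattern list and the function name `is_candidate_disclosure` are hypothetical, and "[me]*" is read here as an optional "me". The matching rules actually used during crawling are not specified in the text, and a match is only a candidate that still requires the manual annotation described in the next paragraph.

```python
import re

# Hypothetical regex rendering of the two query templates; the exact
# matching rules used for the crawl are not stated in the paper.
DISCLOSURE_PATTERNS = [
    # "diagnosed [me]* with anxiety", reading "[me]*" as an optional "me"
    re.compile(r"\bdiagnosed (?:me )?with anxiety\b", re.IGNORECASE),
    # "i got/was/am/have been diagnosed with anxiety"
    re.compile(
        r"\bi (?:got|was|am|have been) diagnosed with anxiety\b",
        re.IGNORECASE,
    ),
]


def is_candidate_disclosure(text: str) -> bool:
    """Return True if the tweet text matches any disclosure pattern."""
    return any(pattern.search(text) for pattern in DISCLOSURE_PATTERNS)
```

A pattern match says nothing about whether the disclosure is genuine (for example, jokes or quotes also match), which is why human inspection follows.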

We used a web-based Twitter crawler called GetOldTweetsAPI to obtain tweets with self-disclosures of anxiety between 2013 and 2017. Our initial search based on the queries listed above gave us 3856 tweets shared by 966 users. Thereafter, for each user, two human annotators familiar with social media content around mental health manually inspected the collected tweets to identify if they indicated a genuine self-disclosure of anxiety. Then we removed users who had fewer than 20 followers and followings, as these networks were too small to compute meaningful social network metrics. We also removed accounts that had too few posts (fewer than 10), as well as accounts with more than 10,000 followers or followings. Our final dataset at the conclusion of this data-gathering step contained 200 users. For each of these users, we collected their entire timeline data again using the Twitter API. Table 1 presents some descriptive statistics of this dataset.

Table 1. Descriptive statistics of acquired Twitter data.

2.2 Collecting Social Network and Interaction Data of Anxiety Users

Social Network Data

For each of the above 200 users we proceeded to collect their (Twitter) social network data. We define network data to consist of the list of other users who are following the anxiety users and the list of those who our anxiety users are following. Twitter allows unidirectional links i.e., if A follows B, B does not necessarily follow A. For our research question, we focus on bidirectional ties, which are better indicators of social connections than unidirectional ties. Therefore, for each anxiety user, we consider the intersection of the accounts that the user is following and the accounts that are following the user as their bidirectional ties. We refer to this set of users as “friends” of the user. We obtained 41,557 users who were the “friends” of our 200 anxiety users. For each of these friends, we further collected their friends in turn to get the two-hop social network for each anxiety user. This resulted in a total of 15,297,258 users in the two-hop network of the anxiety users.
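The definition of “friends” as the intersection of followers and followings, and the expansion to a two-hop network, can be sketched with plain sets. The function names and the `friend_map` structure (a dictionary from a user to that user's set of friends) are assumptions made for illustration, not the pipeline used in the study.

```python
def friends_of(followers, followees):
    """Bidirectional ties: accounts that both follow and are followed."""
    return set(followers) & set(followees)


def two_hop_neighborhood(user, friend_map):
    """Friends of `user` plus friends of those friends, excluding `user`.

    `friend_map` is a hypothetical dict: user id -> set of friend ids.
    Users missing from the map are treated as having no known friends.
    """
    one_hop = friend_map.get(user, set())
    two_hop = set(one_hop)
    for friend in one_hop:
        two_hop |= friend_map.get(friend, set())
    two_hop.discard(user)
    return two_hop
```

For example, a user followed by {b, c, d} who follows {c, d, e} has friends {c, d}; only those two accounts would be expanded in the second hop.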

Social Interaction Data

Next, we collected the timeline tweets for the 200 anxiety users and their network using the GetOldTweets API as before. For each tweet, we specifically collected the timestamp of the tweet, the number of likes (or favorites), whether it is a retweet or quote and, if so, the original tweeter, and whether it is a reply and, if so, to whom. Any tweet that is a reply or has @<username> in it is considered an interaction tweet. Then we used this stylistic convention of tweets to compile a list of search queries by prepending an “@” symbol to the username of each of our anxiety users. This operation provided us, via the GetOldTweets API, with all tweets that were incoming interactions to an anxiety user. Thus, for each such user, alongside the Twitter users who interacted with them and the number of such interaction tweets, we also compiled the textual content of these interaction tweets.
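The rule for flagging an interaction tweet (a reply, or any tweet containing an @-mention) might be sketched as follows. The tweet dictionary layout with `text` and `in_reply_to` keys is a hypothetical representation, and the mention pattern is a simplified assumption about username syntax (word characters, at most 15 of them), not the crawler's actual logic.

```python
import re

# Simplified assumption: a mention is "@" followed by 1-15 word characters,
# not preceded by a word character (which excludes e-mail addresses).
MENTION = re.compile(r"(?<!\w)@(\w{1,15})\b")


def mentioned_users(text):
    """Return lower-cased usernames mentioned in the tweet text."""
    return [name.lower() for name in MENTION.findall(text)]


def is_interaction(tweet):
    """A tweet counts as an interaction if it is a reply or has a mention.

    `tweet` is a hypothetical dict with keys "text" and "in_reply_to".
    """
    return tweet.get("in_reply_to") is not None or bool(
        mentioned_users(tweet["text"])
    )
```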

2.3 Gathering Control Data

Finally, we describe an approach to gathering “control data”, that is, a set of Twitter users without any self-disclosure of anxiety. For this purpose, we collected a matching set of 200 users who did not use any of the anxiety diagnosis and self-disclosure related phrases defined above. For this control dataset, using the methods described above, we collected data on their followers, followings, their two-hop network neighborhood, the tweets shared on their timeline, as well as their social interaction data. In Fig. 1 we present summary distributions of the social interactions and networks over the anxiety and control users.

Fig. 1. Clockwise from top left: Distribution of tweets in the anxiety and control users; Distribution of friends (bidirectional ties) in the anxiety and control users; Distribution of outgoing interactions for anxiety and control users; Distribution of incoming interactions for anxiety and control users.

2.4 Statistical Approach

Building an Anxiety Classifier

Recall, our research question revolves around identifying specific characteristics of an individual’s social network structure and interaction that would indicate whether or not they have an underlying anxiety disorder. To do so, we adopt a supervised learning based binary classification approach, utilizing the Twitter data of the anxiety users and the control users described above. We first present a set of attributes that can be used to characterize the social engagement, social network, social interactions, and social behavior based differences of these two classes. Then we use these attributes as features to build the classifier.

Social Engagement Attributes

We define four measures of engagement inspired by [13]. The first is volume, the normalized number of tweets per day of an (anxiety/control) user (given as the ratio of the total number of tweets of a user to the total number of days of activity of the same user). The second is the proportion of reply posts (@-replies) from a user per day, which shows their level of social interaction with other Twitter users. The third is the fraction of retweets, indicative of information propagation behaviors, and the fourth is the fraction of quotes from a user per day, which signals how they participate in information sharing with their followers. In addition, we use three more engagement measures that differentiate engagement with friends (strong ties) from engagement with non-friends (weak ties): first, the fraction of replies to friends; second, the fraction of retweets whose original tweeters are friends of the user; and third, the fraction of quotes whose original tweeters are friends of the user.
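A rough sketch of how these seven engagement measures could be computed from a user's timeline is given below. The tweet representation (a `kind` field taking the values "tweet", "reply", "retweet", or "quote", and a `target` field naming the user replied to or the original tweeter) is a hypothetical layout. As a simplifying assumption, the fractions are computed over the whole timeline rather than per day.

```python
from collections import Counter


def engagement_features(tweets, days_active, friends):
    """Compute engagement measures for one user.

    tweets: list of dicts with hypothetical keys "kind" and "target".
    days_active: number of days the user was active (must be > 0).
    friends: set of user ids with bidirectional ties to the user.
    """
    n = len(tweets)
    if n == 0 or days_active <= 0:
        raise ValueError("need at least one tweet and one active day")
    kinds = Counter(t["kind"] for t in tweets)

    def friend_share(kind):
        # Share of posts of this kind whose target is a friend.
        of_kind = [t for t in tweets if t["kind"] == kind]
        if not of_kind:
            return 0.0
        return sum(t["target"] in friends for t in of_kind) / len(of_kind)

    return {
        "volume": n / days_active,
        "reply_fraction": kinds["reply"] / n,
        "retweet_fraction": kinds["retweet"] / n,
        "quote_fraction": kinds["quote"] / n,
        "replies_to_friends": friend_share("reply"),
        "retweets_from_friends": friend_share("retweet"),
        "quotes_from_friends": friend_share("quote"),
    }
```

Returning 0.0 when a user has no posts of a given kind is a design choice of this sketch; treating such values as missing would be an equally defensible alternative.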

Attributes of Egocentric Social Graph Structure

Next we consider attributes of the egocentric social graph of an (anxiety/control) user. We define a user’s egocentric graph to be an undirected network of the set of nodes in their two-hop neighborhood (neighbors of the neighbors of the user in our dataset). Drawing from the social network literature [13], our egocentric attributes can be categorized into three types:

  • Node Properties: We define two attributes that characterize the nature of a user’s egocentric social network. The first is the number of followers or inlinks of a user, the second is the count of their followees or outlinks. Additionally, we define a third feature, which is the count of their friends, which we define as the intersection of the followers and followees of the users (bidirectional links).

  • Dyadic Properties: Here, we define two attributes. Our first attribute is a measure called reciprocity, which is measured as how many times a user u responds to another user v who had sent them @-reply messages. The second attribute is the prestige ratio, and is defined as the ratio of the number of @-replies that are targeted to u, to the number of @-replies targeted to a user v, where v is a user with whom u has bi-directional @-replies.

  • Network Properties: In this category, we define seven attributes. The first is betweenness centrality, which quantifies the control of a user over the communication between other users in a social network. The second attribute, graph density, is the ratio of the count of edges to the count of nodes in u’s egocentric (bidirectional) social network. The third attribute is the clustering coefficient of u’s ego network, which is a notion of local density. The fourth attribute, size of two-hop neighborhood, is defined as the count of all of u’s neighbors, plus all of the neighbors of u’s neighbors. We define the fifth attribute, embeddedness of u with respect to their neighborhood, as the mean of the ratio between the set of common neighbors between u and any neighbor v, and the set of all neighbors of u and v. The sixth attribute in this category is the number of ego components in u’s ego network, defined as the count of the number of connected components that remain when the focal node u and its incident edges are removed [21]. The final attribute of this category is the normalized average size of the ego components of a user u in their ego network.
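Two of the network properties above, ego components and embeddedness, can be illustrated on a small undirected graph stored as an adjacency dictionary. This is a sketch under assumptions rather than the implementation used in the study: the function names are hypothetical, every node is assumed to appear as a key with a symmetric neighbor set, and the "set of all neighbors of u and v" is taken to exclude u and v themselves.

```python
def ego_components(adj, u):
    """Sizes of connected components left after removing node u.

    adj: dict mapping each node to the set of its neighbors (symmetric).
    Returns component sizes sorted in descending order.
    """
    seen = set()
    sizes = []
    for start in adj:
        if start == u or start in seen:
            continue
        # Depth-first traversal that never passes through u.
        stack = [start]
        seen.add(start)
        size = 0
        while stack:
            node = stack.pop()
            size += 1
            for nb in adj[node]:
                if nb != u and nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        sizes.append(size)
    return sorted(sizes, reverse=True)


def embeddedness(adj, u):
    """Mean over neighbors v of |common neighbors| / |all neighbors|.

    Assumption: the denominator excludes u and v themselves.
    """
    neighbors = adj[u]
    if not neighbors:
        return 0.0
    ratios = []
    for v in neighbors:
        common = adj[u] & adj[v]
        union = (adj[u] | adj[v]) - {u, v}
        ratios.append(len(common) / len(union) if union else 0.0)
    return sum(ratios) / len(ratios)
```

In a toy graph where u is tied to a, b, and c, and only a and b are tied to each other, removing u leaves two components of sizes 2 and 1, and the embeddedness of u is 1/3.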

Egocentric Interaction Graph Attributes

Moving from social engagement and social network structure to social interactions, we further define a number of egocentric interaction graph attributes based on the interactions of an anxiety or control user with others on Twitter (through @-replies). For the interaction graph, an edge between u and v implies that there has been at least one @-reply exchange in each direction, from u to v and from v to u. The interaction network is a subgraph of the egocentric social network graph, as we consider interactions of a user with only friends of the user and the interactions of these friends with their own friends in turn, creating a two-hop interaction network of the anxiety/control users. We define the following specific attributes:

  • Unsigned Network Properties: Here, we define seven attributes, similar to the network properties defined for the social network structure above.

  • Signed Network Properties: While the general structure of a social network can be useful for a problem domain like ours, individuals often share rich relationships with their peers, which cannot necessarily be captured via simple pairwise unsigned links. To take into account both positive and negative pairwise interactions between individuals, we define attributes drawing upon the literature in modeling signed network properties [22]. For developing such a signed network, we assign polarity (positive or negative) to each edge of the interaction network. Specifically, we perform a sentiment analysis of all the interactions between a user u and a friend v, using an ensemble approach that combines the outcomes given by tools such as VADER [23], Stanford CoreNLP [24], and the NLTK library. Depending on whether there are more positive or negative interactions, we define the net edge as a positive or negative tie. From this signed network of every user, we define four attributes. The first measure is the fraction of negative ties. The second measure is the ratio of the number of balanced triads to the total number of triads in the network. A balanced triad is a set of three connected users in which there is an odd count of positive edges (i.e., one or three positive edges), as shown in Fig. 2. The third attribute is the average degree of the nodes having negative ties with the user u. In order to understand how the users feel about their family and close relations, we defined the final attribute as the fraction of family category tweets that show negative sentiment. To detect the tweets of the family category, we checked for the presence of ‘family’ category terms in the psycholinguistic lexicon Linguistic Inquiry and Word Count (LIWC) [25].

    Fig. 2. Structural representation of balanced (a, c) and unbalanced (b, d) signed triads, adapted from [22].
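The signed-network measures can be illustrated with a short sketch. It assumes that each interaction has already been labeled +1 or -1 by some sentiment tool (the ensemble itself is not reproduced here), that ties in the majority vote default to positive, and that a "triad" means a closed triangle in which all three edges exist. The function names and the edge representation (a `frozenset` pair mapped to a sign) are hypothetical.

```python
from itertools import combinations


def edge_sign(sentiments):
    """Net sign of an edge from per-interaction labels (+1 or -1).

    Assumption: an exact tie is treated as positive.
    """
    return 1 if sum(sentiments) >= 0 else -1


def negative_tie_fraction(signs):
    """Fraction of edges whose net sign is negative."""
    if not signs:
        return 0.0
    return sum(s == -1 for s in signs.values()) / len(signs)


def balanced_triad_fraction(signs):
    """Fraction of closed triads with an odd number of positive edges.

    signs: dict mapping frozenset({a, b}) -> +1 or -1.
    """
    nodes = sorted({n for edge in signs for n in edge})
    total = balanced = 0
    for a, b, c in combinations(nodes, 3):
        edges = [frozenset((a, b)), frozenset((b, c)), frozenset((a, c))]
        if all(e in signs for e in edges):
            total += 1
            positives = sum(signs[e] == 1 for e in edges)
            if positives % 2 == 1:  # one or three positive edges
                balanced += 1
    return balanced / total if total else 0.0
```

Enumerating all node triples is cubic in the number of nodes, which is acceptable for a toy example; a real two-hop network would need a neighbor-based triangle listing instead.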

Social Behavioral Attributes

Finally, we consider four attributes of the emotional state of users in our dataset: positive affect (PA), negative affect (NA), anger, and sadness, drawing from prior work on psycholinguistics, mental health, and social media. These works have revealed that an individual’s behaviors towards their friends and peers are often reflective of their underlying psychologies [25]. Daily measurements of these attributes per user were computed using LIWC. We also sought to measure social behaviors through a user’s expression of linguistic style in their posts. For this, we again use LIWC, focusing on 22 specific linguistic style categories, drawing on prior research [25]: articles, auxiliary verbs, conjunctions, adverbs, personal pronouns, prepositions, functional words, assent, negation, certainty and quantifiers.

Classification Framework

Using the above-defined features, we use supervised learning to construct classifiers trained to predict anxiety in our two user classes. We compare several different parametric and non-parametric binary classifiers (Gaussian Process classifier, Decision Tree classifier, Random Forest classifier, Multi-Layer Perceptron classifier, Adaboost classifier, Gaussian Naive Bayes classifier, Logistic Regression classifier and Support Vector Machine or SVM classifier) to empirically determine the most suitable classification technique. In order to understand the importance of various feature types, for each classifier, we trained one model using each category of attributes defined above. Additionally, we built a fifth classification model using a dimensionality-reduced set of all features. The positive and negative examples for these classifiers came from the set of the 200 anxiety users and 200 control users respectively. We used k-fold (k = 5) cross validation alongside a 20% held out dataset to tune and then test our models.
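The evaluation protocol (a 20% held-out test set plus 5-fold cross validation on the remainder) can be sketched as an index-splitting routine. This is an illustrative assumption about how the two pieces fit together; the paper does not state the order of operations, whether the splits were stratified by class, or which library was used. The function name and the fixed seed are hypothetical.

```python
import random


def holdout_and_folds(n_samples, holdout_frac=0.2, k=5, seed=0):
    """Split sample indices into a held-out test set and k CV splits.

    Returns (test_indices, splits), where splits is a list of k
    (train_indices, validation_indices) pairs drawn from the
    non-held-out portion. The split is not stratified by class.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = int(round(n_samples * holdout_frac))
    test, dev = indices[:n_test], indices[n_test:]
    # Deal the development indices into k folds round-robin.
    folds = [dev[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        validation = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        splits.append((train, validation))
    return test, splits
```

With 400 users this yields 80 held-out users and five splits of 256 training and 64 validation users each. Hyperparameters would be tuned on the splits and the chosen model scored once on the held-out set.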

3 Results

We begin by presenting the results of our best performing classifier, an SVM. The results are presented in Table 2. We find that the best performing model in the test set(s) yields an average accuracy of 79% (σ = 2.3 across the five cross validation folds) and a high precision of 0.86 for the anxiety class. Note that a baseline chance model would yield an accuracy of 50%, since the positive and negative classes are balanced. The good performance of this classifier is also evident from the receiver operating characteristic (ROC) curve shown in Fig. 3.

Table 2. Performance metrics of anxiety classification

Fig. 3. ROC curve for the SVM based anxiety classification model.

We see that the model using the social behavioral attributes performs the best among all of the models. Results in prior literature suggest that the use of linguistic styles such as pronouns and articles provides information about how individuals respond to psychological triggers [26]. Finally, the better performance of interaction network features compared to ego-network features shows that the friends with whom users interact bear more predictive cues to their psychological well-being (anxiety status) than just the structure of the social network in which they are embedded. We can conclude that social media network structure and interactions, together with social behavioral attributes, provide useful signals that can be utilized to classify whether an individual is suffering from anxiety.

Based on the performance of the different classifiers above, we present some analyses of differences in the two classes of users in Table 3.

Table 3. Average values and differences of significant attributes for the anxiety and control groups (*** p < 0.001 following Bonferroni correction) for all the models.

First, corresponding to the attributes relating to social engagement, we see that although there is no significant difference in the total volume (tweets, retweets, quotes and replies) of the posts in these two categories, the anxiety class shows a markedly higher volume of replies and a lower volume of retweets. This suggests that people suffering from anxiety find it more comfortable interacting with other people instead of generating content. Further, the ratio of replies to non-friends compared to friends for the anxiety class is significantly higher than for the control group. Non-friends are usually people whom the users follow, like celebrities, but who typically do not follow the users back as they might not know the users, whereas friends are people who follow each other and know each other in some context. This suggests that users with anxiety may find it easier to interact where they have no preconceived image of themselves to maintain. People suffering from anxiety further show significantly fewer followers and followees but no significant difference in the number of friends. This indicates a reduced desire to consume content beyond the known social sphere. This, combined with fewer interactions in the form of replies with friends, indicates that users prefer to consume the content produced by their friends in a passive manner, probably to know what they are doing, rather than directly communicating with them.

In the egocentric social graph attributes category, for dyadic properties, anxiety users also show reduced reciprocity to others’ communications, indicating decreased desire for social interaction with friends. Since interaction with non-friends is not bidirectional, these do not contribute to building up the reciprocity of a user in our analysis.

For the egocentric interaction graph attributes, we observe that the differences in network properties between the two user classes become pronounced. Anxiety users show a higher number of ego components, higher betweenness centrality, a lower clustering coefficient, a larger two-hop neighborhood, and lower embeddedness. Lower embeddedness indicates that the anxiety users have fewer mutual friends with their neighbors in their interaction graph. Next, anxiety users show significantly more negative ties; however, the number of balanced triads shows no significant difference between the two groups. This implies that although the anxiety users have stable networks (balanced triads), they have high negativity, expressed in the shared tweets, in their network. The anxiety users also show a higher average degree of negative interaction ties, suggesting that they have more negative ties of higher status in their social circles. Our next feature, the proportion of tweets expressing negative sentiment around family topics, is also significantly higher for the anxiety users compared to the control group.

In the category of social behavioral attributes, we see significantly higher use of first person singular pronouns for the anxiety users. Increased self-attentional focus is known to indicate weak psychological functioning [13, 25]. We also observe higher use of social and health words, which implies higher social and personal concern among the anxiety users. Additionally, for the anxiety users, we see greater emotional expressivity: higher positive affect, negative affect, swear and sadness. Studies in the psychology literature [27] have observed that individuals suffering from anxiety disorders experience mood swings, and these results confirm their expression in social media.

4 Discussion

4.1 Summary of Principal Results

This research offers many interesting and valuable findings that augment theoretical and empirical insights in prior work. First, we found that although weak ties are a hallmark of social media platforms like Twitter [28], individuals were indeed more affected by their interactions with strong ties (or friends); when these interactions were negative, it resulted in greater risk of anxiety. Similarly, negative interactions centered around topics of close relationships like family were a significant contributor to anxiety. At the same time, the fact that anxiety users in our dataset were part of smaller, more disconnected diverse sub-networks shows, on the one hand, that online social integration, like its offline counterpart, is limited in this population. On the other hand, it raises the question of to what extent vulnerable individuals are able to mobilize the social connectivity functionalities of social media to find access to larger communities of peers and support groups.

4.2 Comparison with Prior Work

As noted earlier, considerable work has been conducted to understand how various social environmental factors impact anxiety disorder risk or how anxiety disorders manifest in an individual’s social interactions [4, 8]. However, whether these findings hold true for online social interactions as well has been less studied. Given the pervasive adoption of social media and their integration in daily lives, an understanding of the relationship of online interactions with anxiety is vital, and our work has attempted to fill this gap.

Social Predictors of Anxiety

A rich body of literature exists in psychiatry and psychology to understand the causes and characteristics of anxiety disorders [6, 32]. Importantly, as noted above, social environmental factors can act as stressors to anxiety disorders. In fact, the link between social isolation and reduced psychological well-being is well established in sociology, dating back to Durkheim [30]. Smaller social networks, fewer close relationships, and lower perceived adequacy of social support have been linked to symptoms of anxiety [5, 8, 31].

Despite this attention, empirical work examining the impact of social interactions on anxiety experiences has faced significant challenges [8, 33, 34]. Our results showed empirical evidence in support of the existing theories, while mitigating some of the challenges of empirical work that relies heavily on retrospective recall and self-reports of social networks. Further, we evaluated whether online social network attributes like presence or absence of close-knit groups, high or low clustering in a person’s ego-centric social network, or positive or negative social ties can be indicative of anxiety, which in themselves are novel findings in this research area.

Mental Health and Social Media

It has been recognized that people share content about their emotional health on social media [35, 36], and there is a rich and growing body of work in social media research to develop quantitative and computational methods to understand various forms of mental and behavioral health and wellness states [37], like depression [13, 38,39,40,41], suicidal ideation [14, 42, 43], post traumatic stress disorder [44], schizophrenia [45, 46], social anxiety [47,48,49,50], and substance use [51, 52]; see Guntuku et al. [53] for a review.

However, we note that gaps exist in our understanding of the relationship of online social interactions and online social networks with the mental health of social media users. While existing works have largely established the value of social media language in helping characterize and even predict the risk of various mental illnesses, the important role of social connectedness on these platforms in one’s mental health experiences is relatively unexplored. As noted in a systematic review by Park et al. [55], and others by Seabrook et al. [56] and by Dobrean and Pasarelu [57], social media use has been reported to be associated with lower levels of loneliness and greater belonging, social capital, and actual and perceived access to social support, and is generally associated with higher levels of life satisfaction and self-esteem [58,59,60]. As a whole, the positive social components of social media (and broadly, Internet) use have been argued to serve a protective role against depression and anxiety [15, 16], although the converse, namely how negative online social interactions and specific network structures impact the risk of mental illnesses, has remained an open question for investigation. Further, we note that anxiety disorders, as a mental illness [54], have received relatively less attention in the social media literature, with the exception of Tian et al. [61], who provided a qualitative thematic analysis of a random sample of 1000 anxiety-related postings on Sina Weibo. This work found anxiety disclosures to be the biggest theme, but did not explore, in a computational, scalable, or other capacity, the role of social connectedness in this illness.

Inspired by literature in psychology that has explored the impact of negative social ties and negative interactions on the mental health of individuals (ref. above), we developed a variety of computational techniques to first characterize the social ties of anxiety sufferers as positive or negative and then assess their association with individual anxiety. Thereby, we have complemented the analyses in the above social media work, where the focus has largely been on identifying linguistic markers, by identifying what social attributes and social behaviors are associated with mental illnesses, particularly anxiety disorders.

4.3 Limitations and Future Work

There are some limitations to our work. We acknowledge an inherent population bias in our dataset. Not all demographic groups use Twitter and not all Twitter users self-disclose or share information indicative of their mental health (anxiety) status. Our observations are limited to individuals who choose to discuss their anxiety on Twitter, maintain a social network with others on the platform, and engage in substantial interactions with them. Finally, in this work we did not obtain clinical validation of either the anxiety classification results or the empirical observations linking social interactions and anxiety. Future work can explore these opportunities through interdisciplinary collaborations as well as by gathering self-reported data. Last but certainly not least, although we worked with public Twitter data in this paper, future work, in the light of recent research [29], should also include discussions of people’s privacy perceptions and ethical concerns around deriving such (sensitive) mental health assessments from social media data.

Nevertheless, our work offers several practical implications for instrumenting social media platforms to support the experiences of individuals with or at risk of anxiety disorders, which constitute promising directions for future research. Recall that we found that, although individuals suffering from anxiety disorders have more negative ties, the fraction of balanced triads in their network structure is not significantly different from the control group. Based on this finding, online platforms like Twitter can help users who self-disclose about their anxiety build and enrich their social networks. Platforms can suggest connections that are already observed to have positive interactions with other social ties of the user, or that have negative interactions with the already existing negative ties of the user. Since individuals tend to reach out to weak ties in the broader online community during times of heightened anxiety, platforms can also enable provisions to recommend specific groups or lists where individuals can be less concerned about impression management and can engage in more disinhibited discourse about their condition and experiences.

5 Conclusion

In this paper, we presented a large-scale data-based study examining the online social network and interaction characteristics of Twitter users who self-disclose about their anxiety disorders. We found that these attributes of network and interaction can be powerful in identifying, via a supervised learning based classifier, those at risk of this condition or those experiencing this condition, as self-reported on the Twitter platform. Our work provides one of the first results situating the relationship between online social interactions and networks and anxiety disorders, and highlights the value of weak social ties prevalent on online platforms in understanding people’s mental health experiences.