What do Twitter comments tell about news article bias? : Assessing the impact of news article bias on its perception on Twitter
2023, Spinde, Timo, Richter, Elisabeth, Wessel, Martin, Kulshrestha, Juhi, Donnay, Karsten
News stories circulating online, especially on social media platforms, are nowadays a primary source of information. Given the nature of social media, news is no longer just news: it is embedded in the conversations of users interacting with it. This is particularly relevant for inaccurate information or even outright misinformation, because user interaction has a crucial impact on whether information is uncritically disseminated or not. Biased coverage has been shown to affect personal decision-making. Still, it remains an open question whether users are aware of the biased reporting they encounter and how they react to it. The latter is particularly relevant given that user reactions help contextualize reporting for other users and can thus help mitigate, but may also exacerbate, the impact of biased media coverage. This paper approaches the question from a measurement point of view, examining whether reactions to news articles on Twitter can serve as bias indicators, i.e., whether how users comment on a given article relates to its actual level of bias. We first give an overview of research on media bias before discussing key concepts related to how individuals engage with online content, focusing on the sentiment (or valence) of comments and on outright hate speech. We then present the first dataset connecting reliable human-made media bias classifications of news articles with the reactions these articles received on Twitter. We call our dataset BAT - Bias And Twitter. BAT covers 2,800 (bias-rated) news articles from 255 English-speaking news outlets. Additionally, BAT includes 175,807 comments and retweets referring to the articles. Based on BAT, we conduct a multi-feature analysis to identify comment characteristics and analyze whether Twitter reactions correlate with an article’s bias. First, we fine-tune and apply two XLNet-based classifiers for hate speech detection and sentiment analysis.
Second, we relate the results of the classifiers to the article bias annotations within a multi-level regression. The results show that Twitter reactions to an article indicate its bias, and vice versa. With a regression coefficient of 0.703, we present evidence that Twitter reactions to biased articles are significantly more hateful. Our analysis shows that the news outlet’s individual stance reinforces the hate-bias relationship. In future work, we will extend the dataset and analysis, including additional concepts related to media bias.
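The multi-level regression step described above could be sketched as follows, with article-level hate-speech scores regressed on bias annotations and news outlets as random-intercept groups. All data, column names, and effect sizes below are hypothetical, not drawn from the BAT dataset:

```python
# Hypothetical sketch of a multi-level regression: per-article hate-speech
# scores regressed on human bias annotations, with outlets as random-intercept
# groups (capturing outlet-level stance). Illustrative synthetic data only.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
outlets = np.repeat(["A", "B", "C", "D"], 10)          # 4 outlets, 10 articles each
outlet_offset = {"A": 0.00, "B": 0.05, "C": -0.03, "D": 0.02}
bias = rng.integers(0, 2, size=40)                     # 0 = unbiased, 1 = biased
hate = (0.1 + 0.3 * bias
        + np.array([outlet_offset[o] for o in outlets])
        + rng.normal(0, 0.02, size=40))

df = pd.DataFrame({"hate_score": hate, "bias_rating": bias, "outlet": outlets})

# Random intercept per outlet; fixed-effect slope of bias on hate speech
model = smf.mixedlm("hate_score ~ bias_rating", df, groups=df["outlet"])
result = model.fit()
print(result.params["bias_rating"])   # close to 0.3 by construction
```

The random intercept absorbs between-outlet differences so that the fixed-effect slope reflects the within-outlet bias-hate association.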
Is sharing just a function of viewing? : The sharing of political and non-political news on Facebook
2022-07-12, Trilling, Damian, Kulshrestha, Juhi, De Vreese, Claes, Halagiera, Denis, Jakubowski, Jakub, Möller, Judith, Puschmann, Cornelius, Stępińska, Agnieszka, Stier, Sebastian, Vaccari, Cristian
How is political news shared online? This fundamental question for political communication research in today’s news ecology is still poorly understood. In particular, very little is known about whether and how news sharing differs from news viewing. Based on a unique dataset of ≈ 870,000 URLs shared ≈ 100 million times on Facebook, grouped by countries, age brackets, and months, we study the correlates of viewing versus sharing of political versus non-political news. We first identify websites that at least occasionally contain news items, and then analyze metrics of the news items published on these websites. We enrich the dataset with natural language processing and supervised machine learning. We find that political news items are viewed less than non-political news items, but are shared more than one would expect based on their views. Furthermore, the source of a news item and textual features, which are often studied in clickbait research and in commercial A/B testing, matter. Our findings are conditional on age, but are very similar across four different countries (Italy, Germany, Netherlands, Poland). While our research design does not allow for causal claims, our findings suggest that future work is well-advised to both theoretically and methodologically differentiate between factors that may explain (a) viewing versus sharing of news, and (b) political versus non-political news.
Misinformation, believability, and vaccine acceptance over 40 countries : Takeaways from the initial phase of the COVID-19 infodemic
2022, Singh, Karandeep, Lima, Gabriel, Cha, Meeyoung, Cha, Chiyoung, Kulshrestha, Juhi, Ahn, Yong-Yeol, Varol, Onur
The COVID-19 pandemic has been damaging to the lives of people all around the world. Accompanied by the pandemic is an infodemic, an abundant and uncontrolled spread of potentially harmful misinformation. The infodemic may severely change the pandemic's course by interfering with public health interventions such as wearing masks, social distancing, and vaccination. In particular, the impact of the infodemic on vaccination is critical because it holds the key to reverting to pre-pandemic normalcy. This paper presents findings from a global survey on the extent of worldwide exposure to the COVID-19 infodemic, assesses different populations' susceptibility to false claims, and analyzes its association with vaccine acceptance. Based on responses gathered from over 18,400 individuals from 40 countries, we find a strong association between perceived believability of COVID-19 misinformation and vaccination hesitancy. Our study shows that only half of the online users exposed to rumors might have seen corresponding fact-checked information. Moreover, depending on the country, between 6% and 37% of individuals considered these rumors believable. A key finding of this research is that poorer regions were more susceptible to encountering and believing COVID-19 misinformation; countries with lower gross domestic product (GDP) per capita showed a substantially higher prevalence of misinformation. We discuss implications of our findings for public campaigns that proactively spread accurate information to countries that are more susceptible to the infodemic. We also argue that fact-checking platforms should prioritize claims that not only have wide exposure but are also perceived to be believable. Our findings give insights into how to successfully handle risk communication during the initial phase of a future pandemic.
Empirical Evaluation of Three Common Assumptions in Building Political Media Bias Datasets
2020, Ganguly, Soumen, Kulshrestha, Juhi, An, Jisun, Kwak, Haewoon
In this work, we empirically validate three common assumptions in building political media bias datasets, which are (i) labelers' political leanings do not affect labeling tasks, (ii) news articles follow their source outlet's political leaning, and (iii) political leaning of a news outlet is stable across different topics. We build a ground-truth dataset of manually annotated article-level political leaning and validate the three assumptions. Our findings warn that the three assumptions could be invalid even for a small dataset. We hope that our work calls attention to the (in)validity of common assumptions in building political media bias datasets.
A Domain-adaptive Pre-training Approach for Language Bias Detection in News
2022, Krieger, Jan-David, Spinde, Timo, Ruas, Terry, Kulshrestha, Juhi, Gipp, Bela
Media bias is a multi-faceted construct influencing individual behavior and collective decision-making. Slanted news reporting is the result of one-sided and polarized writing, which can occur in various forms. In this work, we focus on an important form of media bias, i.e. bias by word choice. Detecting biased word choices is a challenging task due to its linguistic complexity and the lack of representative gold-standard corpora. We present DA-RoBERTa, a new state-of-the-art transformer-based model adapted to the media bias domain which identifies sentence-level bias with an F1 score of 0.814. In addition, we train DA-BERT and DA-BART, two more transformer models adapted to the bias domain. Our proposed domain-adapted models outperform prior bias detection approaches on the same data.
Prevalence of Misinformation and Factchecks on the COVID-19 Pandemic in 35 Countries : Observational Infodemiology Study
2021-02-13, Cha, Meeyoung, Cha, Chiyoung, Singh, Karandeep, Lima, Gabriel, Ahn, Yong-Yeol, Kulshrestha, Juhi, Varol, Onur
The COVID-19 pandemic has been accompanied by an infodemic, in which a plethora of false information has been rapidly disseminated online, leading to serious harm worldwide.
This study aims to analyze the prevalence of common misinformation related to the COVID-19 pandemic.
We conducted an online survey via social media platforms and a survey company to determine whether respondents have been exposed to a broad set of false claims and fact-checked information on the disease.
We obtained more than 41,000 responses from 1257 participants in 85 countries, but for our analysis, we only included responses from 35 countries that had at least 15 respondents. We identified a strong negative correlation between a country’s gross domestic product (GDP) per capita and the prevalence of misinformation, with poorer countries having a higher prevalence of misinformation (Spearman ρ=–0.72; P<.001). We also found that fact checks spread to a lesser degree than their respective false claims, following a sublinear trend (β=.64).
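The country-level association reported above is a Spearman rank correlation; a minimal sketch of that analysis shape, with invented figures (only the method, not the data, mirrors the study):

```python
# Illustrative Spearman rank correlation between GDP per capita and
# misinformation prevalence. All values here are invented; the study
# reports rho = -0.72 (P < .001) over 35 countries.
from scipy.stats import spearmanr

gdp_per_capita = [1200, 3400, 9800, 22000, 41000, 58000]   # hypothetical countries
misinfo_prevalence = [0.37, 0.31, 0.22, 0.15, 0.09, 0.06]  # hypothetical shares

rho, p_value = spearmanr(gdp_per_capita, misinfo_prevalence)
print(rho)   # -1.0 here, since the toy ranks are perfectly inverse
```

Spearman's ρ compares ranks rather than raw values, so it captures the monotone (not necessarily linear) relationship between income and misinformation prevalence.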
Our results imply that the potential harm of misinformation could be more substantial for low-income countries than high-income countries. Countries with poor infrastructures might have to combat not only the spreading pandemic but also the COVID-19 infodemic, which can derail efforts in saving lives.
Where the earth is flat and 9/11 is an inside job : A comparative algorithm audit of conspiratorial information in web search results
2022-08, Urman, Aleksandra, Makhortykh, Mykola, Ulloa, Roberto, Kulshrestha, Juhi
Web search engines are important online information intermediaries that are frequently used and highly trusted by the public, despite ample evidence that their outputs are subject to inaccuracies and biases. One form of such inaccuracy, which has so far received little scholarly attention, is the presence of conspiratorial information, namely pages promoting conspiracy theories. We address this gap by conducting a comparative algorithm audit to examine the distribution of conspiratorial information in search results across five search engines: Google, Bing, DuckDuckGo, Yahoo and Yandex. Using a virtual agent-based infrastructure, we systematically collect search outputs for six conspiracy theory-related queries (“flat earth”, “new world order”, “qanon”, “9/11”, “illuminati”, “george soros”) across three locations (two in the US and one in the UK) and two waves (March and May 2021). We find that all search engines except Google consistently displayed conspiracy-promoting results and returned links to conspiracy-dedicated websites, with variations across queries. Most conspiracy-promoting results came from social media and conspiracy-dedicated websites, while conspiracy-debunking information was shared by scientific websites and legacy media. These observations are consistent across different locations and time periods, highlighting the possibility that some engines systematically prioritize conspiracy-promoting content.
Analyzing Biases in Perception of Truth in News Stories and Their Implications for Fact Checking
2022, Babaei, Mahmoudreza, Kulshrestha, Juhi, Chakraborty, Abhijnan, Redmiles, Elissa M., Cha, Meeyoung, Gummadi, Krishna P.
Misinformation on social media has become a critical problem, particularly during a public health pandemic. Most social platforms today rely on users' voluntary reports to determine which news stories to fact-check first. Despite the importance, no prior work has explored the potential biases in such a reporting process. This work proposes a novel methodology to assess how users perceive truth or misinformation in online news stories. By conducting a large-scale survey (N = 15,000), we identify the possible biases in news perceptions and explore how partisan leanings influence the news selection algorithm for fact checking. Our survey reveals several perception biases or inaccuracies in estimating the truth level of stories. The first kind, called the total perception bias (TPB), is the aggregate difference in the ground truth and perceived truth level. The next two are the false-positive bias (FPB) and false-negative bias (FNB), which measure users' gullibility toward and cynicism about a given claim. We also propose ideological mean perception bias (IMPB), which quantifies a news story's ideological disputability. Collectively, these biases indicate that user perceptions are not correlated with the ground truth of news stories; users perceive some stories as more false than they are, and vice versa. This calls for fact-checking news stories that exhibit the most considerable perception biases first, which the current voluntary reporting does not offer. Based on these observations, we propose a new framework that can best leverage users' truth perceptions to remove false stories, correct misperceptions of users, or decrease ideological disagreements. We discuss how this new prioritizing scheme can help platforms significantly reduce the impact of fake news on user beliefs.
Web Routineness and Limits of Predictability : Investigating Demographic and Behavioral Differences Using Web Tracking Data
2021, Kulshrestha, Juhi, Oliveira, Marcos, Karacalik, Orkut, Bonnay, Dennis, Wagner, Claudia
Understanding human activities and movements on the Web is not only important for computational social scientists but can also offer valuable guidance for the design of online systems for recommendations, caching, advertising, and personalization. In this work, we demonstrate that people tend to follow routines on the Web, and these repetitive patterns of web visits increase their browsing behavior's achievable predictability. We present an information-theoretic framework for measuring the uncertainty and theoretical limits of predictability of human mobility on the Web. We systematically assess the impact of different design decisions on the measurement. We apply the framework to a web tracking dataset of German internet users. Our empirical results highlight that individuals' routines on the Web make their browsing behavior predictable to 85% on average, though the value varies across individuals. We observe that these differences in the users' predictabilities can be explained to some extent by their demographic and behavioral attributes.
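Theoretical limits of predictability of this kind are typically obtained by solving Fano's inequality at equality for an estimated entropy of the visit sequence. A minimal sketch of that computation, assuming an entropy estimate S (in bits) over N distinct websites (the function and variable names are ours, not the paper's):

```python
# Sketch of the Fano-bound computation behind predictability limits:
# given entropy S (bits) over N distinct sites, the maximum predictability
# Pi solves  S = H(Pi) + (1 - Pi) * log2(N - 1),  where H is the binary
# entropy function. The right-hand side is monotone decreasing in Pi on
# [1/N, 1], so bisection finds the unique root. Values are illustrative.
import math

def binary_entropy(p):
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def max_predictability(S, N, tol=1e-9):
    """Solve Fano's equality for the upper bound on predictability."""
    lo, hi = 1.0 / N, 1.0 - 1e-12   # Pi is at least random guessing, 1/N
    while hi - lo > tol:
        mid = (lo + hi) / 2
        value = binary_entropy(mid) + (1 - mid) * math.log2(N - 1)
        if value > S:
            lo = mid   # implied entropy too high -> Pi must be larger
        else:
            hi = mid
    return (lo + hi) / 2

# e.g. an entropy of 2 bits over 100 distinct sites still allows
# predictability of roughly 0.8:
print(round(max_predictability(2.0, 100), 3))
```

Lower entropy (more routine behavior) raises the bound, which is the mechanism behind the finding that repetitive web visits make browsing more predictable.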