Analyzing the perceived humanness of AI-generated social media content around the presidential debate


Dr. Tiago Ventura

Assistant Professor of Computational Social Science at Georgetown, researching politics and social media, with focus on content propagation, misinformation, and political behavior.

X/Twitter: @_Tiagoventura
Email: tv186@georgetown.edu



Rebecca Ansell

M.S. Computer Science candidate and MDI Scholar at Georgetown, researching humanness and misinformation detection on social media during major political events.

Email: rja80@georgetown.edu



Dr. Sejin Paik

Postdoctoral Fellow at Georgetown’s Massive Data Institute, researching AI in Journalism, Political Psychology, Human-Centered AI, and Intelligent Social Systems for trustworthy AI development.

X/Twitter: @sejinpaik
Email: sp1822@georgetown.edu



Autumn Toney

PhD student in Computer Science at Georgetown and data scientist, researching NLP, domain-specific LLM applications, and bibliometrics in large text corpora.

Email: autumn.toney@georgetown.edu



Prof. Leticia Bode

Professor of Communication at Georgetown and Research Director for Knight-Georgetown Institute, studying communication technology’s impact on information use, effects, and misinformation.

X/Twitter: @leticiabode
Email: lb871@georgetown.edu



Prof. Lisa Singh

Sonneborn Chair, Professor of Computer Science and Public Policy, and Director at Georgetown’s Massive Data Institute, with 100+ publications in data-centric computing.

Email: los4@georgetown.edu



While many raised concerns about generative AI influencing elections in 2024, its impact on the 2024 United States elections was likely minimal. However, how individuals perceive the content they encounter online remains unknown: can the general public effectively identify AI-generated content?

We analyzed how people perceive election-related content they might encounter online, particularly when this content is generated by AI (without being labeled as such). We selected two public social media platforms widely used for news consumption in the U.S.: X (formerly Twitter) and YouTube. We focused on content discussing the U.S. presidential debate held on September 10, 2024, the only debate between the final presidential candidates of the two major American political parties.

The American public’s perception of AI-generated content around the election is mixed, but leans toward caution. A 2024 Pew Research report showed that while use of generative AI chatbot tools like ChatGPT is on the rise, nearly 40% of Americans express low to no trust in election information from ChatGPT (and trust is even lower when excluding those who have not heard of the tools). Despite this general skepticism, incidents like the fabricated hurricane alerts in October 2024 illustrate a paradox: even when aware of the potential for inaccuracy, people sometimes believe AI-generated misinformation. This tension between skepticism and susceptibility suggests a need to further understand how effectively people can identify AI-generated content.

Method

To explore this dynamic, we recruited 504 online workers (via Connect) to annotate 7,500 pairwise comparisons of real content collected from X/Twitter and YouTube and content generated by GPT-4o (OpenAI’s current, freely available model) discussing the 2024 U.S. presidential debate. We collected real posts from X/Twitter (mentioning #DebateNight and #Debate2024) and real YouTube comments from 10 debate recap videos. Then, we instructed GPT-4o to generate similar posts in the voice of five different political personas (e.g., a liberal commentator with a left-leaning political stance in the U.S.) based on the debate transcript. To build the content pairs for annotation, we sampled 250 posts from the platforms, with an equal split between X/Twitter and YouTube, and 500 posts from the generated data, with an equal split between the platforms and the personas. Holding the platform constant, each post was randomly paired with 10 other posts, generating a total of 7,500 pairs.
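As an illustration of this pairing step, the Python sketch below shows one way the within-platform comparisons could be assembled. It is an assumed reconstruction rather than the study’s actual code; the column names ("text", "platform", "source") and the build_pairs helper are hypothetical.

import random
import pandas as pd

random.seed(42)  # for reproducible pairing

def build_pairs(posts: pd.DataFrame, pairs_per_post: int = 10) -> pd.DataFrame:
    """Randomly pair each post with `pairs_per_post` other posts from the same platform."""
    rows = []
    for platform, group in posts.groupby("platform"):
        pool = group.to_dict("records")
        for post in pool:
            others = [p for p in pool if p is not post]
            for partner in random.sample(others, pairs_per_post):
                rows.append({
                    "platform": platform,
                    "text_a": post["text"],
                    "source_a": post["source"],   # "human" or "gpt-4o"
                    "text_b": partner["text"],
                    "source_b": partner["source"],
                })
    return pd.DataFrame(rows)

# 250 real posts plus 500 generated posts, split evenly by platform and each
# paired with 10 others, yields 750 * 10 = 7,500 comparisons for annotation.
# pairs = build_pairs(sampled_posts)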

Using the Bradley-Terry scaling statistical model to measure latent “ability” from pairwise contests, we estimated a perceived humanness score, normalized from -1 to 1, where scores closer to -1 indicated stronger perceptions of human origin and scores closer to 1 suggested AI generation. This approach for scaling human perceptions based on pairwise contests has been widely used in other social science tasks, such as measuring the persuasiveness of arguments, the ideological scaling of politicians, and textual complexity. This method provides a nuanced understanding of how individuals perceive the authenticity of election-related content online.
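A compact sketch of this scaling step is shown below. It assumes that a “win” means a post was judged more AI-like than its partner (an assumption, since the annotation prompt is not reproduced here), fits the Bradley-Terry abilities with the standard minorization-maximization updates, and rescales the log-abilities to the [-1, 1] range used in this chapter. It is illustrative, not the authors’ implementation.

import numpy as np

def bradley_terry(wins: np.ndarray, n_iter: int = 500, tol: float = 1e-8) -> np.ndarray:
    """Fit Bradley-Terry abilities; wins[i, j] = times post i beat post j in annotation."""
    n = wins.shape[0]
    p = np.ones(n)                      # latent abilities on the positive scale
    contests = wins + wins.T            # total contests for each pair of posts
    w = wins.sum(axis=1)                # total wins per post
    for _ in range(n_iter):
        denom = (contests / (p[:, None] + p[None, :])).sum(axis=1)
        p_new = w / np.maximum(denom, 1e-12)
        p_new /= p_new.sum()            # abilities are identified only up to scale
        if np.abs(p_new - p).max() < tol:
            return p_new
        p = p_new
    return p

def humanness_score(p: np.ndarray) -> np.ndarray:
    """Rescale log-abilities to [-1, 1]; here +1 means 'perceived as AI-generated'."""
    z = np.log(p + 1e-12)
    return 2 * (z - z.min()) / (z.max() - z.min()) - 1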

Results and discussion

We present the density distribution of the humanness scores, separated by platform and the source of the text, in Figure 1. Our most critical finding indicates that participants could generally distinguish between human-authored and AI-generated content on both X/Twitter and YouTube. However, the platform context appears to significantly influence this ability: the separation between human and AI content was more blurred on YouTube than on X/Twitter. One potential explanation is the inherent difference in content type: YouTube features comments that respond to videos, while X/Twitter features original posts. This distinction may shape how users perceive content, as comments on YouTube often adopt a more informal, verbose, and free-form writing style (see Table 1 for differences in average post length).

Figure 1: Comparative Density Distribution of Humanness Scores Across Platforms [1]
Table 1: Descriptive Statistics of Word Counts Across Platforms and Authors

Next, we examine the effects of semantic and emotional features, classified with the TweetEval pre-trained models, on human perceptions. TweetEval’s emotion classifier is trained on tweets annotated for the affective states their authors express, grounding its labels in human experience. Figure 2 presents the marginal effects of these features, estimated with a linear mixed model regressing the humanness scores on the textual features. Content exhibiting positive sentiment was more frequently perceived as AI-authored, while content containing negative sentiment was more likely to be perceived as human. Offensive language also served as an indicator of human authorship, while irony and hateful speech were less distinguishing features. Strong emotional markers like “joy” were indications of humanness, while “sadness” was linked to AI-authored content. These patterns suggest that humans may use text tone, civility, and emotion to identify AI-generated content.
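For concreteness, a minimal sketch of this classification-and-regression step follows, assuming the publicly available TweetEval checkpoints on Hugging Face and the statsmodels mixed-effects API. The data-frame columns and the tag helper are hypothetical, and the actual feature coding used in the chapter may differ.

import pandas as pd
import statsmodels.formula.api as smf
from transformers import pipeline

sentiment = pipeline("text-classification",
                     model="cardiffnlp/twitter-roberta-base-sentiment-latest")
emotion = pipeline("text-classification",
                   model="cardiffnlp/twitter-roberta-base-emotion")
offensive = pipeline("text-classification",
                     model="cardiffnlp/twitter-roberta-base-offensive")

def tag(texts: list[str]) -> pd.DataFrame:
    """Attach the top predicted sentiment, emotion, and offensiveness label to each post."""
    return pd.DataFrame({
        "sentiment": [r["label"] for r in sentiment(texts, truncation=True)],
        "emotion": [r["label"] for r in emotion(texts, truncation=True)],
        "offensive": [r["label"] for r in offensive(texts, truncation=True)],
    })

# posts: DataFrame with columns "text", "platform", and the Bradley-Terry "humanness" score
# features = pd.concat([posts, tag(posts["text"].tolist())], axis=1)
# model = smf.mixedlm("humanness ~ C(sentiment) + C(emotion) + C(offensive)",
#                     data=features, groups=features["platform"]).fit()
# print(model.summary())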

Figure 2: Pooled Marginal Effects of Associated Sentiment and Emotion in Text on Perceived Humanness Scores [2]

Our results indicate that humans can generally differentiate between AI-generated and human-authored content, particularly by relying on tone, civility, and emotion markers. Negative sentiment, offensive language, and “joy” tended to be associated with human authorship, while positive sentiment and “sadness” were associated with AI-generated content. This suggests that public concerns about AI’s ability to fully replicate human nuances in text might be somewhat overblown, at least with the current generation of tools available.


[1] Note: The density plots reveal how participants perceived the humanness of real and AI-generated posts on X and YouTube. Each plot shows two distributions: one for real content and one for AI-generated content. The x-axis represents a “humanness score” ranging from -1 to 1, where scores closer to -1 indicate stronger perceptions of human origin, and scores closer to 1 indicate stronger perceptions of AI generation.

[2] Note: Point estimates presented with 90% and 95% confidence intervals. To estimate the marginal effects, we use a linear mixed-effects model with a random intercept for platform (X/Twitter and YouTube). We classified the sentiment and emotions in the text using the TweetEval pre-trained models.