Penn Psychologists Tap Big Data, Twitter to Analyze Accuracy of Stereotypes

What’s in a tweet? People draw conclusions about us, from our gender to education level, based on the words we use on social media. Researchers from the University of Pennsylvania, along with colleagues from the Technical University of Darmstadt and the University of Melbourne, have now analyzed the accuracy of those inferences. Their work revealed that, though stereotypes and the truth often aligned, with people making accurate assumptions more than two-thirds of the time, inaccurate characterizations still showed up.



People associate certain behaviors with certain social groups. These stereotypical beliefs consist of both accurate and inaccurate associations. Using large-scale, data-driven methods with social media as a context, we isolate stereotypes through the language people use. Across four social categories (gender, age, education level, and political orientation), we identify words and phrases that lead people to incorrectly guess the social category of the writer. Although raters often correctly categorize authors, they overestimate the importance of some stereotype-congruent signals. Findings suggest that data-driven approaches might be a valuable and ecologically valid tool for identifying even subtle aspects of stereotypes and highlighting the facets that are exaggerated or misapplied.
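To make the idea of "words that lead people to guess incorrectly" concrete, here is a minimal Python sketch. It is not the researchers' actual pipeline, which the text above does not describe in detail. The toy rows in `RATINGS`, the function names `misleading_words` and `overall_accuracy`, and the `min_count` threshold are all hypothetical, invented for illustration. The sketch assumes each rated tweet comes with the author's true category and a rater's guess, and it computes how often raters were wrong on tweets containing each word.

```python
from collections import Counter

# Hypothetical toy data: (tweet text, true category, rater's guess).
# These rows are invented for illustration, not taken from the study.
RATINGS = [
    ("that puppy is so cute", "male", "female"),
    ("cute shoes today", "female", "female"),
    ("watching the game tonight", "male", "male"),
    ("the game was so cute", "male", "female"),
    ("great game everyone", "female", "male"),
    ("research paper due tonight", "female", "female"),
]


def misleading_words(ratings, min_count=2):
    """Return {word: rater error rate} for words in at least min_count tweets."""
    seen = Counter()   # number of tweets containing the word
    wrong = Counter()  # of those, how many the rater misjudged
    for text, truth, guess in ratings:
        # Count each word once per tweet, ignoring case.
        for word in set(text.lower().split()):
            seen[word] += 1
            if guess != truth:
                wrong[word] += 1
    return {w: wrong[w] / n for w, n in seen.items() if n >= min_count}


def overall_accuracy(ratings):
    """Share of tweets whose author category the rater guessed correctly."""
    return sum(truth == guess for _, truth, guess in ratings) / len(ratings)


rates = misleading_words(RATINGS)
# List words from most to least associated with wrong guesses.
for word, rate in sorted(rates.items(), key=lambda kv: -kv[1]):
    print(f"{word}: {rate:.2f}")
```

A word with a high error rate in this sketch is one that raters may be over-weighting as a signal of group membership. A real analysis would need far more data and a comparison against how strongly each word actually predicts the category.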

Author Biographies

Jordan Carpenter is a postdoctoral researcher at the Kenan Institute for Ethics at Duke University. He received his PhD in social psychology from the University of North Carolina in 2013.

Daniel Preotiuc-Pietro is a postdoctoral researcher in natural language processing working for the World Well-Being Project in the Positive Psychology Center of the University of Pennsylvania. His current research leverages large-scale social media footprints to aid with psychology and health problems.

Lucie Flekova is a PhD student in natural language processing at the Department of Computer Science, Technische Universität Darmstadt. She focuses on stylistic and semantic analysis of text with applications in author profiling.

Salvatore Giorgi is a research programmer at the World Well-Being Project at the University of Pennsylvania.

Courtney Hagan is a research assistant with the World Well-Being Project.

Margaret L. Kern is a senior lecturer at the Centre for Positive Psychology at the University of Melbourne’s Graduate School of Education. Her research examines the question of who flourishes in life (physically, mentally, and socially), why, and what enhances or hinders healthy life trajectories.

Anneke E. K. Buffone is currently the lead research scientist and a postdoctoral fellow at the University of Pennsylvania’s World Well-Being Project. Buffone’s research focuses on other-focused motivations, emotions, and cognitions and their effects on social interactions.

Lyle Ungar is a professor of computer and information science at the University of Pennsylvania, where he also holds appointments in other departments in the schools of Engineering, Arts and Sciences, Medicine, and Business. His current research interests include machine learning, text mining, statistical natural language processing, and psychology.

Martin E. P. Seligman is the Zellerbach Family Professor of psychology and director of the Positive Psychology Center at the University of Pennsylvania, where he focuses on positive psychology, learned helplessness, depression, and optimism.