Analyzing gender tropes in film with screen direction from 2,000 scripts.
In April 2016, we broke down film dialogue by gender. That essay showed that men delivered more lines than women across 2,000 screenplays. But quantity of lines is only part of the story. What characters do matters, too.
Gender tropes (e.g., women are pretty/men act, men don’t cry) are just as important as dialogue in understanding how men and women are portrayed on-screen. These stereotypes result from many components, including casting, acting, and directing.
The code used in this analysis is publicly available on GitHub. The data set included 1,966 scripts for films released between 1929 and 2015, most of them from 1990 or later. We processed each script to extract only the screen directions, leaving dialogue out of this analysis. We then identified every bigram (pair of consecutive words) in the screen directions whose first word was “he” or “she.”
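To make the bigram step concrete, here is a minimal Python sketch. It is not the code from the GitHub repository. The function name `pronoun_bigrams`, the tokenizing regular expression, and the sample sentence are all assumptions made for illustration, and the sketch assumes the screen directions have already been separated from the dialogue.

```python
import re
from collections import Counter

# Pronouns that may appear as the first word of a bigram we keep.
PRONOUNS = {"he", "she"}


def pronoun_bigrams(text):
    """Count (pronoun, next_word) pairs in a block of screen direction.

    Illustrative simplification: tokens are lowercase alphabetic words
    (internal apostrophes allowed), and sentence boundaries are ignored.
    """
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    counts = Counter()
    # Pair each token with the one that follows it.
    for first, second in zip(tokens, tokens[1:]):
        if first in PRONOUNS:
            counts[(first, second)] += 1
    return counts


# Hypothetical screen direction, invented for this example.
sample = "She giggles and turns away. He gallops off. She giggles again."
counts = pronoun_bigrams(sample)
```

In this sketch, `counts` maps each pronoun and following word to the number of times the pair occurs. A fuller version would likely also split on sentence boundaries so that a pronoun ending one sentence is not paired with the first word of the next.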
Next, we calculated a log odds ratio for each word that followed a pronoun to find the words with the biggest differences in relative use after “she” versus “he.” We removed stop words and did some other minimal text cleaning so that the results stayed meaningful. We calculated the overall log odds ratio for the 800 most commonly used words, and then separate log odds ratios for scripts with male writers and for scripts with female writers, using the 400 most commonly used words. Scripts often have more than one writer, so a single script could be counted in both categories. To learn more about text mining analyses like this one and how to perform them, check out Julia’s book.
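As a rough illustration of the log odds ratio step, the Python sketch below compares how often each word follows “she” versus “he.” The exact formula is not given in this essay, so the add-one smoothing and the base-2 logarithm are assumptions, as are the function name `log_odds_ratio`, the `top_n` parameter, and the toy counts.

```python
import math
from collections import Counter


def log_odds_ratio(she_counts, he_counts, top_n=None):
    """Return log2 odds ratios (she vs. he) for the most common words.

    Assumed formula: add one to each count and each total so that a word
    seen after only one pronoun does not cause a division by zero.
    """
    she_total = sum(she_counts.values())
    he_total = sum(he_counts.values())
    # Rank words by combined frequency and keep the top_n (all if None).
    combined = Counter(she_counts) + Counter(he_counts)
    words = [word for word, _ in combined.most_common(top_n)]
    ratios = {}
    for word in words:
        she_rate = (she_counts.get(word, 0) + 1) / (she_total + 1)
        he_rate = (he_counts.get(word, 0) + 1) / (he_total + 1)
        ratios[word] = math.log2(she_rate / he_rate)
    return ratios


# Toy counts of words following each pronoun, invented for this example.
she = {"giggles": 7, "runs": 3}
he = {"gallops": 7, "runs": 3}
ratios = log_odds_ratio(she, he)
```

With these toy counts, a word seen only after “she” gets a positive score, a word seen only after “he” gets a negative score, and a word used equally after both scores zero. Passing `top_n=800` or `top_n=400` would mirror the cutoffs described above.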