She Giggles, He Gallops - The Pudding

Analyzing gender tropes in film with screen direction from 2,000 scripts.

By Julia Silge

In April 2016, we broke down film dialogue by gender. The essay presented an imbalance in which men delivered more lines than women across 2,000 screenplays. But quantity of lines is only part of the story. What characters do matters, too.

Gender tropes (e.g., women are pretty/men act, men don’t cry) are just as important as dialogue in understanding how men and women are portrayed on-screen. These stereotypes result from many components, including casting, acting, directing, etc.

The film script, arguably, is ground zero—the source material by which everyone is influenced. And in film scripts, there’s dialogue and screen direction. For example, let’s take this iconic scene from Titanic:


Rose gasps. There is nothing in her field of vision but water...She leans forward, arching her back. He puts his hands on her waist to steady her.

Rose closes her eyes...she smiles dreamily, then leans back, gently pressing her back against his chest. He pushes forward slightly against her.

The curious data here is less what Rose says (“I’m flying”) and more what the screen direction prescribes (“she smiles dreamily,” “he pushes against her”). In the following analysis, we go deep on screen direction to understand gender tropes. We examined 2,000 scripts and broke down every screen direction mapped to the pronouns “she” and “he.”

Read the full article, with some wonderful infographics covering:


The most used words for women vs. men

Likelihood that certain words appear after “she” vs. “he” in screen direction.

The top 800 words paired with “she” or “he”

Underlined words contain examples of their usage in screen direction.

Comparing female vs. male writers

Words far away from an axis exhibit more dramatic differences. Bigger circles indicate words that are used more often.



The code used in analysis is publicly available on GitHub. The data set for this analysis included 1,966 scripts for films released between 1929 and 2015; most are from 1990 and after. Each script was processed to extract only the screen directions, excluding dialogue from this analysis. We then identified all bigrams in these scripts that had either “he” or “she” as the first word in the bigram.
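To make the bigram step concrete, here is a minimal Python sketch. It is not the published analysis code; the function name `pronoun_bigrams`, the simple regex tokenizer, and the sample text are assumptions made for illustration. It takes text that is already reduced to screen direction and counts every two-word sequence whose first word is “he” or “she.”

```python
import re
from collections import Counter

PRONOUNS = {"he", "she"}

def pronoun_bigrams(text):
    """Count (pronoun, next_word) pairs in a block of screen direction.

    Hypothetical helper: the tokenizer is a plain regex, so bigrams can
    span sentence boundaries and curly apostrophes split words.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    # Pair each token with the token that follows it.
    for first, second in zip(tokens, tokens[1:]):
        if first in PRONOUNS:
            counts[(first, second)] += 1
    return counts

# Illustrative sample, loosely adapted from the Titanic excerpt above.
sample = ("Rose gasps. She leans forward, arching her back. "
          "He puts his hands on her waist. She smiles dreamily.")
counts = pronoun_bigrams(sample)
for (pronoun, word), n in sorted(counts.items()):
    print(pronoun, word, n)
```

A real pipeline would first need to separate screen direction from dialogue, which depends on how each script is formatted and is not shown here.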

Then, we calculated a log odds ratio to find words that exhibit the biggest differences between relative use for “she” and “he.” We removed stop words and did some other minimal text cleaning to maintain meaningful results. We calculated the overall log odds ratio for the 800 most commonly used words, and then log odds ratios for scripts with only male writers and female writers for the 400 most commonly used words. Scripts often have more than one writer and could be counted in both categories. To learn more about text mining analyses like this one and how to perform them, check out Julia’s book.
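As a rough illustration of the log odds ratio, the sketch below compares how often a word follows “she” with how often it follows “he.” The add-one smoothing, the base-2 logarithm, the function name `log_odds_ratio`, and the toy counts are all assumptions chosen for this example; the text above does not specify those details, and stop word removal is left out.

```python
import math
from collections import Counter

def log_odds_ratio(she_counts, he_counts, top_n=800):
    """Return log2 of (rate after "she") / (rate after "he") per word.

    Assumption: add-one smoothing keeps words seen with only one
    pronoun from producing a division by zero or log of zero.
    """
    total_she = sum(she_counts.values())
    total_he = sum(he_counts.values())
    combined = she_counts + he_counts
    scores = {}
    # Restrict to the most commonly used words overall.
    for word, _ in combined.most_common(top_n):
        she_rate = (she_counts[word] + 1) / (total_she + 1)
        he_rate = (he_counts[word] + 1) / (total_he + 1)
        scores[word] = math.log2(she_rate / he_rate)
    return scores

# Toy counts, made up for illustration only.
she_counts = Counter({"giggles": 9, "runs": 10})
he_counts = Counter({"gallops": 9, "runs": 10})
scores = log_odds_ratio(she_counts, he_counts)
for word, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{word}: {score:+.2f}")
```

Under these assumptions, a positive score means the word is relatively more common after “she,” a negative score means it is more common after “he,” and a score near zero means it is used at about the same rate with both.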

Writers’ gender was determined via IMDb biographies, pictures, and names.

English has two singular third-person pronouns most often used for people, “he” and “she.” In this analysis, for both the text data and the identification of gender for film writers, we have chosen to identify men and women with the pronouns “he” and “she.” Using this type of classification, any writer or character associated with the pronoun “she” is classified as a woman.