5 Results - EMPATH scores
Empath is a tool for generating high-quality lexical categories (Fast, Chen, and Bernstein 2016). Transforming the bag-of-words into bags of lexical categories reduces the number of dimensions and improves the interpretability of the results. To examine the interaction between author gender and character portrayal, I plotted lexical category frequency for male and female characters by author gender.
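To make the dimensionality reduction concrete, the following is a minimal sketch of mapping a bag-of-words onto a bag of lexical categories. It does not use Empath itself: `TOY_LEXICON` and `category_counts` are hypothetical names, and the two categories and their member words are invented for illustration, not taken from Empath's lexicon.

```python
from collections import Counter

# Hypothetical toy lexicon (category -> member words).
# These are illustrative only and are NOT Empath's actual categories.
TOY_LEXICON = {
    "family": {"mother", "father", "sister", "home"},
    "violence": {"fight", "gun", "kill"},
}


def category_counts(tokens, lexicon=TOY_LEXICON, normalize=True):
    """Collapse word counts into per-category counts (or frequencies)."""
    word_counts = Counter(t.lower() for t in tokens)
    totals = {
        category: sum(word_counts[w] for w in words)
        for category, words in lexicon.items()
    }
    if normalize and tokens:
        # Divide by the number of tokens so texts of different
        # lengths are comparable.
        n = len(tokens)
        totals = {c: v / n for c, v in totals.items()}
    return totals


tokens = "her mother and sister stayed home after the fight".split()
features = category_counts(tokens)
```

Here a nine-token description collapses to two features, one per category, regardless of how large the underlying vocabulary is.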
Ensemble classification on Empath categories predicts character gender with an F1 score of 78.5% (table 5.1). The difference in gender score is larger for male authors than for female authors (fig. 5.1).
| | Predicted Male | Predicted Female |
|---|---|---|
| Male Characters | 3988 | 1012 |
| Female Characters | 1138 | 3862 |
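As a check on how the reported score relates to the confusion matrix, the sketch below recomputes per-class F1 from the four counts in the table. The text does not state how the F1 score was averaged, so the macro average used here is an assumption; `f1_for` is a hypothetical helper name.

```python
# Counts copied from the confusion matrix: actual class -> predicted class.
confusion = {
    "male": {"male": 3988, "female": 1012},
    "female": {"male": 1138, "female": 3862},
}


def f1_for(label, matrix):
    """F1 for one class: 2*TP / (2*TP + FP + FN)."""
    tp = matrix[label][label]
    # Actual `label` characters predicted as something else.
    fn = sum(v for pred, v in matrix[label].items() if pred != label)
    # Characters of other classes predicted as `label`.
    fp = sum(row[label] for actual, row in matrix.items() if actual != label)
    return 2 * tp / (2 * tp + fp + fn)


f1_male = f1_for("male", confusion)
f1_female = f1_for("female", confusion)

# Assumption: the reported score is the unweighted (macro) mean of the two.
macro_f1 = (f1_male + f1_female) / 2

total = sum(sum(row.values()) for row in confusion.values())
accuracy = sum(confusion[c][c] for c in confusion) / total

print(f"F1 male={f1_male:.3f} female={f1_female:.3f} macro={macro_f1:.3f}")
print(f"accuracy={accuracy:.3f}")
```

Under this assumption the macro-averaged F1 rounds to 78.5%, and because the two classes are balanced (5000 characters each) the overall accuracy is also 78.5%.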

Figure 5.1: Character gender score using EMPATH categories.