5 Results - EMPATH scores

Empath is a tool for generating high-quality lexical categories (Fast, Chen, and Bernstein 2016). Transforming the bag-of-words into bags of lexical categories reduces the number of dimensions and improves the interpretability of the results. To examine the interaction between author gender and character portrayal, I plotted lexical category frequency for male and female characters by author gender.
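To illustrate the dimensionality reduction described above, the following is a minimal sketch of mapping a bag of words onto lexical category frequencies. It does not call Empath itself: the lexicon `TOY_LEXICON`, its three categories, and the helper `category_frequencies` are hypothetical names invented for illustration, and the example sentence is made up.

```python
from collections import Counter

# Hypothetical toy lexicon (category -> member words). These are NOT
# Empath's actual categories or word lists; they only illustrate the idea.
TOY_LEXICON = {
    "family": {"mother", "father", "sister", "brother"},
    "violence": {"fight", "kill", "attack"},
    "emotion": {"cry", "smile", "love"},
}


def category_frequencies(tokens, lexicon=TOY_LEXICON):
    """Collapse a bag of words into normalized lexical category frequencies."""
    word_counts = Counter(tokens)
    total = sum(word_counts.values())
    freqs = {}
    for category, words in lexicon.items():
        # Count every token that belongs to this category.
        hits = sum(word_counts[w] for w in words)
        # Normalize by document length so texts of different sizes compare.
        freqs[category] = hits / total if total else 0.0
    return freqs


# Made-up example: 11 distinct words reduce to 3 category features.
tokens = "her mother would smile and cry when father left to fight".split()
scores = category_frequencies(tokens)
print(scores)
```

Each document is then represented by one value per category rather than one value per vocabulary word, which is what makes the resulting features easier to interpret.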

Ensemble classification on Empath categories predicts character gender with an F1 score of 78.5% (table 5.1). The gap in gender score between male and female characters is larger for male authors than for female authors (fig. 5.1).

Table 5.1: Predicted character gender using EMPATH categories.
| | Predicted Male | Predicted Female |
|---|---|---|
| Male Characters | 3988 | 1012 |
| Female Characters | 1138 | 3862 |
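The reported F1 score can be checked against the counts in Table 5.1. The sketch below assumes a macro-averaged F1 (the unweighted mean of the per-class F1 scores); the averaging method is not stated in the text, so this is an assumption. The variable names are chosen for illustration.

```python
# Counts copied from Table 5.1 (rows: actual gender, columns: predicted gender).
male_as_male, male_as_female = 3988, 1012
female_as_male, female_as_female = 1138, 3862


def f1(tp, fp, fn):
    """F1 score for one class from true positives, false positives, false negatives."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# For the "male" class, female characters predicted as male are false positives.
f1_male = f1(tp=male_as_male, fp=female_as_male, fn=male_as_female)
# For the "female" class, male characters predicted as female are false positives.
f1_female = f1(tp=female_as_female, fp=male_as_female, fn=female_as_male)

# Assumption: macro average, i.e. the unweighted mean of the two class scores.
macro_f1 = (f1_male + f1_female) / 2

total = male_as_male + male_as_female + female_as_male + female_as_female
accuracy = (male_as_male + female_as_female) / total

print(f"F1 (male):   {f1_male:.3f}")
print(f"F1 (female): {f1_female:.3f}")
print(f"Macro F1:    {macro_f1:.3f}")
print(f"Accuracy:    {accuracy:.3f}")
```

Because the two classes are balanced at 5,000 characters each, the macro F1 and the overall accuracy both come out at roughly 78.5%, consistent with the figure reported above.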

Figure 5.1: Character gender score using EMPATH categories.