3 Results - Genre

3.1 Author gender by genre

I classified book genre using the tags listed on Goodreads.com. To examine how genres cluster together, I built a nearest neighbor graph from the cosine similarity between tag frequencies for the top 150 tags in the dataset (fig 3.1). Overlaying author gender on this graph shows fairly strong divisions between author genders for certain genres (fig 3.2). Because Goodreads tags are crowdsourced from readers, and readership and book reviews show gender bias (Thelwall 2019), the apparent gender-genre gap might not be reflected in the text itself. The tags and their frequencies across author genders are listed in table 7.1.
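As a rough illustration of the graph construction described above, here is a minimal sketch that links each item to its most similar neighbors by cosine similarity of tag-frequency vectors. The toy data, the function names, and the choice of books as nodes are all assumptions for illustration; this is not the code used to produce fig 3.1.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat all-zero vectors as dissimilar to everything
    return dot / (norm_u * norm_v)


def nearest_neighbor_edges(tag_vectors, k=1):
    """Link every item to its k most similar other items.

    tag_vectors maps an item name to its tag-frequency vector.
    Returns a list of directed edges (item, neighbor, similarity).
    """
    edges = []
    for name, vec in tag_vectors.items():
        sims = [
            (other, cosine_similarity(vec, other_vec))
            for other, other_vec in tag_vectors.items()
            if other != name
        ]
        sims.sort(key=lambda pair: pair[1], reverse=True)
        edges.extend((name, other, sim) for other, sim in sims[:k])
    return edges


# Hypothetical toy data: three tag counts per book (assumed, not from the dataset).
toy_books = {
    "book_a": [10, 0, 1],
    "book_b": [8, 1, 0],
    "book_c": [0, 9, 2],
    "book_d": [1, 7, 3],
}

edges = nearest_neighbor_edges(toy_books, k=1)
for source, target, sim in edges:
    print(f"{source} -> {target} (cosine similarity {sim:.2f})")
```

In the real data each vector would have 150 entries, one per tag, and a larger k would give a denser graph.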

Figure 3.1: Nearest neighbor graph generated by book tag similarity, colored by genre. Hover over points to view the data!

Figure 3.2: Nearest neighbor graph generated by book tag similarity, colored by author gender.

3.2 Character gender by genre

Graphs get messy, so I made a table showing the number of books written by male and female authors in each genre (fig 3.3). In addition, the character gender ratio varies substantially across both author gender and genre (fig 3.4).

Figure 3.3: Author gender discrepancy by genre.

Figure 3.4: Character gender discrepancy by author gender.

3.3 Classifying author gender using genre tags

Genre tags alone can classify author gender with 83% accuracy (table 3.1). Again, this might reflect gender bias in genre attribution, not just textual differences between authors (Thelwall 2019). The list of tags can be found in table 7.1.

Table 3.1: Predicted author gender using tags
                 Predicted Male   Predicted Female
Male Authors     1822             319
Female Authors   430              1938
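
The 83% figure can be checked directly from the counts in table 3.1. The sketch below recomputes overall accuracy and the share of each author gender that was classified correctly. The variable names are my own, and the per-gender rates are derived here from the table rather than reported in the original analysis.

```python
# Counts from table 3.1, keyed by (actual author gender, predicted author gender).
confusion = {
    ("male", "male"): 1822,
    ("male", "female"): 319,
    ("female", "male"): 430,
    ("female", "female"): 1938,
}

total = sum(confusion.values())
correct = confusion[("male", "male")] + confusion[("female", "female")]
accuracy = correct / total

# Share of each actual gender that the classifier labeled correctly (recall).
male_total = confusion[("male", "male")] + confusion[("male", "female")]
female_total = confusion[("female", "male")] + confusion[("female", "female")]
male_recall = confusion[("male", "male")] / male_total
female_recall = confusion[("female", "female")] / female_total

print(f"accuracy:      {accuracy:.3f}")
print(f"male recall:   {male_recall:.3f}")
print(f"female recall: {female_recall:.3f}")
```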

3.4 Character frequency

Female minor characters are relatively underrepresented: the distribution of female characters contains a slightly smaller proportion of characters that make few appearances in the text (fig 3.6). This might reflect the concept of “male-as-default”, where non-gender-specific characters are generally portrayed as male.

Additionally, relative to female characters, male characters are more frequently written as the subject (nsubj) of a sentence than as an object (dobj) (fig 3.5). Both of these differences were larger for male authors.
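
To make the subject-versus-object comparison concrete, here is a minimal sketch that computes the nsubj-to-dobj ratio per character gender. It assumes character mentions have already been labeled with a gender and a dependency label by some parser; the input format, function name, and toy data are hypothetical, and this is not the pipeline behind fig 3.5.

```python
from collections import Counter


def subject_object_ratio(mentions):
    """Ratio of subject (nsubj) to object (dobj) mentions per character gender.

    mentions is an iterable of (character_gender, dependency_label) pairs.
    Returns a dict mapping gender to the ratio, or None when a gender
    has no object mentions (the ratio is undefined in that case).
    """
    counts = Counter()
    for gender, dep in mentions:
        if dep in ("nsubj", "dobj"):  # ignore all other dependency labels
            counts[(gender, dep)] += 1

    ratios = {}
    for gender in sorted({g for g, _ in counts}):
        subjects = counts[(gender, "nsubj")]
        objects = counts[(gender, "dobj")]
        ratios[gender] = subjects / objects if objects else None
    return ratios


# Hypothetical toy mentions (assumed, not from the dataset).
toy_mentions = [
    ("male", "nsubj"), ("male", "nsubj"), ("male", "nsubj"), ("male", "dobj"),
    ("female", "nsubj"), ("female", "nsubj"), ("female", "dobj"), ("female", "dobj"),
    ("female", "pobj"),  # not a subject or direct object, so it is skipped
]

ratios = subject_object_ratio(toy_mentions)
print(ratios)
```

A higher ratio for one gender means its characters appear as the grammatical subject more often, relative to how often they appear as the direct object.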

Figure 3.5: Ratio of characters as subject vs object.

Figure 3.6: Distribution of characters as percentage of total words.

Figure 3.7: Distribution of characters as percentage of total words.