Book Author Gender
We compute the author gender for book clusters using the integrated data set.
Warning
See the paper for important limitations and ethical considerations.
Import Steps
cluster-genders
(inbook-links/
)-
Match book genders with clusters. Produces
cluster-genders.parquest
.
Gender Integration
For each book cluster, the integration does the following:
- Accumulate all names for the first author from OpenLibrary
- Accumulate all names for the first/primary author from the Library of Congress
- Obtain gender identities from all VIAF records matching an author name in this pool
- Consolidate gender into a cluster author gender identity
The results of this are stored in book-links/cluster-genders.parquet
.
book-links/cluster-genders.parquet
The author gender identified for each book cluster.
File details
Field
|
Type
|
---|---|
cluster
|
Int32
|
gender
|
LargeUtf8
|
book-links/gender-stats.csv
This file records the number of books with each gender resolution in each data set, for auditing and analysis purposes.
Limitations
See the paper for a fuller discussion. Some known limitations include:
- VIAF does not record non-binary gender identities.
- Recent versions of the OpenLibrary data contain VIAF identifiers for book authors, but we do not yet make use of this information. When available, they should improve the reliability of book-author linking.
- GoodReads includes author names, but we do not yet use these for linking to gender records.