Virtual International Authority File
Imported data lives in the
The import is controlled by the following DVC steps:
viaf-schema.sql to set up the base schema.
Import raw VIAF MARC data from
viaf-index.sql to index the MARC data and extract tables.
Extracted Author Tables
We extract the following tables for VIAF authors:
The author’s name(s). We insert an author name for each field with tag 700 and subfield code ‘a’. For each author name of the form ‘Family, Given’, we insert an additional record of the form ‘Given Family’ with indicator ‘S’. The additional form helps maximize the number of records we can link by author name.
The author’s gender, from field 375 subfield ‘a’. This is a raw extract of all gender identity assertions in the record; we resolve multiple assertions later in the data integration process.
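The name expansion described above can be sketched in Python. This is only an illustration, not the logic in viaf-index.sql: the function name `name_variants` is hypothetical, the indicator for the original form is shown as `None` because the source does not say what it is, and we assume that only names with exactly one comma count as ‘Family, Given’.

```python
from typing import List, Optional, Tuple


def name_variants(name: str) -> List[Tuple[str, Optional[str]]]:
    """Return (name, indicator) pairs for one 700 subfield 'a' value.

    Hypothetical helper: the original name is always kept, and a
    'Given Family' variant with indicator 'S' is added when the name
    looks like 'Family, Given'.
    """
    name = name.strip()
    # Indicator for the original form is unknown here, so we use None.
    variants: List[Tuple[str, Optional[str]]] = [(name, None)]

    # Assumption: exactly one comma marks the 'Family, Given' form.
    if name.count(",") == 1:
        family, given = (part.strip() for part in name.split(","))
        if family and given:
            variants.append((f"{given} {family}", "S"))
    return variants


if __name__ == "__main__":
    for variant in name_variants("Austen, Jane"):
        print(variant)
```

Names without a comma, or with an empty part on either side of it, yield only the original form.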
VIAF Gender Vocabulary
The MARC gender field is defined as the author’s gender identity. It allows identities from an open vocabulary, along with start and end dates for the validity of each identity.
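To make the shape of this field concrete, here is a minimal Python sketch of pulling gender assertions out of parsed MARC fields. It assumes the MARC 21 authority layout for field 375 (subfield ‘a’ for the identity, ‘s’ for the start period, ‘e’ for the end period). The `GenderAssertion` class, the `gender_assertions` function, and the `(tag, [(code, value), ...])` input representation are hypothetical and are not taken from the import code.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

Subfield = Tuple[str, str]            # (subfield code, value)
Field = Tuple[str, List[Subfield]]    # (tag, subfields); illustrative layout


@dataclass(frozen=True)
class GenderAssertion:
    identity: str
    start: Optional[str] = None
    end: Optional[str] = None


def gender_assertions(fields: Iterable[Field]) -> List[GenderAssertion]:
    """Collect every raw gender assertion without resolving conflicts."""
    assertions: List[GenderAssertion] = []
    for tag, subfields in fields:
        if tag != "375":
            continue
        # Assumed MARC 21 codes: 's' = start period, 'e' = end period.
        start = next((v for c, v in subfields if c == "s"), None)
        end = next((v for c, v in subfields if c == "e"), None)
        # Subfield 'a' may repeat; emit one assertion per occurrence.
        for code, value in subfields:
            if code == "a":
                assertions.append(GenderAssertion(value, start, end))
    return assertions


if __name__ == "__main__":
    record = [
        ("375", [("a", "female"), ("s", "1980")]),
        ("700", [("a", "Austen, Jane")]),
        ("375", [("a", "male")]),
    ]
    for assertion in gender_assertions(record):
        print(assertion)
```

The sketch deliberately keeps every assertion it finds, mirroring the raw extract; deciding between multiple assertions is left to a later step.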
The Program for Cooperative Cataloging Task Group on Gender in Name Authority Records produced a report with recommendations for how to record this field. Many libraries contributing to the Library of Congress file, from which many VIAF records are sourced, follow these recommendations, but we cannot assume that every VIAF contributor does.
Further, as near as we can tell, VIAF removes all non-binary gender identities or converts them to ‘unknown’.
This data should only be used with great care. We discuss these limitations in the extended preprint.