Common Identifiers

Two key identifiers are used across data sets: ISBNs and book codes.

ISBNs

We use ISBNs for much of our data linking. To speed up ISBN-based operations, we map textual ISBNs to numeric ‘ISBN IDs’.
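As a rough illustration of the idea, the following is a minimal in-memory sketch of assigning numeric IDs to textual ISBNs. The `IsbnTable` type and its methods are hypothetical names introduced here, and starting IDs at 1 is an assumption; this is not the project's implementation.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory ISBN-to-ID table (illustration only).
#[derive(Default)]
struct IsbnTable {
    ids: HashMap<String, i32>,
}

impl IsbnTable {
    /// Return the ID for an ISBN string, assigning the next free ID
    /// the first time the string is seen.
    fn intern(&mut self, isbn: &str) -> i32 {
        // Assumption: IDs are consecutive and start at 1.
        let next = self.ids.len() as i32 + 1;
        *self.ids.entry(isbn.to_string()).or_insert(next)
    }

    /// Look up an ISBN without assigning a new ID.
    fn lookup(&self, isbn: &str) -> Option<i32> {
        self.ids.get(isbn).copied()
    }
}
```

Because the table is keyed on the exact string, interning `"0306406152"` and `"9780306406157"` in this sketch yields two different IDs, and later joins can compare 32-bit integers instead of strings.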

book-links/all-isbns.parquet

This file records the ISBN IDs and the textual ISBNs they map to, along with statistics about how often each ISBN is used in other records.

Column   Purpose
isbn_id  ISBN identifier
isbn     Textual ISBN

An ISBN-10 and its ISBN-13 equivalent are treated as distinct ISBNs. We also treat other ISBN-like identifiers, particularly ASINs, as ISBNs.

Additional fields in this table contain the number of records from different sources that reference this ISBN.

File details
Schema for book-links/all-isbns.parquet.
Field    Type
isbn_id  Int32
isbn     LargeUtf8
LOC      UInt32
OL       UInt32
GR       Int64
AZ14     UInt32
AZ18     UInt32
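For code that consumes this file, the schema can be mirrored as a plain Rust struct. The sketch below is illustrative only: `IsbnRecord` and `total_refs` are hypothetical names, the field names are lowercased to follow Rust conventions, and the Rust types are chosen to match the listed column types.

```rust
/// Hypothetical row type mirroring the schema of book-links/all-isbns.parquet.
#[derive(Debug, Clone, PartialEq)]
struct IsbnRecord {
    isbn_id: i32, // Int32: numeric ISBN ID
    isbn: String, // LargeUtf8: textual ISBN
    loc: u32,     // UInt32: number of LOC records referencing this ISBN
    ol: u32,      // UInt32: number of OL records
    gr: i64,      // Int64: number of GR records
    az14: u32,    // UInt32: number of AZ14 records
    az18: u32,    // UInt32: number of AZ18 records
}

impl IsbnRecord {
    /// Total number of referencing records across all sources,
    /// widened to i64 so the sum cannot overflow a 32-bit counter.
    fn total_refs(&self) -> i64 {
        i64::from(self.loc)
            + i64::from(self.ol)
            + self.gr
            + i64::from(self.az14)
            + i64::from(self.az18)
    }
}
```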

Many other tables that work with ISBNs use ISBN IDs.

Book Codes

We also use book codes, common identifiers for integrated ‘books’ across data sets. These are derived from identifiers in the various data sets. Each book code source is assigned to a different 100M number band (a ‘numspace’) so we can, if needed, derive the source from a book code.

Source        Namespace Object  Numspace
OL Work       NS_WORK           100M
OL Edition    NS_EDITION        200M
LOC Record    NS_LOC_REC        300M
GR Work       NS_GR_WORK        400M
GR Book       NS_GR_BOOK        500M
LOC Work      NS_LOC_WORK       600M
LOC Instance  NS_LOC_INSTANCE   700M
ISBN          NS_ISBN           900M
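The band arithmetic implied by the table can be sketched as follows. This is a self-contained illustration and not the project's own API: the `Numspace` struct, its methods, and `source_of` are hypothetical, and the sketch assumes that a book code is the band's base value plus the source-specific identifier, which the text above does not spell out.

```rust
/// Width of one number band (100M), per the table above.
const NUMSPACE_WIDTH: i32 = 100_000_000;

/// Hypothetical stand-in for a namespace object: a name plus its band index.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Numspace {
    name: &'static str,
    band: i32,
}

const NS_WORK: Numspace = Numspace { name: "NS_WORK", band: 1 };
const NS_EDITION: Numspace = Numspace { name: "NS_EDITION", band: 2 };
const NS_LOC_REC: Numspace = Numspace { name: "NS_LOC_REC", band: 3 };
const NS_GR_WORK: Numspace = Numspace { name: "NS_GR_WORK", band: 4 };
const NS_GR_BOOK: Numspace = Numspace { name: "NS_GR_BOOK", band: 5 };
const NS_LOC_WORK: Numspace = Numspace { name: "NS_LOC_WORK", band: 6 };
const NS_LOC_INSTANCE: Numspace = Numspace { name: "NS_LOC_INSTANCE", band: 7 };
const NS_ISBN: Numspace = Numspace { name: "NS_ISBN", band: 9 };

const ALL_NUMSPACES: [Numspace; 8] = [
    NS_WORK, NS_EDITION, NS_LOC_REC, NS_GR_WORK,
    NS_GR_BOOK, NS_LOC_WORK, NS_LOC_INSTANCE, NS_ISBN,
];

impl Numspace {
    /// First code in this band, e.g. 400_000_000 for band 4.
    fn base(&self) -> i32 {
        self.band * NUMSPACE_WIDTH
    }

    /// Assumed encoding: base + id. Returns None if the id does not fit in the band.
    fn to_code(&self, id: i32) -> Option<i32> {
        if (0..NUMSPACE_WIDTH).contains(&id) {
            Some(self.base() + id)
        } else {
            None
        }
    }

    /// Recover the source identifier if the code lies in this band.
    fn from_code(&self, code: i32) -> Option<i32> {
        let lo = self.base();
        if code >= lo && code - lo < NUMSPACE_WIDTH {
            Some(code - lo)
        } else {
            None
        }
    }
}

/// Derive the source numspace from a book code, if it falls in a known band.
fn source_of(code: i32) -> Option<Numspace> {
    ALL_NUMSPACES
        .iter()
        .copied()
        .find(|ns| ns.from_code(code).is_some())
}
```

Under these assumptions, an identifier 1234 in the GR Work band becomes code 400001234, and `source_of` maps that code back to `NS_GR_WORK`. Codes outside the listed bands, such as anything in the unassigned 800M range, have no source.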

The bookdata::ids::codes module contains the Rust API for working with these codes (including each of the namespace objects) and converting identifiers into and out of them.

The LOC Work and Instance sources are not currently used; they are intended for future use when we are able to import BIBFRAME data from the Library of Congress.