The DVC control files automatically download the appropriate version. The version can be
updated by modifying the
Imported data lives in the
The import is controlled by the following DVC steps:
ol-schema.sql to set up the base schema.
Import raw OpenLibrary works from
Import raw OpenLibrary editions from
Import raw OpenLibrary authors from
ol-index.sql to index the book data and extract tables.
ol-book-info.sql to extract additional book data into tables.
OpenLibrary provides its data as JSON. It is imported as-is into a JSONB column in three tables:
Each of these has the following columns:
A numeric record identifier generated at import.
The OpenLibrary identifier key (e.g.
The raw JSON data containing the record.
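As a rough illustration of this three-column layout, the sketch below models one imported row in Python. The names `RawRecord`, `rec_id`, `key`, `data`, and `import_records` are hypothetical and chosen only for this example, as is the sample key; the actual table and column names are defined by the schema file.

```python
import json
from dataclasses import dataclass
from itertools import count


@dataclass
class RawRecord:
    rec_id: int  # numeric record identifier generated at import (hypothetical name)
    key: str     # OpenLibrary identifier key (hypothetical name)
    data: dict   # raw JSON data containing the record (hypothetical name)


def import_records(json_lines):
    """Turn raw JSON strings into rows, assigning sequential identifiers."""
    ids = count(1)
    rows = []
    for line in json_lines:
        data = json.loads(line)
        # The JSON is stored unchanged; only the key is copied out of it.
        rows.append(RawRecord(next(ids), data["key"], data))
    return rows


# Hypothetical sample record, not taken from a real dump.
sample = ['{"key": "/works/OL1W", "title": "Example"}']
rows = import_records(sample)
```

Keeping the record unmodified in `data` means later extraction steps can be changed and re-run without re-importing the raw files.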
We use PostgreSQL’s JSON operators and functions to extract the data from these tables for the rest of the OpenLibrary data model.
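To make the extraction step concrete, here is a minimal Python sketch of the kind of logic that PostgreSQL expresses with operators such as `->` and `->>` and functions such as `jsonb_array_elements`. It assumes that an edition record lists its authors under an `authors` array of objects with a `key` field; that shape, the function names, and the sample record are assumptions for illustration, not the project's actual SQL.

```python
def edition_author_keys(data):
    """Return the author keys listed in an edition record, in order."""
    authors = data.get("authors", [])
    # Skip malformed entries that are not objects or lack a key.
    return [a["key"] for a in authors if isinstance(a, dict) and "key" in a]


def first_author_key(data):
    """Return the first listed author key, or None if there are no authors."""
    keys = edition_author_keys(data)
    return keys[0] if keys else None


# Hypothetical edition record.
edition = {
    "key": "/books/OL1M",
    "authors": [{"key": "/authors/OL1A"}, {"key": "/authors/OL2A"}],
}
all_keys = edition_author_keys(edition)
first_key = first_author_key(edition)
no_author = first_author_key({"key": "/books/OL2M"})
```

The two functions correspond to the two kinds of author tables described below: one row per listed author, and one row for the first author only.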
Extracted Edition Tables
We extract the following tables from OpenLibrary editions:
author to record an edition’s authors.
author to record an edition’s first author.
The raw ISBNs for each edition (not ISBN IDs).
Link ISBNs, editions, and works, along with the book code derived from an edition’s work and edition IDs. If an edition belongs to multiple works, it will appear multiple times here. This table violates 4NF.
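The row multiplication described for the link table can be sketched as follows. This example assumes an edition record carries its ISBNs in `isbn_10` and `isbn_13` arrays and its works in a `works` array of keyed objects; those field names, the function `link_rows`, and the sample values are assumptions. The book code column is left out because its derivation is not spelled out here.

```python
from itertools import product


def link_rows(edition):
    """Build (isbn, edition_key, work_key) rows for one edition record."""
    isbns = edition.get("isbn_10", []) + edition.get("isbn_13", [])
    # An edition without a work still gets a row, with no work key.
    works = [w["key"] for w in edition.get("works", [])] or [None]
    # Every ISBN is paired with every work, so an edition that belongs
    # to several works appears once per work.
    return [(isbn, edition["key"], work) for isbn, work in product(isbns, works)]


# Hypothetical edition with two ISBNs that belongs to two works.
edition = {
    "key": "/books/OL1M",
    "isbn_10": ["0000000001"],
    "isbn_13": ["9780000000002"],
    "works": [{"key": "/works/OL1W"}, {"key": "/works/OL2W"}],
}
rows = link_rows(edition)
```

Two ISBNs and two works give four rows, which illustrates why the ISBN-to-edition and edition-to-work facts, being independent of each other, put the table outside 4NF.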
Extracted Work Tables
We extract the following tables from OpenLibrary works:
author to record a work’s authors.
author to record a work’s first author.
subjects entries for each work.