Code Layout

The import code is written in Python and Rust and wired together with DVC. Data is split across several directories to make it easier to find.

Python Scripts

Python scripts live in the directories in which they operate. They should not be launched directly, but rather via the launcher script, which makes sure the environment is set up properly for them:


The bookdata package contains a little Python utility code. The script ensures it is available in the Python import path.
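The import-path setup described above can be sketched as follows. This is a minimal illustration, not the project's actual launcher: the function name launch and its parameters are hypothetical, and it assumes the package directory sits directly under the repository root.

```python
import runpy
import sys
from pathlib import Path


def launch(script, root):
    """Run `script` as __main__ with `root` on the import path.

    Hypothetical helper: `root` is assumed to be the directory that
    contains the utility package (for example, the repository root).
    """
    root = str(Path(root).resolve())
    if root not in sys.path:
        # Put the root first so the package is importable from any
        # script, whichever directory that script lives in.
        sys.path.insert(0, root)
    # run_path executes the file and returns its resulting globals.
    return runpy.run_path(str(script), run_name="__main__")
```

With a wrapper of this kind, a script in any subdirectory can import the shared package without installing it or adjusting PYTHONPATH by hand.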


Rust

The Rust code all lives under src, with the various command-line programs in src/cli. The Rust tools are implemented as a monolithic executable with subcommands for various operations, to save disk space and compile time. To see the help:

cargo run help

Or through Python:

python --rust help

The script with the --rust option sets up some environment variables to ensure that the Rust code builds correctly inside a Conda environment, and also defaults to using a release build (cargo run uses debug builds by default). All DVC pipeline stages use this script to run the Rust tools.
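As a rough sketch of what such a wrapper might do, the following builds a cargo command line that defaults to a release build and merges extra variables into the environment. The function names and the extra-variable mechanism are assumptions made for illustration; the variables the real script sets are not listed here.

```python
import os


def cargo_command(args, release=True):
    """Build a `cargo run` command line for the given tool arguments."""
    cmd = ["cargo", "run"]
    if release:
        # cargo run builds in debug mode unless --release is given.
        cmd.append("--release")
    # Everything after "--" is passed to the program, not to cargo.
    cmd.append("--")
    cmd.extend(args)
    return cmd


def build_env(extra=None, base=None):
    """Copy the base environment and overlay any extra variables.

    `extra` stands in for whatever build settings a Conda environment
    needs; its contents are hypothetical.
    """
    env = dict(os.environ if base is None else base)
    env.update(extra or {})
    return env
```

The result could then be passed to subprocess.run(cargo_command(["help"]), env=build_env(), check=True). Copying the environment rather than mutating os.environ keeps the build settings confined to the child process.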

For writing new commands, there is a lot of utility code under src. Consult the Rust API documentation for further details.

The Rust code makes extensive use of the polars, arrow2, and parquet2 crates for data analysis and IO. arrow2_convert is used to automate conversion for Parquet serialization.