Code Layout
The import code consists primarily of Rust, wired together with DVC, with data organized into several directories so that it is easy to find. We use Python and R in Quarto documents for analytics and reporting.
Rust
The Rust code all lives under src, with the various command-line programs in src/cli. The Rust tools are implemented as a monolithic executable with subcommands for various operations, to save disk space and compile time. To see the help:
cargo run help
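To illustrate the single-executable design described above, here is a minimal sketch of subcommand dispatch using only the standard library. It is not the project's actual code: the `scan-records` subcommand, the `Command` enum, and the `parse_command` and `run` functions are hypothetical names chosen for illustration, and the real tools may well use an argument-parsing crate instead.

```rust
use std::env;

/// Hypothetical subcommands for illustration; the real list is shown by `cargo run help`.
#[derive(Debug, PartialEq)]
enum Command {
    Help,
    ScanRecords { input: String },
}

/// Turn the arguments after the program name into a `Command`.
fn parse_command(args: &[String]) -> Result<Command, String> {
    match args.first().map(String::as_str) {
        // No subcommand, or an explicit `help`, shows the help text.
        None | Some("help") => Ok(Command::Help),
        Some("scan-records") => match args.get(1) {
            Some(path) => Ok(Command::ScanRecords { input: path.clone() }),
            None => Err("scan-records requires an input path".to_string()),
        },
        Some(other) => Err(format!("unknown subcommand: {}", other)),
    }
}

/// Run one command and return the message it would print.
fn run(cmd: &Command) -> String {
    match cmd {
        Command::Help => "available subcommands: help, scan-records".to_string(),
        Command::ScanRecords { input } => format!("scanning {}", input),
    }
}

fn main() {
    // Skip the program name; everything after it selects the subcommand.
    let args: Vec<String> = env::args().skip(1).collect();
    match parse_command(&args) {
        Ok(cmd) => println!("{}", run(&cmd)),
        Err(msg) => {
            eprintln!("error: {}", msg);
            std::process::exit(2);
        }
    }
}
```

Because every operation is a variant of one enum in one binary, shared utility code is compiled and linked once rather than once per tool, which is the disk-space and compile-time saving mentioned above.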
The programs are run through cargo run in --release mode; the bd.cmd jsonnet function automates this, so we only need to specify the subcommand and its options in our pipeline definitions.
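As a rough picture of what such a wrapper saves us from writing, the sketch below builds the full command line from just a subcommand and its options. It is written in Rust for consistency with the rest of the code, not in jsonnet, and it is an assumption about the shape of the resulting command rather than a description of what bd.cmd actually emits. The `cargo_command` function, the `scan-records` subcommand, and its `--output` option are hypothetical; the `--` separator is standard Cargo syntax for passing the remaining arguments to the program.

```rust
/// Build the argument vector for running one subcommand in release mode.
/// Hypothetical helper: the exact command produced by bd.cmd may differ.
fn cargo_command(subcommand: &str, options: &[&str]) -> Vec<String> {
    // Fixed prefix: run the project's binary with optimizations enabled.
    let mut argv: Vec<String> = ["cargo", "run", "--release", "--"]
        .iter()
        .map(|s| s.to_string())
        .collect();
    // Everything after `--` is passed to the program, not to Cargo.
    argv.push(subcommand.to_string());
    argv.extend(options.iter().map(|s| s.to_string()));
    argv
}

fn main() {
    // Hypothetical subcommand and option, for illustration only.
    let argv = cargo_command("scan-records", &["--output", "out.parquet"]);
    println!("{}", argv.join(" "));
}
```

Keeping the fixed prefix in one place means a pipeline stage only states what varies, and a change to how the tools are invoked needs to be made only once.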
For writing new commands, there is a lot of utility code under src. Consult the Rust API documentation for further details.
The Rust code makes extensive use of the polars, arrow2, and parquet2 crates for data analysis and IO. arrow2_convert is used to automate conversion for Parquet serialization.