BookCrossing

The BookCrossing data set consists of user-provided book ratings, both implicit and explicit.

Note

The BookCrossing site is no longer online, so this data cannot be obtained from its original source and the BookCrossing integration is disabled by default. If you have a copy of this data, save the BX-CSV-Dump.zip file in the data directory and enable BookCrossing in config.yaml to use it.

Important

If you use the BookCrossing data, cite:

Cai-Nicolas Ziegler, Sean M. McNee, Joseph A. Konstan, and Georg Lausen. 2005. Improving Recommendation Lists Through Topic Diversification. Proceedings of the 14th International World Wide Web Conference (WWW ’05), May 10-14, 2005, Chiba, Japan. DOI:10.1145/1060745.1060754.

Imported data lives in the bx directory.

Import Steps

The import is controlled by the following DVC steps:

data/BX-CSV-Dump.zip.dvc: Download the BookCrossing zip file.
clean-ratings: Unpack the ratings from the downloaded zip file and remove invalid characters.
cluster-ratings: Combine BookCrossing ratings with book clusters to produce (user, cluster, rating) records from the explicit-feedback ratings. BookCrossing implicit-feedback entries (rating of 0) are excluded. Produces bx/bx-cluster-ratings.parquet.
cluster-actions: Combine BookCrossing interactions with book clusters to produce (user, cluster) implicit-feedback records. These records include the BookCrossing implicit feedback entries (rating of 0). Produces bx/bx-cluster-actions.parquet.
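
The difference between the last two steps can be sketched in plain Python. This is only an illustration, not the project's implementation: the `isbn_cluster` mapping, the choice to skip ISBNs that have no cluster, the use of the median to merge several ratings into one, and the reading of `nratings` and `nactions` as simple counts are all assumptions made for this example.

```python
from collections import defaultdict
from statistics import median


def cluster_ratings(ratings, isbn_cluster):
    """Explicit feedback: one record per (user, cluster), ignoring rating 0.

    ratings: iterable of (user_id, isbn, rating) tuples.
    isbn_cluster: hypothetical dict mapping ISBN -> book cluster id.
    """
    groups = defaultdict(list)
    for user, isbn, rating in ratings:
        cluster = isbn_cluster.get(isbn)
        # Assumption: unclustered ISBNs are dropped; rating 0 is implicit.
        if cluster is None or rating <= 0:
            continue
        groups[(user, cluster)].append(rating)
    return [
        # Assumption: the median stands in for the real aggregation rule.
        {"user": u, "item": c, "rating": float(median(rs)), "nratings": len(rs)}
        for (u, c), rs in sorted(groups.items())
    ]


def cluster_actions(ratings, isbn_cluster):
    """Implicit feedback: count every interaction, including rating 0."""
    counts = defaultdict(int)
    for user, isbn, _rating in ratings:
        cluster = isbn_cluster.get(isbn)
        if cluster is None:
            continue
        counts[(user, cluster)] += 1
    return [
        {"user": u, "item": c, "nactions": n}
        for (u, c), n in sorted(counts.items())
    ]
```

With this sketch, a user who rated two editions of the same book 8 and 6 and logged a third book with rating 0 yields one explicit record for the first cluster and two action records, one per cluster.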

Raw Data

The raw rating data, with invalid characters cleaned up, is in the bx/cleaned-ratings.csv file. It has the following columns:

user_id: The user identifier (numeric).
isbn: The book ISBN (text).
rating: The book rating \(r_{ui}\). The ratings are on a 1-10 scale, with 0 indicating an implicit-feedback record.
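
As a rough sketch of how these columns can be consumed, the following reads rows in this layout and separates explicit ratings from implicit-feedback records. It assumes the file is comma-delimited with a header row naming the three columns above; the helper names and the sample rows are made up for illustration.

```python
import csv
import io


def read_ratings(handle):
    """Yield (user_id, isbn, rating) tuples from a cleaned-ratings CSV handle."""
    for row in csv.DictReader(handle):
        # ISBNs stay as text: they can contain 'X' and leading zeros.
        yield int(row["user_id"]), row["isbn"], float(row["rating"])


def split_feedback(rows):
    """Split rows into explicit (rating > 0) and implicit (rating == 0)."""
    explicit, implicit = [], []
    for user, isbn, rating in rows:
        (explicit if rating > 0 else implicit).append((user, isbn, rating))
    return explicit, implicit


# Illustrative rows only; not taken from the real data.
SAMPLE = "user_id,isbn,rating\n1,0000000001,0\n2,0000000002,5\n"
explicit, implicit = split_feedback(read_ratings(io.StringIO(SAMPLE)))
```

To run the same functions on the real file, open bx/cleaned-ratings.csv with `newline=""` (as the `csv` module recommends) and pass the handle to `read_ratings`.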

Extracted Actions

bx/bx-cluster-ratings.parquet

The explicit-feedback ratings (\(r_{ui} > 0\) from bx/cleaned-ratings.csv), with book clusters as the items.

File details

Schema for `bx/bx-cluster-ratings.parquet`.
Field	Type
user	Int64
item	Int32
rating	Float64
nratings	UInt32

bx/bx-cluster-actions.parquet

All user-item interactions from bx/cleaned-ratings.csv, with book clusters as the items.

File details

Schema for `bx/bx-cluster-actions.parquet`.
Field	Type
user	Int64
item	Int32
nactions	UInt32