BookCrossing

The BookCrossing data set consists of user-provided book ratings, both implicit and explicit.

Note

The BookCrossing site is no longer online, so this data cannot be obtained from its original source and the BookCrossing integration is disabled by default. If you have a copy of this data, save the BX-CSV-Dump.zip file in the data directory and enable BookCrossing in config.yaml to use it.

Important

If you use the BookCrossing data, cite:

Cai-Nicolas Ziegler, Sean M. McNee, Joseph A. Konstan, and Georg Lausen. 2005. Improving Recommendation Lists Through Topic Diversification. Proceedings of the 14th International World Wide Web Conference (WWW ’05), May 10-14, 2005, Chiba, Japan. DOI:10.1145/1060745.1060754.

Imported data lives in the bx directory.

Import Steps

The import is controlled by the following DVC steps:

data/BX-CSV-Dump.zip.dvc
Download the BookCrossing zip file.
clean-ratings
Unpack ratings from the downloaded zip file and clean up invalid characters.
cluster-ratings
Combine BookCrossing ratings with book clusters to produce (user, cluster, rating) from the explicit-feedback ratings. BookCrossing implicit feedback entries (rating of 0) are excluded. Produces bx/bx-cluster-ratings.parquet.
cluster-actions
Combine BookCrossing interactions with book clusters to produce (user, cluster) implicit-feedback records. These records include the BookCrossing implicit feedback entries (rating of 0). Produces bx/bx-cluster-actions.parquet.
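
The difference between the cluster-ratings and cluster-actions steps can be sketched in plain Python. This is only an illustration, not the pipeline's implementation: the sample rows, the `ISBN_CLUSTER` lookup, and both function names are hypothetical. Combining duplicate ratings by median, reading `nratings` and `nactions` as per-pair counts, and dropping ISBNs that have no cluster are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import median

# Hypothetical inputs: cleaned (user_id, isbn, rating) rows and an
# ISBN-to-cluster lookup. Neither the lookup's shape nor the cluster IDs
# below come from the real pipeline.
ROWS = [
    (1, "0001111111", 0),
    (1, "0002222222", 8),
    (1, "0003333333", 6),
    (2, "0001111111", 10),
]
ISBN_CLUSTER = {"0001111111": 100, "0002222222": 200, "0003333333": 200}


def cluster_ratings(rows, isbn_cluster):
    """Explicit feedback only: one (user, item, rating, nratings) per pair."""
    grouped = defaultdict(list)
    for user, isbn, rating in rows:
        # Rating 0 marks an implicit-feedback entry, so it is excluded here.
        if rating > 0 and isbn in isbn_cluster:
            grouped[(user, isbn_cluster[isbn])].append(rating)
    # Assumption: several ratings in one cluster are combined by median.
    return [
        (user, item, float(median(vals)), len(vals))
        for (user, item), vals in sorted(grouped.items())
    ]


def cluster_actions(rows, isbn_cluster):
    """All interactions, rating-0 rows included: (user, item, nactions)."""
    counts = defaultdict(int)
    for user, isbn, _rating in rows:
        if isbn in isbn_cluster:
            counts[(user, isbn_cluster[isbn])] += 1
    return [(user, item, n) for (user, item), n in sorted(counts.items())]


ratings = cluster_ratings(ROWS, ISBN_CLUSTER)
actions = cluster_actions(ROWS, ISBN_CLUSTER)
```

In this sample, user 1's rating-0 entry for cluster 100 appears only in `actions`, while the two explicit ratings that fall into cluster 200 collapse into a single row in `ratings`.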

Raw Data

The raw rating data, with invalid characters cleaned up, is in the bx/cleaned-ratings.csv file. It has the following columns:

user_id
The user identifier (numeric).
isbn
The book ISBN (text).
rating
The book rating \(r_{ui}\). The ratings are on a 1-10 scale, with 0 indicating an implicit-feedback record.
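
As a rough illustration of how these columns might be consumed, the sketch below separates explicit ratings from implicit-feedback records using only the standard library. It assumes the file is comma-delimited with a header row naming the three columns; the sample data and the `split_feedback` helper are hypothetical. ISBNs are kept as text so leading zeros survive.

```python
import csv
import io

# Hypothetical sample in the documented column layout; the real file is
# bx/cleaned-ratings.csv. A header row and comma delimiter are assumed.
SAMPLE = """user_id,isbn,rating
1,0001111111,0
1,0002222222,8
2,0001111111,10
"""


def split_feedback(lines):
    """Split rows into explicit ratings (1-10) and implicit records (0)."""
    explicit, implicit = [], []
    for row in csv.DictReader(lines):
        # user_id and rating are numeric; isbn stays a string.
        record = (int(row["user_id"]), row["isbn"], int(row["rating"]))
        if record[2] > 0:
            explicit.append(record)
        else:
            implicit.append(record)
    return explicit, implicit


explicit, implicit = split_feedback(io.StringIO(SAMPLE))
print(len(explicit), "explicit;", len(implicit), "implicit")
```

To run this against a real copy of the data, an open file handle for bx/cleaned-ratings.csv can be passed in place of the `io.StringIO` sample, provided the header assumption holds.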

Extracted Actions

bx/bx-cluster-ratings.parquet

The explicit-feedback ratings (\(r_{ui} > 0\) from bx/cleaned-ratings.csv), with book clusters as the items.

File details

Schema for bx/bx-cluster-ratings.parquet.

Field     Type
user      Int64
item      Int32
rating    Float64
nratings  UInt32

bx/bx-cluster-actions.parquet

All user-item interactions from bx/cleaned-ratings.csv, with book clusters as the items.

File details

Schema for bx/bx-cluster-actions.parquet.

Field     Type
user      Int64
item      Int32
nactions  UInt32