BookCrossing
The BookCrossing data set consists of user-provided ratings — both implicit and explicit — of books.
The BookCrossing site is no longer online, so this data cannot be obtained from its original source, and the BookCrossing integration is disabled by default. If you have a copy of this data, save the `BX-CSV-Dump.zip` file in the `data` directory and enable BookCrossing in `config.yaml` to use it.
If you use the BookCrossing data, cite:
Cai-Nicolas Ziegler, Sean M. McNee, Joseph A. Konstan, and Georg Lausen. 2005. Improving Recommendation Lists Through Topic Diversification. Proceedings of the 14th International World Wide Web Conference (WWW ’05), May 10-14, 2005, Chiba, Japan. DOI:10.1145/1060745.1060754.
Imported data lives in the `bx` directory.
Import Steps
The import is controlled by the following DVC steps:
data/BX-CSV-Dump.zip.dvc
- Download the BookCrossing zip file.
clean-ratings
- Unpack ratings from the downloaded zip file and clean up their invalid characters.
cluster-ratings
- Combine BookCrossing ratings with book clusters to produce (user, cluster, rating) records from the explicit-feedback ratings. BookCrossing implicit-feedback entries (rating of 0) are excluded. Produces `bx/bx-cluster-ratings.parquet`.
cluster-actions
- Combine BookCrossing interactions with book clusters to produce (user, cluster) implicit-feedback records. These records include the BookCrossing implicit-feedback entries (rating of 0). Produces `bx/bx-cluster-actions.parquet`.
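The clean-ratings step can be pictured with a small standard-library sketch. It is not the project's implementation: the zip member name `BX-Book-Ratings.csv`, the semicolon delimiter, the Latin-1 encoding, and the reading of "invalid characters" as non-printable characters are all assumptions, and `clean_ratings` is a hypothetical name.

```python
import csv
import io
import zipfile

# Assumption: name of the ratings member inside BX-CSV-Dump.zip.
RATINGS_MEMBER = "BX-Book-Ratings.csv"


def clean_ratings(zip_file, out, member=RATINGS_MEMBER):
    """Unpack BookCrossing ratings from the zip and write them as clean CSV.

    zip_file: path or binary file object for the zip archive.
    out: writable text stream (open it with newline="").
    Returns the number of rating rows written.

    Assumptions, not taken from the project: the member is semicolon-delimited
    and Latin-1 encoded, and "invalid characters" are non-printable characters.
    """
    with zipfile.ZipFile(zip_file) as zf:
        text = zf.read(member).decode("latin-1")

    rows = csv.reader(io.StringIO(text), delimiter=";")
    next(rows, None)  # skip the original header row

    writer = csv.writer(out)
    writer.writerow(["user_id", "isbn", "rating"])
    count = 0
    for row in rows:
        if not row:
            continue  # ignore blank lines
        user, isbn, rating = row
        # Drop non-printable characters and surrounding whitespace from the ISBN.
        isbn = "".join(ch for ch in isbn if ch.isprintable()).strip()
        writer.writerow([int(user), isbn, int(rating)])
        count += 1
    return count
```

For example, open `bx/cleaned-ratings.csv` for writing with `newline=""` and pass that handle as `out` along with `"data/BX-CSV-Dump.zip"`.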
Raw Data
The raw rating data, with invalid characters cleaned up, is in the `bx/cleaned-ratings.csv` file. It has the following columns:
user_id
- The user identifier (numeric).
isbn
- The book ISBN (text).
rating
- The book rating \(r_{ui}\). The ratings are on a 1-10 scale, with 0 indicating an implicit-feedback record.
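Given these columns, separating explicit ratings from implicit-feedback records only requires checking for a rating of 0. The standard-library sketch below illustrates that rule; the function name `split_feedback` is hypothetical, and a real pipeline would more likely use a dataframe library.

```python
import csv


def split_feedback(lines):
    """Split cleaned BookCrossing rating rows into explicit and implicit feedback.

    lines: any iterable of CSV text lines with the header user_id,isbn,rating,
    such as an open handle on bx/cleaned-ratings.csv.
    Returns two lists of (user_id, isbn, rating) tuples.
    """
    explicit, implicit = [], []
    for row in csv.DictReader(lines):
        record = (int(row["user_id"]), row["isbn"], int(row["rating"]))
        if record[2] == 0:
            implicit.append(record)  # 0 marks an implicit-feedback record
        else:
            explicit.append(record)  # 1-10 are explicit ratings
    return explicit, implicit
```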
Extracted Actions
bx/bx-cluster-ratings.parquet
The explicit-feedback ratings (\(r_{ui} > 0\)) from `bx/cleaned-ratings.csv`, with book clusters as the items.
File details
| Field | Type |
|---|---|
| user | Int64 |
| item | Int32 |
| rating | Float64 |
| nratings | UInt32 |
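The sketch below shows how a file with this schema could be derived from the cleaned ratings. Several parts are assumptions and are not stated in this documentation: ISBNs are mapped to clusters through a plain dictionary (`isbn_cluster`, hypothetical), ISBNs without a cluster are dropped, and repeated ratings of the same cluster by one user are combined with the median. The project's actual aggregate may differ.

```python
from collections import defaultdict
from statistics import median


def cluster_ratings(ratings, isbn_cluster):
    """Aggregate explicit ratings to (user, item) pairs, with clusters as items.

    ratings: iterable of (user_id, isbn, rating) tuples.
    isbn_cluster: hypothetical mapping from ISBN to book cluster ID.
    Returns sorted (user, item, rating, nratings) tuples.
    """
    by_pair = defaultdict(list)
    for user, isbn, rating in ratings:
        if rating == 0:
            continue  # implicit feedback is excluded from cluster ratings
        cluster = isbn_cluster.get(isbn)
        if cluster is None:
            continue  # assumption: ISBNs without a known cluster are dropped
        by_pair[(user, cluster)].append(rating)
    # Assumption: several ratings of one cluster are combined with the median;
    # nratings counts how many ratings went into that value.
    return sorted(
        (user, item, float(median(rs)), len(rs))
        for (user, item), rs in by_pair.items()
    )
```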
bx/bx-cluster-actions.parquet
All user-item interactions from `bx/cleaned-ratings.csv`, with book clusters as the items.
File details
| Field | Type |
|---|---|
| user | Int64 |
| item | Int32 |
| nactions | UInt32 |
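A corresponding sketch for the actions file counts every interaction, including implicit-feedback entries with a rating of 0. It assumes that `nactions` is the raw number of interactions per (user, cluster) pair and that ISBNs without a cluster are dropped; `cluster_actions` and `isbn_cluster` are hypothetical names.

```python
from collections import Counter


def cluster_actions(ratings, isbn_cluster):
    """Count interactions per (user, item) pair, with clusters as items.

    ratings: iterable of (user_id, isbn, rating) tuples; rating 0 is kept.
    isbn_cluster: hypothetical mapping from ISBN to book cluster ID.
    Returns sorted (user, item, nactions) tuples.
    """
    counts = Counter()
    for user, isbn, _rating in ratings:
        cluster = isbn_cluster.get(isbn)
        if cluster is not None:  # assumption: unclustered ISBNs are dropped
            counts[(user, cluster)] += 1
    return sorted((user, item, n) for (user, item), n in counts.items())
```

Unlike the ratings sketch, the rating value is ignored here, so implicit and explicit interactions both contribute to the count.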