Amazon Ratings
This processes two data sets from Julian McAuley’s group at UCSD, the 2014 and 2018 Amazon data sets.
Each consists of user-provided reviews and ratings for a variety of products.
Currently we import the ratings-only data from the Books segment of the 2014 and 2018 data sets. Future versions of the data tools will support reviews.
If you use this data, cite the paper(s) documented on the data set web site. For 2014 data:
R. He and J. McAuley. 2016. Ups and downs: Modeling the visual evolution of fashion trends with one-class collaborative filtering. In Proc. WWW 2016. DOI:10.1145/2872427.2883037.
J. McAuley, C. Targett, J. Shi, and A. van den Hengel. 2015. Image-based recommendations on styles and substitutes. In Proc. SIGIR 2015. DOI:10.1145/2766462.2767755.
For 2018 data:
J. Ni, J. Li, and J. McAuley. Justifying recommendations using distantly-labeled reviews and fine-grained aspects. In Empirical Methods in Natural Language Processing (EMNLP), 2019.
Imported data lives in the az2014 and az2018 directories. The source files are not automatically downloaded; you will need to download the ratings-only data for the Books category from each data site and save them in the data/az2014 and data/az2018 directories.
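Before running the import, it can help to confirm that the source directories are populated. The following is a minimal sketch, not part of the data tools: the helper name `missing_source_dirs` is hypothetical, and because the exact file names depend on what each data site provides, it only checks that each directory exists and contains at least one file.

```python
from pathlib import Path

# Directories named in the text above; the file names inside them are not assumed.
SOURCE_DIRS = ("data/az2014", "data/az2018")


def missing_source_dirs(root="."):
    """Return the source directories under `root` that are absent or hold no files."""
    missing = []
    for rel in SOURCE_DIRS:
        path = Path(root) / rel
        # A directory counts as populated if it holds at least one regular file.
        has_file = path.is_dir() and any(p.is_file() for p in path.iterdir())
        if not has_file:
            missing.append(rel)
    return missing
```

Calling `missing_source_dirs()` from the repository root returns an empty list once both directories contain downloaded files.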
Import Steps
The import is controlled by the following DVC steps:
scan-ratings
Scan the rating CSV file into a Parquet file, converting user strings into numeric IDs. Produces az2014/ratings.parquet.
cluster-ratings
Link ratings with book clusters and aggregate by cluster, to produce user ratings for book clusters. Produces az2014/az-cluster-ratings.parquet.
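To illustrate what the scan-ratings step describes, here is a minimal sketch of converting user strings into numeric IDs while reading rating rows. It is not the tool's implementation: the column order (user, asin, rating, timestamp), the absence of a header row, and the first-seen numbering starting at 1 are all assumptions, and the sketch stops short of writing Parquet.

```python
import csv
import io


def scan_ratings(lines):
    """Yield (user_id, asin, rating, timestamp) tuples with users renumbered.

    Assumes header-less CSV rows ordered user, asin, rating, timestamp;
    the real files may order their columns differently.
    """
    user_ids = {}  # user string -> numeric ID, assigned in first-seen order
    for user, asin, rating, timestamp in csv.reader(lines):
        # setdefault stores the next ID only the first time a user appears.
        uid = user_ids.setdefault(user, len(user_ids) + 1)
        yield uid, asin, float(rating), int(timestamp)


# Made-up rows for illustration only.
sample = io.StringIO(
    "A1,0001,5.0,1370000000\n"
    "B2,0001,3.0,1380000000\n"
    "A1,0002,4.0,1390000000\n"
)
rows = list(scan_ratings(sample))
```

In this sample the user string `A1` appears twice, so the first and third rows share the same numeric user ID.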
Raw Data
- az2014/ratings.parquet
The raw rating data, with user strings converted to numeric IDs, is in this file.

| Field | Type |
| --- | --- |
| user | Int32 |
| asin | Utf8 |
| rating | Float32 |
| timestamp | Int64 |
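When working with the raw table, the timestamp column usually needs converting before analysis. The sketch below assumes the values are Unix timestamps in seconds (UTC), which the schema above does not state; check the source data documentation before relying on it. The helper name `to_utc` is hypothetical.

```python
from datetime import datetime, timezone


def to_utc(timestamp):
    """Convert an integer timestamp to a timezone-aware UTC datetime.

    Assumption: the Int64 value counts seconds since the Unix epoch.
    """
    return datetime.fromtimestamp(timestamp, tz=timezone.utc)
```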
- az2018/ratings.parquet
The raw rating data, with user strings converted to numeric IDs, is in this file.

| Field | Type |
| --- | --- |
| user | Int32 |
| asin | Utf8 |
| rating | Float32 |
| timestamp | Int64 |
Extracted Rating Tables
- az2014/az-cluster-ratings.parquet
This file contains the integrated Amazon ratings, with cluster IDs in the item column.

| Field | Type |
| --- | --- |
| user | Int32? |
| item | Int32? |
| rating | Float32? |
| last_rating | Float32? |
| first_time | Int64? |
| last_time | Int64? |
| nratings | UInt32? |
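The columns above suggest a per-user, per-cluster aggregation. The following sketch shows one way such a table could be derived from raw ratings once each ASIN has been mapped to a cluster ID. It rests on assumptions this page does not confirm: that rating is the median of a user's ratings for the cluster, that last_rating is the rating with the latest timestamp, and that the ASIN-to-cluster mapping is available as a dictionary (here called `clusters`, a hypothetical name). Ratings whose ASIN has no cluster are dropped in this sketch.

```python
from collections import defaultdict
from statistics import median


def cluster_ratings(ratings, clusters):
    """Aggregate (user, asin, rating, timestamp) rows by (user, cluster).

    `clusters` maps ASIN -> cluster ID. Using the median for `rating` is an
    assumption; the source does not say which aggregate is used.
    """
    groups = defaultdict(list)
    for user, asin, rating, timestamp in ratings:
        item = clusters.get(asin)
        if item is None:
            continue  # no cluster known for this ASIN
        groups[(user, item)].append((timestamp, rating))

    result = []
    for (user, item), events in sorted(groups.items()):
        events.sort()  # chronological order (ties broken by rating value)
        values = [rating for _, rating in events]
        result.append({
            "user": user,
            "item": item,
            "rating": median(values),
            "last_rating": values[-1],
            "first_time": events[0][0],
            "last_time": events[-1][0],
            "nratings": len(events),
        })
    return result


# Made-up rows: user 1 rates two editions in the same cluster at different times.
raw = [(1, "A", 5.0, 200), (1, "B", 3.0, 100), (2, "A", 4.0, 150)]
table = cluster_ratings(raw, {"A": 10, "B": 10})
```

Here the two ratings by user 1 collapse into a single row for cluster 10, with nratings recording how many raw ratings were merged.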
- az2018/az-cluster-ratings.parquet
This file contains the integrated Amazon ratings, with cluster IDs in the item column.

| Field | Type |
| --- | --- |
| user | Int32? |
| item | Int32? |
| rating | Float32? |
| last_rating | Float32? |
| first_time | Int64? |
| last_time | Int64? |
| nratings | UInt32? |