Analytics Datasets: Caching
This dataset is a restricted public snapshot of the wmf.webrequest table intended for caching research. You can read about the data and the reasoning behind it in our on-wiki documentation.
The data is updated manually and irregularly, upon request. The previous variant of this dataset was released in 2016, also upon request.
The current iteration of this dataset includes a total of 42 compressed files: 21 hold upload (image) web request data and the other 21 hold text pageview web request data.
Each upload data file, denoted cache-u, contains exactly 24 hours of consecutive data. Each file is roughly 1.5 GB compressed and holds roughly 4 GB of data once decompressed.
Each text data file, denoted cache-t, contains exactly 24 hours of consecutive data. Each file is roughly 100 MB compressed and holds roughly 300 MB of data once decompressed.
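To plan disk space before downloading, the per-file figures above can be multiplied out. The following is a rough sketch only: it uses the approximate sizes quoted above, assumes 1 GB = 1000 MB, and the names FILES_PER_KIND, SIZES_MB and total_mb are illustrative, not part of the dataset.

```python
# Rough disk-space estimate from the approximate per-file sizes quoted above.
# Assumption: 1 GB is treated as 1000 MB; actual file sizes will vary.
FILES_PER_KIND = 21  # 21 upload files and 21 text files

SIZES_MB = {
    "cache-u": {"compressed": 1500, "decompressed": 4000},  # upload (image) data
    "cache-t": {"compressed": 100, "decompressed": 300},    # text pageview data
}


def total_mb(state: str) -> int:
    """Sum the approximate size in MB of all 42 files in the given state."""
    return sum(FILES_PER_KIND * sizes[state] for sizes in SIZES_MB.values())


if __name__ == "__main__":
    for state in ("compressed", "decompressed"):
        print(f"{state}: ~{total_mb(state) / 1000:.1f} GB")
    # compressed: ~33.6 GB
    # decompressed: ~90.3 GB
```

By this estimate the full set needs about 34 GB to download and about 90 GB once decompressed, so processing one 24-hour file at a time may be preferable when space is limited.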
The compressed file names look like:
The decompressed file names look like: