This report uses data from the second BM25 A/B test and attempts to partially reproduce its analysis.

This test ran from 27 October 2016 to 15 November 2016 on zhwiki, jawiki, and thwiki. There were two test groups: bm25:control and bm25:inclinks_pv. This report includes full-text searches. Refer to Phabricator ticket T147495 for more details.

Data Clean-up

Deleted 0 duplicated events. Removed 2,878 orphan (SERP-less) events. Removed 0 sessions that fell into multiple test groups.
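The three clean-up steps can be sketched in Python as follows. This is a minimal illustration, not the code used to produce this report: the event fields (`event_id`, `session_id`, `search_id`, `group`, `action`) are hypothetical names, and we assume an event counts as an orphan when no searchEngineResultPage event shares its `search_id`.

```python
from collections import defaultdict


def clean_events(events):
    """Apply the three clean-up steps to a list of event dicts.

    Field names ('event_id', 'session_id', 'search_id', 'group', 'action')
    are hypothetical; adapt them to the real event schema.
    """
    # 1. Drop duplicated events (keep the first copy of each event_id).
    seen = set()
    deduped = []
    for e in events:
        if e["event_id"] not in seen:
            seen.add(e["event_id"])
            deduped.append(e)

    # 2. Drop orphan events: assumed here to mean events whose search_id
    #    never appears on a searchEngineResultPage (SERP) event.
    serp_ids = {
        e["search_id"] for e in deduped if e["action"] == "searchEngineResultPage"
    }
    with_serp = [e for e in deduped if e["search_id"] in serp_ids]

    # 3. Drop every event from sessions recorded in more than one test group.
    groups_per_session = defaultdict(set)
    for e in with_serp:
        groups_per_session[e["session_id"]].add(e["group"])
    return [e for e in with_serp if len(groups_per_session[e["session_id"]]) == 1]
```

The steps run in the order the report lists them, so each count above is measured on the output of the previous step.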

Data Summary


Test Summary

Days: 20
Events: 230,478
Sessions: 36,121
Page IDs: 103,433
SERPs: 81,281
Unique search queries: 81,199
Searches: 64,486
Same-wiki clicks: 22,993
Other clicks: 0


Events

Action identifies the context in which the event was created. Every time a new search is performed, a searchEngineResultPage event is created. When the user clicks a link in the results, a visitPage event is created. When the user has dwelled for N seconds, a checkin event occurs. If the user clicks an interwiki result provided by TextCat language detection, an iwclick event is created. If the user clicks a sister search result in the sidebar, an ssclick event is created. If the user interacts with a result to explore similar pages, categories, or translations, hover-on, hover-off, and esclick events are created.
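As a sketch, events can be tallied by action with Python's `collections.Counter`. The action names come from the description above; the `action` key and the choice to reject unlisted actions are assumptions made for illustration.

```python
from collections import Counter

# Action names as described above.
KNOWN_ACTIONS = {
    "searchEngineResultPage",  # a new search was performed
    "visitPage",               # user clicked a link in the results
    "checkin",                 # user dwelled for N seconds
    "iwclick",                 # user clicked an interwiki (TextCat) result
    "ssclick",                 # user clicked a sister search sidebar result
    "hover-on", "hover-off", "esclick",  # explore-similar interactions
}


def tally_actions(events):
    """Count events per action; each event is a dict with an 'action' key (assumed)."""
    counts = Counter(e["action"] for e in events)
    unknown = set(counts) - KNOWN_ACTIONS
    if unknown:
        # Surface schema drift instead of silently counting unexpected actions.
        raise ValueError(f"unexpected actions: {sorted(unknown)}")
    return counts
```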

Searches

Test group | Wiki | Search sessions | Searches recorded
bm25:control | jawiki | 7,579 | 13,815
bm25:control | thwiki | 3,889 | 7,001
bm25:control | zhwiki | 6,510 | 11,446
bm25:inclinks_pv | jawiki | 7,610 | 13,971
bm25:inclinks_pv | thwiki | 4,055 | 7,083
bm25:inclinks_pv | zhwiki | 6,478 | 11,170
Total | All wikis | 36,121 | 64,486
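The per-group totals and the average number of searches per session can be checked directly from the table. The snippet below only re-adds the numbers shown above; `totals_by_group` is a hypothetical helper.

```python
# (test group, wiki, search sessions, searches recorded), copied from the table.
ROWS = [
    ("bm25:control", "jawiki", 7579, 13815),
    ("bm25:control", "thwiki", 3889, 7001),
    ("bm25:control", "zhwiki", 6510, 11446),
    ("bm25:inclinks_pv", "jawiki", 7610, 13971),
    ("bm25:inclinks_pv", "thwiki", 4055, 7083),
    ("bm25:inclinks_pv", "zhwiki", 6478, 11170),
]


def totals_by_group(rows):
    """Sum sessions and searches over wikis for each test group."""
    totals = {}
    for group, _wiki, sessions, searches in rows:
        s, q = totals.get(group, (0, 0))
        totals[group] = (s + sessions, q + searches)
    return totals


for group, (sessions, searches) in sorted(totals_by_group(ROWS).items()):
    print(f"{group}: {sessions} sessions, {searches} searches, "
          f"{searches / sessions:.2f} searches/session")
```

With these figures, bm25:control has 17,978 sessions and 32,262 searches (about 1.79 searches per session), and bm25:inclinks_pv has 18,143 sessions and 32,224 searches (about 1.78 per session); together they match the totals row.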

Searches with n same-wiki results returned

Explore Similar


Results of Statistical Analysis

Same-wiki Zero Results Rate

Same-wiki Engagement

bm25:inclinks_pv vs. bm25:control

bm25:inclinks_pv vs. bm25:control by wiki

First Clicked Same-Wiki Result’s Position

Maximum Clicked Position for Same-Wiki Results

PaulScore

PaulScore is a measure of search results’ relevance that takes into account the positions of the clicked results. It is computed as follows:

  1. Pick a scoring factor \(0 < F < 1\) (larger values of \(F\) increase the weight of clicks on lower-ranked results).
  2. For the \(i\)-th search session \(S_i\) \((i = 1, \ldots, n)\) containing \(m\) queries \(Q_1, \ldots, Q_m\) and search result sets \(\mathbf{R}_1, \ldots, \mathbf{R}_m\):
     1. For each \(j\)-th search query \(Q_j\) with result set \(\mathbf{R}_j\), let \(\nu_j\) be the query score: \[\nu_j = \sum_{k~\in~\{\text{0-based positions of clicked results in}~\mathbf{R}_j\}} F^k.\]
     2. Let the user’s average query score \(\bar{\nu}_{(i)}\) be \[\bar{\nu}_{(i)} = \frac{1}{m} \sum_{j = 1}^m \nu_j.\]
  3. The PaulScore is then the average of all users’ average query scores: \[\text{PaulScore}(F)~=~\frac{1}{n} \sum_{i = 1}^n \bar{\nu}_{(i)}.\]
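The steps above translate directly into a short Python sketch. Representing each session as a list of queries, and each query as a list of the 0-based positions of its clicked results, is an assumption made for illustration; the formulas themselves follow the definition.

```python
def query_score(clicked_positions, F):
    """nu_j: sum of F**k over the 0-based positions k of clicked results."""
    return sum(F ** k for k in clicked_positions)


def paul_score(sessions, F):
    """PaulScore(F): mean over sessions of each session's mean query score.

    `sessions` is a list of sessions; each session is a non-empty list of
    queries; each query is a list of clicked 0-based positions (may be empty).
    """
    if not 0 < F < 1:
        raise ValueError("F must lie strictly between 0 and 1")
    session_means = [
        sum(query_score(q, F) for q in queries) / len(queries)
        for queries in sessions
    ]
    return sum(session_means) / len(session_means)
```

For example, with \(F = 0.5\), a session with one click at position 0 plus one click-less query scores \((1 + 0)/2 = 0.5\), and a session with a single query clicked at positions 1 and 3 scores \(0.5 + 0.125 = 0.625\), so `paul_score([[[0], []], [[1, 3]]], 0.5)` returns 0.5625.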

We can calculate a confidence interval for PaulScore\((F)\) by approximating its sampling distribution via bootstrapping.
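A minimal percentile-bootstrap sketch follows. The report does not say which bootstrap variant was used; resampling whole sessions with replacement and taking percentile bounds are assumptions here, as are the defaults `n_boot=1000` and `alpha=0.05`. `paul_score` repeats the PaulScore definition so the block runs on its own.

```python
import random


def paul_score(sessions, F):
    # Mean over sessions of each session's mean query score,
    # where a query with clicked 0-based positions k scores sum(F**k).
    means = [sum(sum(F ** k for k in q) for q in queries) / len(queries)
             for queries in sessions]
    return sum(means) / len(means)


def bootstrap_ci(sessions, F, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for PaulScore(F).

    Assumption: whole sessions are resampled with replacement, and the
    interval is read off the sorted bootstrap replicates.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    n = len(sessions)
    stats = sorted(
        paul_score([sessions[rng.randrange(n)] for _ in range(n)], F)
        for _ in range(n_boot)
    )
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper
```

Resampling sessions rather than individual queries keeps each user’s queries together, which matches the session-level averaging in the PaulScore definition.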

Dwell Time Per Visited Page
