In this A/B test, the test group was shown additional information relevant to each individual result on the existing search results page: links to and metadata for related pages, links to related categories, and suggested language links for that result. The control group saw the existing search results page unchanged.

This test ran from 13 July 2017 to 24 July 2017 on enwiki. There were 2 test groups: explore_similar_control, explore_similar_test. This report includes fulltext searches. Refer to Phabricator ticket T149809 for more details.

Query:

SELECT
  timestamp,
  event_uniqueId AS event_id,
  event_mwSessionId,
  event_pageViewId AS page_id,
  event_searchSessionId AS session_id,
  event_subTest AS `group`,
  wiki,
  MD5(LOWER(TRIM(event_query))) AS query_hash,
  event_action AS event,
  CASE
    WHEN event_position < 0 THEN NULL
    ELSE event_position
    END AS event_position,
  CASE
    WHEN event_action = 'searchResultPage' AND event_hitsReturned > 0 THEN 'TRUE'
    WHEN event_action = 'searchResultPage' AND event_hitsReturned IS NULL THEN 'FALSE'
    ELSE NULL
    END AS `some same-wiki results`,
  CASE
    WHEN event_action = 'searchResultPage' AND event_hitsReturned > -1 THEN event_hitsReturned
    WHEN event_action = 'searchResultPage' AND event_hitsReturned IS NULL THEN 0
    ELSE NULL
    END AS n_results,
  event_scroll,
  event_checkin,
  event_extraParams,
  event_msToDisplayResults AS load_time,
  event_searchToken AS search_token,
  userAgent AS user_agent
FROM TestSearchSatisfaction2_16909631
WHERE LEFT(timestamp, 8) >= '20170713' AND LEFT(timestamp, 8) < '20170725' 
  AND wiki IN('enwiki') 
  AND event_subTest IN('explore_similar_control', 'explore_similar_test') 
  AND event_source IN('fulltext') 
  AND event_searchSessionId <> 'explore_similar_test' 
  AND CASE WHEN event_action = 'searchResultPage' THEN event_msToDisplayResults IS NOT NULL
            WHEN event_action IN ('click', 'iwclick', 'ssclick') THEN event_position IS NOT NULL AND event_position > -1
            WHEN event_action = 'visitPage' THEN event_pageViewId IS NOT NULL
            WHEN event_action = 'checkin' THEN event_checkin IS NOT NULL AND event_pageViewId IS NOT NULL
            ELSE TRUE
       END; 

Data Clean-up

Deleted 301 duplicated events. Removed 93 orphan (SERP-less) events. Removed 0 sessions falling into multiple test groups.
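The three clean-up steps above can be sketched as follows. This is a minimal illustration, not the code used for this report: the function name `clean_events`, the dict-based event records, and the choice to treat an event as an orphan when its whole session has no searchResultPage event are all assumptions made for the example.

```python
from collections import defaultdict

def clean_events(events):
    """Hypothetical helper: events is a list of dicts with the keys
    event_id, session_id, group and event (names follow the query aliases)."""
    # Step 1: drop duplicated events, keeping the first occurrence of each event_id.
    seen_ids = set()
    deduped = []
    for e in events:
        if e["event_id"] not in seen_ids:
            seen_ids.add(e["event_id"])
            deduped.append(e)

    # Step 2: drop orphan events. ASSUMPTION: "SERP-less" is judged per session.
    sessions_with_serp = {
        e["session_id"] for e in deduped if e["event"] == "searchResultPage"
    }
    with_serp = [e for e in deduped if e["session_id"] in sessions_with_serp]

    # Step 3: drop sessions that fall into more than one test group.
    groups = defaultdict(set)
    for e in with_serp:
        groups[e["session_id"]].add(e["group"])
    return [e for e in with_serp if len(groups[e["session_id"]]) == 1]

# Made-up sample data covering each case.
sample = [
    {"event_id": "a", "session_id": "s1", "group": "explore_similar_test", "event": "searchResultPage"},
    {"event_id": "a", "session_id": "s1", "group": "explore_similar_test", "event": "searchResultPage"},
    {"event_id": "b", "session_id": "s1", "group": "explore_similar_test", "event": "click"},
    {"event_id": "c", "session_id": "s2", "group": "explore_similar_control", "event": "click"},
    {"event_id": "d", "session_id": "s3", "group": "explore_similar_control", "event": "searchResultPage"},
    {"event_id": "e", "session_id": "s3", "group": "explore_similar_test", "event": "searchResultPage"},
]
cleaned = clean_events(sample)
```

In the sample, the second copy of event `a` is a duplicate, event `c` is an orphan, and session `s3` appears in both groups, so only events `a` and `b` survive.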

Data Summary

Test Summary

Days  Events  Sessions  Page IDs  SERPs  Unique search queries  Searches  Same-wiki clicks  Other clicks
12    19,007  2,500     9,052     7,474  6,597                  5,332     1,638             71

Events

Action identifies the context in which the event was created. Every time a new search is performed, a searchResultPage event is created. When the user clicks a link in the results, a visitPage event is created. When the user has dwelled for N seconds, a checkin event occurs. If the user clicks an interwiki result provided by TextCat language detection, there is an iwclick event. If the user clicks a sister search result in the sidebar, that is an ssclick event. If the user interacts with a result to explore similar items (pages, categories, translations), there are hover-on, hover-off, and esclick events.

Searches

Test group               wiki       Search sessions  Searches recorded
explore_similar_control  enwiki     1,239            3,000
explore_similar_test     enwiki     1,261            2,332
Total                    All wikis  2,500            5,332

Searches with n same-wiki results returned

SERPs by offset

Browser & OS

The goal here is to check whether the proportions of operating system (OS) and browser usage are similar between the groups. If one group has a very different OS/browser breakdown, something in the implementation may have biased the sampling toward certain OSes or browsers. For brevity, we show only the top 10 OSes/browsers (the rest are grouped as "Other"). We do not expect the numbers to differ, so this is included purely as a diagnostic.

wiki    os              explore_similar_control  explore_similar_test
enwiki  Linux           1.3% (16)                1.4% (18)
enwiki  Mac OS X 10.10  1.9% (23)                1.6% (20)
enwiki  Mac OS X 10.11  4.2% (52)                2.5% (32)
enwiki  Mac OS X 10.12  7.3% (91)                6.9% (87)
enwiki  Other OSes      4.0% (50)                5.6% (71)
enwiki  Ubuntu          0.8% (10)                0.6% (8)
enwiki  Windows 10      35.8% (443)              34.6% (436)
enwiki  Windows 7       34.4% (426)              36.6% (462)
enwiki  Windows 8.1     7.1% (88)                6.5% (82)
enwiki  Windows Vista   0.9% (11)                1.3% (17)
enwiki  Windows XP      2.3% (29)                2.2% (28)

wiki    browser         explore_similar_control  explore_similar_test
enwiki  Chrome 45       0.6% (8)                 1.6% (20)
enwiki  Chrome 49       1.5% (18)                1.7% (22)
enwiki  Chrome 58       1.2% (15)                1.5% (19)
enwiki  Chrome 59       41.6% (516)              41.6% (525)
enwiki  Edge 14         3.6% (45)                2.9% (37)
enwiki  Edge 15         3.6% (44)                3.8% (48)
enwiki  Firefox 52      1.5% (18)                1.4% (18)
enwiki  Firefox 54      11.9% (148)              11.7% (148)
enwiki  IE 11           13.4% (166)              13.3% (168)
enwiki  Other browsers  16.1% (199)              15.4% (194)
enwiki  Safari 10       5.0% (62)                4.9% (62)
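
As a rough illustration of this diagnostic, the sketch below recomputes a few of the OS shares and the gap between groups. It assumes the percentages are taken over search sessions per group (1,239 control and 1,261 test), which reproduces the figures in the table. The variable names are illustrative, and only three rows are included to keep the example short.

```python
# Sessions per group, taken from the Searches table.
SESSIONS = {"control": 1239, "test": 1261}

# (control count, test count) for a few rows of the OS table.
OS_COUNTS = {
    "Windows 10": (443, 436),
    "Windows 7": (426, 462),
    "Mac OS X 10.11": (52, 32),
}

def share(count, total):
    """Percentage of a group's sessions that used a given OS."""
    return 100 * count / total

# Rounded shares, comparable to the percentages shown in the table.
shares = {
    name: (
        round(share(control, SESSIONS["control"]), 1),
        round(share(test, SESSIONS["test"]), 1),
    )
    for name, (control, test) in OS_COUNTS.items()
}

# Absolute gap between groups, in percentage points.
gaps = {
    name: abs(share(test, SESSIONS["test"]) - share(control, SESSIONS["control"]))
    for name, (control, test) in OS_COUNTS.items()
}
largest_gap = max(gaps, key=gaps.get)
```

Among these three rows the largest gap is for Windows 7, at a little over two percentage points. A gap of that size is the kind of small difference the diagnostic is meant to rule out as a sampling problem.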

Explore Similar

Clicks by section and position

Section  3rd result  Sum
related  2           2
Sum      2           2

Hover-overs by section and results

Section     0 results  1 result  2 results  3 results  4 results  5+ results  Sum
categories  2          13        11         27         17         60          130
languages   49         20        9          6          3          10          97
related     4          0         0          137        0          0           141
Sum         55         33        20         170        20         70          368

Results of Statistical Analysis

Same-wiki Zero Results Rate

Same-wiki Engagement

explore_similar_test vs. explore_similar_control

First Clicked Same-Wiki Result’s Position

Maximum Clicked Position for Same-Wiki Results

PaulScore

PaulScore is a measure of search results’ relevancy which takes into account the position of the clicked results, and is computed via the following steps:

  1. Pick scoring factor \(0 < F < 1\) (larger values of \(F\) increase the weight of clicks on lower-ranked results).
  2. For \(i\)-th search session \(S_i\) \((i = 1, \ldots, n)\) containing \(m\) queries \(Q_1, \ldots, Q_m\) and search result sets \(\mathbf{R}_1, \ldots, \mathbf{R}_m\):
     a. For each \(j\)-th search query \(Q_j\) with result set \(\mathbf{R}_j\), let \(\nu_j\) be the query score: \[\nu_j = \sum_{k~\in~\{\text{0-based positions of clicked results in}~\mathbf{R}_j\}} F^k.\]
     b. Let user’s average query score \(\bar{\nu}_{(i)}\) be \[\bar{\nu}_{(i)} = \frac{1}{m} \sum_{j = 1}^m \nu_j.\]
  3. Then the PaulScore is the average of all users’ average query scores: \[\text{PaulScore}(F)~=~\frac{1}{n} \sum_{i = 1}^n \bar{\nu}_{(i)}.\]
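
The steps above can be sketched in a few lines of Python. This is a minimal illustration under assumed inputs, not the code used for this report: each session is represented as a list of queries, and each query as a list of the 0-based positions of its clicked results. The function names are hypothetical.

```python
def query_score(clicked_positions, F):
    """nu_j: sum of F**k over the 0-based positions k of clicked results."""
    return sum(F ** k for k in clicked_positions)

def session_score(queries, F):
    """nu-bar_(i): mean query score over the m queries of one session."""
    return sum(query_score(q, F) for q in queries) / len(queries)

def paulscore(sessions, F):
    """PaulScore(F): mean of the per-session average query scores."""
    return sum(session_score(s, F) for s in sessions) / len(sessions)

# Made-up example with F = 0.5.
sessions = [
    [[0], []],   # session 1: a click on the top result, then a query with no clicks
    [[1, 2]],    # session 2: one query with clicks on the 2nd and 3rd results
]
score = paulscore(sessions, F=0.5)  # (0.5 + 0.75) / 2 = 0.625
```

Session 1 scores \((1 + 0) / 2 = 0.5\), session 2 scores \(0.5 + 0.25 = 0.75\), and the PaulScore is their mean, 0.625. A query with no clicks contributes 0, so sessions with many abandoned queries pull the score down.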

We can calculate the confidence interval of PaulScore\((F)\) by approximating its distribution via bootstrapping.
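
One way to do this is a percentile bootstrap over sessions, sketched below. The report does not state which bootstrap variant was used, so the percentile method, the resampling unit (whole sessions), and the defaults for `n_boot`, `alpha`, and `seed` are all assumptions. The input is a list of per-session average query scores \(\bar{\nu}_{(i)}\).

```python
import random

def bootstrap_ci(session_scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean of per-session scores.

    ASSUMPTION: sessions are resampled with replacement and the interval
    is read off the empirical quantiles of the resampled means.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(session_scores)
    means = []
    for _ in range(n_boot):
        resample = [session_scores[rng.randrange(n)] for _ in range(n)]
        means.append(sum(resample) / n)
    means.sort()
    lower = means[int((alpha / 2) * n_boot)]
    upper = means[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Made-up per-session scores whose mean is 0.5.
scores = [0.0, 0.5, 1.0, 0.25, 0.75] * 20
lower, upper = bootstrap_ci(scores)
```

Resampling whole sessions rather than individual queries keeps each user's queries together, which matches the way PaulScore averages within a session before averaging across sessions.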

Other Pages of the Search Results

Dwell Time Per Visited Page

Scroll