############## Read parameters for debugging ################
if (length(rmarkdown::metadata) == 0) {
  params <- rmarkdown::yaml_front_matter("report.Rmd")$params
}
#############################################################
knitr::opts_chunk$set(
  error = TRUE, message = params$debug, warning = params$debug
)
set.seed(0)
suppressPackageStartupMessages({
  library(magrittr)
  library(ggplot2)
  import::from(
    # We don't import certain verbs (e.g. distinct, left_join, bind_rows)
    # to avoid potential name-conflicts and because they're one-time use.
    dplyr,
    # Subsetting Verbs
    keep_where = filter, select,
    # Grouping Verbs
    group_by, ungroup,
    # Manipulation Verbs
    mutate, arrange, summarize, tally,
    # Utility Functions
    case_when, if_else
  )
})
source("functions.R")
In this A/B test, the test group is shown additional information relevant to each individual result on the existing search results page: links to and metadata about related pages, links to related categories, and suggested language links for pages similar to that result. The control group sees the existing search results page unchanged.
This test ran from 13 July 2017 to 22 July 2017 on enwiki. There were 2 test groups: explore_similar_control, explore_similar_test. This report includes fulltext searches. Refer to Phabricator ticket T149809 for more details.
Query:
SELECT
  timestamp,
  event_uniqueId AS event_id,
  event_mwSessionId,
  event_pageViewId AS page_id,
  event_searchSessionId AS session_id,
  event_subTest AS `group`,
  wiki,
  MD5(LOWER(TRIM(event_query))) AS query_hash,
  event_action AS event,
  CASE
    WHEN event_position < 0 THEN NULL
    ELSE event_position
  END AS event_position,
  CASE
    WHEN event_action = 'searchResultPage' AND event_hitsReturned > 0 THEN 'TRUE'
    WHEN event_action = 'searchResultPage' AND event_hitsReturned IS NULL THEN 'FALSE'
    ELSE NULL
  END AS `some same-wiki results`,
  CASE
    WHEN event_action = 'searchResultPage' AND event_hitsReturned > -1 THEN event_hitsReturned
    WHEN event_action = 'searchResultPage' AND event_hitsReturned IS NULL THEN 0
    ELSE NULL
  END AS n_results,
  event_scroll,
  event_checkin,
  event_extraParams,
  event_msToDisplayResults AS load_time,
  event_searchToken AS search_token,
  userAgent AS user_agent
FROM TestSearchSatisfaction2_16909631
WHERE LEFT(timestamp, 8) >= '20170713' AND LEFT(timestamp, 8) < '20170723'
  AND wiki IN('enwiki')
  AND event_subTest IN('explore_similar_control', 'explore_similar_test')
  AND event_action IN('searchResultPage', 'click', 'ssclick', 'visitPage', 'checkin', 'hover-on', 'hover-off', 'esclick')
  AND event_source IN('fulltext')
  AND event_subTest IS NOT NULL
  AND CASE WHEN event_action = 'searchResultPage' THEN event_msToDisplayResults IS NOT NULL
           WHEN event_action IN ('click', 'iwclick', 'ssclick') THEN event_position IS NOT NULL AND event_position > -1
           WHEN event_action = 'visitPage' THEN event_pageViewId IS NOT NULL
           WHEN event_action = 'checkin' THEN event_checkin IS NOT NULL AND event_pageViewId IS NOT NULL
           ELSE TRUE
      END;
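The two CASE expressions that derive `some same-wiki results` and `n_results` are easy to misread, so here is a minimal Python sketch of their logic, with `None` standing in for SQL `NULL`. The function names are hypothetical and exist only for illustration; the report itself runs the SQL above, not this code.

```python
from typing import Optional


def some_same_wiki_results(event: str, hits_returned: Optional[int]) -> Optional[str]:
    """Mirror of the CASE expression aliased as `some same-wiki results`."""
    # In SQL, `NULL > 0` is not true, so a NULL hit count falls through to the next WHEN.
    if event == "searchResultPage" and hits_returned is not None and hits_returned > 0:
        return "TRUE"
    if event == "searchResultPage" and hits_returned is None:
        return "FALSE"
    # Every other case, including a SERP with exactly 0 hits, yields NULL.
    return None


def n_results(event: str, hits_returned: Optional[int]) -> Optional[int]:
    """Mirror of the CASE expression aliased as `n_results`."""
    if event == "searchResultPage" and hits_returned is not None and hits_returned > -1:
        return hits_returned
    if event == "searchResultPage" and hits_returned is None:
        return 0
    return None
```

As written, a `searchResultPage` event whose `event_hitsReturned` is exactly 0 gets `n_results = 0` but a `NULL` flag, because only a missing hit count maps to `'FALSE'`.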
Deleted 251 duplicated events. Removed 82 orphan (SERP-less) events. Removed 0 sessions falling into multiple test groups.
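The three cleaning steps can be sketched as follows. This is a minimal illustration in Python, not the report's own code (which is sourced from `functions.R` and not shown here). The field names follow the query's aliases, but `clean_events` is a hypothetical helper, and treating an orphan as an event whose session never recorded a `searchResultPage` event is an assumption about what "SERP-less" means.

```python
from collections import defaultdict


def clean_events(events):
    """Return (cleaned_events, counts) after three illustrative cleaning steps."""
    # 1. Drop duplicated events: keep the first occurrence of each event_id.
    seen = set()
    deduped = []
    for e in events:
        if e["event_id"] not in seen:
            seen.add(e["event_id"])
            deduped.append(e)
    n_duplicates = len(events) - len(deduped)

    # 2. Drop orphans. ASSUMPTION: an orphan is an event in a session
    #    that has no searchResultPage (SERP) event at all.
    sessions_with_serp = {
        e["session_id"] for e in deduped if e["event"] == "searchResultPage"
    }
    with_serp = [e for e in deduped if e["session_id"] in sessions_with_serp]
    n_orphans = len(deduped) - len(with_serp)

    # 3. Drop sessions that appear in more than one test group.
    groups_by_session = defaultdict(set)
    for e in with_serp:
        groups_by_session[e["session_id"]].add(e["group"])
    mixed = {s for s, groups in groups_by_session.items() if len(groups) > 1}
    cleaned = [e for e in with_serp if e["session_id"] not in mixed]

    counts = {
        "duplicates": n_duplicates,
        "orphans": n_orphans,
        "mixed_sessions": len(mixed),
    }
    return cleaned, counts
```

The order matters in this sketch: duplicates are removed first so that they are not also counted as orphans.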
Days | Events | Sessions | Page IDs | SERPs | Unique search queries | Searches | Same-wiki clicks | Other clicks |
---|---|---|---|---|---|---|---|---|
10 | 16,166 | 2,102 | 7,804 | 6,482 | 5,617 | 4,578 | 1,369 | 35 |
Action identifies the context in which the event was created. Every time a new search is performed, a searchResultPage event is created. When the user clicks a link in the results, a visitPage event is created. When the user has dwelled for N seconds, a checkin event occurs. If the user clicks an interwiki result provided by TextCat language detection, there is an iwclick event. If the user clicks on a sister search result from the sidebar, that’s an ssclick. If the user interacts with a result to explore similar (pages, categories, translations), there are hover-on, hover-off, and esclick events.
Test group | wiki | Search sessions | Searches recorded |
---|---|---|---|
explore_similar_control | enwiki | 1,049 | 2,612 |
explore_similar_test | enwiki | 1,053 | 1,966 |
Total | All wikis | 2,102 | 4,578 |
The goal here is to see whether the proportions of operating system (OS) and browser usage are similar between the groups. If one group has a very different OS/browser share breakdown, there might be something wrong with the implementation that biased, or is still biasing, the sampling in favor of some OSes/browsers. Note that for brevity we show only the top 10 OSes/browsers, and that we don’t actually expect the numbers to differ, so this is included purely as a diagnostic.
wiki | os | explore_similar_control | explore_similar_test |
---|---|---|---|
enwiki | Linux | 1.2% (13) | 1.4% (15) |
enwiki | Mac OS X 10.10 | 2.0% (21) | 1.8% (19) |
enwiki | Mac OS X 10.11 | 4.1% (43) | 2.8% (30) |
enwiki | Mac OS X 10.12 | 7.0% (73) | 7.1% (75) |
enwiki | Other OSes | 3.9% (41) | 5.5% (58) |
enwiki | Ubuntu | 0.8% (8) | 0.8% (8) |
enwiki | Windows 10 | 35.7% (375) | 33.9% (357) |
enwiki | Windows 7 | 35.1% (368) | 37.4% (394) |
enwiki | Windows 8.1 | 6.9% (72) | 6.1% (64) |
enwiki | Windows Vista | 1.0% (10) | 1.0% (11) |
enwiki | Windows XP | 2.4% (25) | 2.1% (22) |
wiki | browser | explore_similar_control | explore_similar_test |
---|---|---|---|
enwiki | Chrome 45 | 0.8% (8) | 1.8% (19) |
enwiki | Chrome 49 | 1.3% (14) | 1.6% (17) |
enwiki | Chrome 58 | 1.2% (13) | 1.5% (16) |
enwiki | Chrome 59 | 41.3% (433) | 42.2% (444) |
enwiki | Edge 14 | 3.5% (37) | 3.0% (32) |
enwiki | Edge 15 | 3.6% (38) | 3.7% (39) |
enwiki | Firefox 52 | 1.6% (17) | 1.4% (15) |
enwiki | Firefox 54 | 12.1% (127) | 11.6% (122) |
enwiki | IE 11 | 13.7% (144) | 13.3% (140) |
enwiki | Other browsers | 15.6% (164) | 14.9% (157) |
enwiki | Safari 10 | 5.1% (54) | 4.9% (52) |
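To make the table cells concrete: each cell appears to be a per-group share of search sessions followed by the raw count. The Python sketch below reproduces two browser rows on that assumption, using the session totals from the test group table (1,049 control, 1,053 test). `share_table` is a hypothetical helper written for illustration, not the report's code.

```python
def share_table(counts, totals):
    """Format each count as 'share% (count)' relative to its group's total."""
    return {
        label: {
            group: f"{100 * n / totals[group]:.1f}% ({n})"
            for group, n in per_group.items()
        }
        for label, per_group in counts.items()
    }


# Session totals per group, taken from the test group table.
totals = {"explore_similar_control": 1049, "explore_similar_test": 1053}

# Two rows of the browser table, as raw session counts.
counts = {
    "Chrome 59": {"explore_similar_control": 433, "explore_similar_test": 444},
    "IE 11": {"explore_similar_control": 144, "explore_similar_test": 140},
}

shares = share_table(counts, totals)
print(shares["Chrome 59"])
# {'explore_similar_control': '41.3% (433)', 'explore_similar_test': '42.2% (444)'}
```

The counts in each column of the browser table sum to the group's session total, which is consistent with the denominator being sessions rather than events.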
Section | 1st result | 3rd result | Sum |
---|---|---|---|
related | 3 | 2 | 5 |
Sum | 3 | 2 | 5 |
Section | 0 results | 1 result | 2 results | 3 results | 4 results | 5+ results | Sum |
---|---|---|---|---|---|---|---|
categories | 0 | 11 | 9 | 22 | 14 | 54 | 110 |
languages | 37 | 19 | 8 | 3 | 3 | 7 | 77 |
related | 4 | 0 | 0 | 107 | 0 | 0 | 111 |
Sum | 41 | 30 | 17 | 132 | 17 | 61 | 298 |
explore_similar_test vs. explore_similar_control
PaulScore is a measure of search results’ relevancy which takes into account the position of the clicked results, and is computed via the following steps:
We can calculate the confidence interval of PaulScore\((F)\) by approximating its distribution via bootstrapping.
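A minimal Python sketch of the score and a bootstrapped interval follows. It rests on stated assumptions rather than on code from this report: a search's score is taken to be the sum of \(F^k\) over the 0-based positions \(k\) of its clicked results, a session's score is the mean of its searches' scores, and PaulScore\((F)\) is the mean of the session scores. The interval is a simple percentile bootstrap that resamples whole sessions; all function names are hypothetical, and the report's own method may differ.

```python
import random
from statistics import mean


def query_score(clicked_positions, F):
    """Sum of F**k over the 0-based positions k of clicked results."""
    return sum(F ** k for k in clicked_positions)


def session_score(searches, F):
    """Mean query score over one session's searches (assumes at least one search)."""
    return mean(query_score(positions, F) for positions in searches)


def paulscore(sessions, F):
    """Mean of the per-session scores."""
    return mean(session_score(searches, F) for searches in sessions)


def bootstrap_ci(sessions, F, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap interval, resampling sessions with replacement."""
    rng = random.Random(seed)
    n = len(sessions)
    stats = sorted(
        paulscore([sessions[rng.randrange(n)] for _ in range(n)], F)
        for _ in range(n_boot)
    )
    alpha = (1 - level) / 2
    lower = stats[int(alpha * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha) * n_boot))]
    return lower, upper


# Each session is a list of searches; each search is a list of clicked positions.
sessions = [
    [[0]],          # one search, click on the first result
    [[0, 2], []],   # two searches: clicks on results 1 and 3, then no clicks
    [[]],           # one search, no clicks
]
score = paulscore(sessions, F=0.5)
interval = bootstrap_ci(sessions, F=0.5)
```

Resampling sessions rather than individual searches keeps searches from the same user together, which matches the session-level averaging assumed in the score.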