Report of Multi-Check (References) Leading Indicators

Published

April 23, 2025

Modified

April 23, 2025

Purpose

We reviewed the following set of leading indicators 2 weeks after starting the Multi-Check (References) A/B Test:

  1. Proportion of new content edits presented multiple reference checks within a single editing session
  2. Proportion of contributors that are presented Multi Check (References) and complete their edits
  3. Proportion of edits wherein people elect to dismiss/not change the text they’ve added.
  4. Proportion of people blocked after publishing an edit where Multi Check was shown
  5. Proportion of published edits that add new content and are reverted within 48 hours

Decision to be made: What – if any – adjustments/investigations will we prioritize for us to be confident moving forward with evaluating the Multi Check’s impact in T379131?

Please see the task description for additional details.

Note: Results are based on initial A/B test data and are intended to check whether any adjustments to the feature need to be prioritized. More event data will be needed to confirm statistical significance for many of these findings. We will review the complete A/B test data (based on a two-week duration) as part of the analysis in T379131.

Methodology

  • We collected two weeks of A/B test events logged between 25 March 2025 and 08 April 2025. In this A/B test, users in the test group can be shown multiple reference checks within a single editing session, while users in the control group will only see one reference check for all edits that meet the requirements for it to be shown.
  • For each leading indicator metric, we reviewed the following dimensions: by experiment group (test and control), by platform (mobile web or desktop), by user experience and status, and by partner wiki.
  • We also compared edits that were shown more than one reference check in a single session in the test group to edits that were only presented a single reference check. For edits presented more than one reference check, we reviewed a split by the number of checks shown to determine if there was a significant metric change at a certain number of checks presented.
  • We relied on events logged in VisualEditorFeatureUse and change tags recorded in the revision tags table. See instrumentation spec.
  • Data was limited to mobile and desktop edits completed on a main page namespace using VisualEditor on one of the partner Wikipedias. We also limited to edits completed by newcomers, junior contributors, and unregistered users as those are the users that would be shown reference check under the default config settings.

Summary of results

  1. Proportion of new content edits presented multiple reference checks within a single editing session
  • In the test group, multiple reference checks were shown within a single editing session in 19% of all published new content VE edits (549 edits) by unregistered users and users with 100 or fewer edits. Of the edits shown multiple checks, the majority (73%) were shown 2 to 5 checks. Based on this rate, we should have sufficient multi-check events after the test has run for 4 weeks to confirm the overall statistical significance of any changes introduced by this feature.
  2. Proportion of contributors that are presented Multi Check (References) and complete their edits
  • The edit completion rate for sessions shown multiple checks was 76.1% compared to 75% for sessions shown only one check, suggesting that multiple checks are not causing significant disruption or confusion for editors.
  3. Proportion of edits wherein people elect to dismiss/not change the text they’ve added.
  • While we observed a slightly higher proportion of individual reference checks dismissed for edits shown multiple checks in the test group, sessions shown multiple checks are more likely to include at least one new reference in the final published edit than sessions shown just a single check. In the test group, 47.5% of all published edits shown multiple checks did not include any new reference compared to 60.3% of edits shown a single check.
  4. Proportion of people blocked after publishing an edit where Multi Check was shown
  • There were no significant changes in the proportion of users blocked after being shown multiple checks compared to a single check.
  5. Proportion of published edits that add new content and are reverted within 48 hours
  • We observed no significant differences in the revert rate of new content edits between the control and the test group for editing sessions where a reference check was shown. In the test group, the revert rate of new content edits shown multiple checks (17%) is currently lower than that of sessions shown a single check (26%).
Code
# load packages
shhh <- function(expr) suppressPackageStartupMessages(suppressWarnings(suppressMessages(expr)))
shhh({
    library(lubridate)
    library(ggplot2)
    library(dplyr)
    library(gt)
    library(IRdisplay)
})
#set preferences
options(dplyr.summarise.inform = FALSE)
options(repr.plot.width = 15, repr.plot.height = 10)

Proportion of published new content edits presented multiple reference checks within a single editing session

Methodology: The number of reference checks shown within a single editing session is determined by the following event: event.feature = 'editCheck-addReference' AND event.action = 'check-shown-presave'.

We further limited the review to edits that were successfully published and identified as new content edits with the tag editcheck-newcontent.
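
As a rough illustration of the counting rule above, the sketch below tallies check-shown events per editing session on a toy event log. The data frame and its column names (editing_session, feature, action) are hypothetical stand-ins for the flattened VisualEditorFeatureUse fields, and the feature value "someOtherFeature" is made up; this is not the query used to produce the report's data.

```r
# Toy event log; column names and the "someOtherFeature" value are assumptions
events <- data.frame(
  editing_session = c("s1", "s1", "s1", "s2", "s2", "s3"),
  feature = c("editCheck-addReference", "editCheck-addReference",
              "editCheck-addReference", "editCheck-addReference",
              "someOtherFeature", "editCheck-addReference"),
  action = c("check-shown-presave", "check-shown-presave", "action-reject",
             "check-shown-presave", "check-shown-presave", "action-reject"),
  stringsAsFactors = FALSE
)

# Keep only events that mark a reference check being shown before save
shown <- events[events$feature == "editCheck-addReference" &
                events$action == "check-shown-presave", ]

# Count the checks shown in each editing session (named integer vector)
n_checks_shown <- vapply(split(shown$action, shown$editing_session),
                         length, integer(1))
n_checks_shown
```

Sessions with no qualifying event (s3 here) do not appear in the result at all, which is consistent with the later bucketing code treating a missing n_checks_shown as zero checks.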

Code
#load frequency data
edit_check_frequency_data <-
  read.csv(
    file = 'Queries/data/edit_check_frequency_data_li.tsv',
    header = TRUE,
    sep = "\t",
    stringsAsFactors = FALSE
  ) 
Code

# Set experience level group and factor levels
edit_check_frequency_data <- edit_check_frequency_data %>%
  mutate(
    experience_level_group = case_when(
     user_edit_count == 0 & user_status == 'registered' ~ 'Newcomer',
     user_edit_count == 0 & user_status == 'unregistered' ~ 'Unregistered',
      user_edit_count > 0 &  user_edit_count <= 100 ~ "Junior Contributor",
      user_edit_count >  100 ~ "Non-Junior Contributor"   
    ),
    experience_level_group = factor(experience_level_group,
         levels = c("Unregistered","Newcomer", "Non-Junior Contributor", "Junior Contributor")
   ))  

#rename experiment field to clarify
edit_check_frequency_data <- edit_check_frequency_data %>%
  mutate(test_group = factor(test_group,
         levels = c("2025-03-editcheck-multicheck-reference-control", "2025-03-editcheck-multicheck-reference-test"),
         labels = c("control (single check)", "test (multiple checks)")))
Code
#Set fields and factor levels to assess number of checks shown
#Note limited to 1 sidebar open as we're looking for cases where multiple checks presented in a single sidebar (vs user going back and forth)

edit_check_frequency_data <- edit_check_frequency_data %>%
  mutate(
    multiple_checks_shown = 
         ifelse(n_checks_shown > 1 &  n_sidebar_opens < 2, 1, 0),  
     multiple_checks_shown = factor( multiple_checks_shown ,
         levels = c(0,1)))
         
# note these buckets can be adjusted as needed based on distribution of data
edit_check_frequency_data <- edit_check_frequency_data %>%
  mutate(
    checks_shown_bucket = case_when(
     is.na(n_checks_shown) ~ '0',
     n_checks_shown == 1 | (n_checks_shown > 1 & n_sidebar_opens >= 2)  ~ '1', 
     n_checks_shown == 2 & n_sidebar_opens < 2 ~ '2',
     n_checks_shown > 2 & n_checks_shown <= 5 & n_sidebar_opens < 2 ~ "3-5",
     n_checks_shown > 5 & n_checks_shown <= 10 & n_sidebar_opens < 2 ~ "6-10", 
     n_checks_shown > 10 & n_checks_shown <= 15 & n_sidebar_opens < 2 ~ "11-15", 
     n_checks_shown > 15 & n_checks_shown <= 20 & n_sidebar_opens < 2 ~ "16-20", 
    n_checks_shown > 20 & n_sidebar_opens < 2 ~ "over 20" 
    ),
    checks_shown_bucket = factor(checks_shown_bucket ,
         levels = c("0","1","2", "3-5", "6-10","11-15" ,"16-20", "over 20")
   ))  

Proportion of new content edits shown at least one reference check

Code
reference_checks_shown_bytest <- edit_check_frequency_data %>%
    filter(is_new_content == 1 & was_saved == 1) %>% #limit to new content edits
    group_by(test_group) %>%
    summarise(n_editing_session = n_distinct(editing_session),
              n_editing_session_refcheck = n_distinct(editing_session[was_edit_check_shown == 1])) %>%
    mutate(prop_check_shown = paste0(round(n_editing_session_refcheck/n_editing_session * 100, 1), "%"))  %>%
    gt()  %>%
    tab_header(
    title = "Published new content edits shown at least one reference check by experiment group"
      )  %>%
    opt_stylize(5) %>%
  cols_label(
    test_group = "Experiment Group",
    n_editing_session = "Number of edits",
    n_editing_session_refcheck = "Number of edits shown reference checks",   
    prop_check_shown = "Proportion of edits shown reference check"
  ) %>%
    tab_source_note(
        gt::md('Limited to published new content edits by unregistered users and users with 100 or fewer edits')
    )


display_html(as_raw_html(reference_checks_shown_bytest))
Published new content edits shown at least one reference check by experiment group
Experiment Group Number of edits Number of edits shown reference checks Proportion of edits shown reference check
control (single check) 2932 2314 78.9%
test (multiple checks) 2926 2306 78.8%
Limited to published new content edits by unregistered users and users with 100 or fewer edits

Proportion of new content edits shown multiple checks in the test group

Code
multi_refchecks_overall <- edit_check_frequency_data %>%
    filter(is_new_content == 1 & was_saved == 1) %>%
    group_by(test_group) %>%
    summarise(n_editing_session = n_distinct(editing_session),
              n_editing_session_multicheck = n_distinct(editing_session[was_edit_check_shown == 1 & multiple_checks_shown == 1])) %>%
    mutate(prop_check_shown = paste0(round(n_editing_session_multicheck/n_editing_session * 100, 1), "%")) %>% 
    filter(test_group == 'test (multiple checks)')  %>%
    gt()  %>%
    tab_header(
    title = "Published new content edits shown multiple checks in the test group"
      )  %>%
    opt_stylize(5) %>%
  cols_label(
    test_group = "Experiment Group",
    n_editing_session = "Number of edits",
    n_editing_session_multicheck = "Number of edits shown multiple reference checks",   
    prop_check_shown = "Proportion of edits shown multiple reference checks"
  ) %>%
    tab_source_note(
        gt::md('Limited to published new content edits by unregistered users and users with 100 or fewer edits')
    )


display_html(as_raw_html(multi_refchecks_overall))
Published new content edits shown multiple checks in the test group
Experiment Group Number of edits Number of edits shown multiple reference checks Proportion of edits shown multiple reference checks
test (multiple checks) 2926 549 18.8%
Limited to published new content edits by unregistered users and users with 100 or fewer edits

Proportion of new content edits by number of checks shown

Code
multi_refchecks_overall <- edit_check_frequency_data %>%
    filter(is_new_content == 1 , was_saved == 1, 
          test_group == 'test (multiple checks)') %>% #want to limit to test group where multiple can be shown
    mutate(total_sessions = n_distinct(editing_session)) %>%
    group_by(total_sessions, checks_shown_bucket) %>%
    summarise(n_editing_session_refcheck = n_distinct(editing_session)) %>%
    mutate(prop_check_shown = paste0(round(n_editing_session_refcheck/total_sessions * 100, 2), "%")) %>%
    ungroup() %>%
    select(-1) %>%
     mutate(n_editing_session_refcheck = ifelse(n_editing_session_refcheck < 50, "<50", n_editing_session_refcheck))  %>% #sanitizing per data publication guidelines
    gt()  %>%
    tab_header(
    title = "Published new content edits by total number of reference checks shown in the test group"
      )  %>%
    opt_stylize(5) %>%
  cols_label(
    checks_shown_bucket = "Number of reference checks",
    n_editing_session_refcheck = "Number of edits",   
    prop_check_shown = "Proportion of edits"
  ) %>%
    tab_source_note(
        gt::md('Limited to published new content edits by unregistered users and users with 100 or fewer edits')
    )


display_html(as_raw_html(multi_refchecks_overall))
Published new content edits by total number of reference checks shown in the test group
Number of reference checks Number of edits Proportion of edits
0 620 21.19%
1 1759 60.12%
2 214 7.31%
3-5 186 6.36%
6-10 94 3.21%
11-15 <50 0.99%
16-20 <50 0.34%
over 20 <50 0.58%
Limited to published new content edits by unregistered users and users with 100 or fewer edits
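
The "<50" entries above come from masking small counts before publication. A minimal sketch of that step as a reusable helper is shown below; the function name mask_small_counts and the example counts are hypothetical, and the default threshold of 50 is taken from the code in this report.

```r
# Hypothetical helper: replace counts under a threshold with a "<N" label.
# Returns a character vector, so apply it only after all rates are computed.
mask_small_counts <- function(n, threshold = 50) {
  ifelse(n < threshold, paste0("<", threshold), as.character(n))
}

# Example with made-up counts
masked <- mask_small_counts(c(620, 94, 10))
masked
```

Making the character conversion explicit keeps the column type predictable; the proportion columns in this report are computed from the raw counts before the masking is applied.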

Proportion of new content edits shown multiple checks in the test group by platform

Code
multi_reference_checks_shown_byplatform <- edit_check_frequency_data %>% 
     filter(is_new_content == 1 , was_saved == 1,
           test_group == 'test (multiple checks)')  %>%  #limit to the test group, where multiple checks can be shown
    group_by(platform) %>%
    summarise(n_editing_session = n_distinct(editing_session),
              n_editing_session_multicheck = n_distinct(editing_session[was_edit_check_shown == 1 & multiple_checks_shown == 1 ] )) %>%
    mutate(prop_check_shown = paste0(round(n_editing_session_multicheck/n_editing_session * 100, 1), "%"))  %>%
    gt()  %>%
    tab_header(
    title = "Published new content edits shown multiple reference checks by platform"
      )  %>%
  opt_stylize(5) %>%
  cols_label(
    platform = "Platform",
    n_editing_session = "Number of edits",
    n_editing_session_multicheck = "Number of edits shown multiple reference checks",   
    prop_check_shown = "Proportion of edits shown multiple reference checks"
  ) %>%
    tab_source_note(
        gt::md('Limited to new content edits by unregistered users and users with 100 or fewer edits assigned to test group')
    )


display_html(as_raw_html(multi_reference_checks_shown_byplatform))
Published new content edits shown multiple reference checks by platform
Platform Number of edits Number of edits shown multiple reference checks Proportion of edits shown multiple reference checks
desktop 1922 441 22.9%
phone 1004 108 10.8%
Limited to new content edits by unregistered users and users with 100 or fewer edits assigned to test group

Proportion of new content edits shown multiple checks by user experience

Code
multi_reference_checks_shown_byuserstatus <- edit_check_frequency_data %>%
    filter(is_new_content == 1 ,
           was_saved == 1, test_group == 'test (multiple checks)')  %>%  #limit to the test group, where multiple checks can be shown
    group_by(experience_level_group) %>%
    summarise(n_editing_session = n_distinct(editing_session),
              n_editing_session_multicheck = n_distinct(editing_session[was_edit_check_shown == 1 & multiple_checks_shown == 1  ])) %>%
    mutate(prop_check_shown = paste0(round(n_editing_session_multicheck/n_editing_session * 100, 1), "%"))  %>%

    gt()  %>%
    tab_header(
    title = "Published new content edits shown multiple reference checks by user experience"
      ) %>%
  opt_stylize(5) %>%
  cols_label(
    experience_level_group = "User Status",
    n_editing_session = "Number of edits",
    n_editing_session_multicheck = "Number of edits shown multiple reference checks",   
    prop_check_shown = "Proportion of edits shown multiple reference checks"
  ) %>%
    tab_source_note(
        gt::md('Limited to new content edits by unregistered users and users with 100 or fewer edits assigned to test group')
    )



display_html(as_raw_html(multi_reference_checks_shown_byuserstatus))
Published new content edits shown multiple reference checks by user experience
User Status Number of edits Number of edits shown multiple reference checks Proportion of edits shown multiple reference checks
Unregistered 1496 231 15.4%
Newcomer 328 89 27.1%
Junior Contributor 1102 229 20.8%
Limited to new content edits by unregistered users and users with 100 or fewer edits assigned to test group

By partner Wikipedia

Code
multi_reference_checks_shown_bywiki <- edit_check_frequency_data %>%
    filter(is_new_content == 1, was_saved == 1,
           test_group == 'test (multiple checks)')  %>%  #limit to the test group, where multiple checks can be shown
    group_by(wiki) %>%
    summarise(n_editing_session = n_distinct(editing_session),
              n_editing_session_multicheck = n_distinct(editing_session[was_edit_check_shown == 1 & multiple_checks_shown == 1  ])) %>%
    mutate(prop_check_shown = paste0(round(n_editing_session_multicheck/n_editing_session * 100, 1), "%"))  %>%
    filter(n_editing_session_multicheck >= 50)  %>% #keep wikis with at least 50 edits shown multiple checks, matching the source note
    gt()  %>%
    tab_header(
    title = "Published new content edits shown multiple reference checks by partner wikipedia"
      )  %>%
 opt_stylize(5) %>%
  cols_label(
    wiki = "Wikipedia",
    n_editing_session = "Number of edits",
    n_editing_session_multicheck = "Number of edits shown multiple reference checks",   
    prop_check_shown = "Proportion of edits shown multiple reference checks"
  ) %>%
    tab_source_note(
        gt::md('Limited to wikis with at least 50 published new content edits shown multiple checks')
    )



display_html(as_raw_html(multi_reference_checks_shown_bywiki))
Published new content edits shown multiple reference checks by partner wikipedia
Wikipedia Number of edits Number of edits shown multiple reference checks Proportion of edits shown multiple reference checks
eswiki 636 117 18.4%
frwiki 857 169 19.7%
itwiki 637 97 15.2%
ptwiki 235 51 21.7%
Limited to wikis with at least 50 published new content edits shown multiple checks

Key Insights

  • Reference checks are presented in about 79% of published new content VisualEditor edits completed by unregistered users and users with 100 or fewer edits. The frequency is the same across experiment groups, as the change introduced by this experiment did not affect the number of sessions that could be shown a reference check at least once.
  • In the test group, multiple reference checks were shown within a single editing session in 19% of all published new content VE edits (549 edits) by unregistered users and users with 100 or fewer edits.
    • Of the edits shown multiple checks, the majority (73%) were shown 2 to 5 checks. About 3% of them were shown over 20 checks within a single session.
    • Multiple reference checks are shown more frequently on desktop than on mobile web: 23% of new content edits on desktop were shown multiple reference checks compared to 11% of new content edits on mobile web.
  • Newcomers are also more likely to be shown multiple reference checks (27% of new content edits published by newcomers in the test group were shown multiple reference checks compared to 21% of edits by junior contributors and 15% by unregistered users).
  • Among partner wikis with at least 50 published new content edits shown multiple checks, the proportion of new content edits shown multiple reference checks ranges from 15% at Italian Wikipedia to 22% at Portuguese Wikipedia. All partner wikis have had several published edits where multiple checks were shown; however, some of the smaller wikis have had very few events (< 25) logged to date.
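
The headline shares in these insights can be re-derived from the counts printed in the tables above. The following is only a quick arithmetic check; the variable names are illustrative.

```r
# Counts copied from the test-group tables in this section
n_test_edits    <- 2926  # published new content edits in the test group
n_multi_checks  <- 549   # of those, edits shown multiple reference checks
n_two_checks    <- 214   # edits shown exactly 2 checks
n_three_to_five <- 186   # edits shown 3 to 5 checks

# Share of test-group edits shown multiple checks (reported as 19%)
share_multi <- round(n_multi_checks / n_test_edits * 100, 1)

# Share of multi-check edits shown 2 to 5 checks (reported as 73%)
share_2_to_5 <- round((n_two_checks + n_three_to_five) / n_multi_checks * 100, 1)

c(share_multi = share_multi, share_2_to_5 = share_2_to_5)
```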

Proportion of contributors that are presented multi check (References) and complete their edits

Methodology: We reviewed the proportion of edits by newcomers, junior contributors, and unregistered users that were shown a reference check during the edit session and were successfully published (event.action = saveSuccess). The analysis is limited to edits that reached the point where a reference check was presented at least once after the user indicated their intent to save (event.action = saveIntent).
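
To make the definition concrete, here is a small sketch of the edit completion rate on a toy event log. It is a simplification under stated assumptions: the data frame, its column names, and the idea that all three actions sit in one flat table are hypothetical, and the real pipeline works from pre-aggregated session data.

```r
# Toy event log; structure and column names are assumptions for illustration
events <- data.frame(
  editing_session = c("s1", "s1", "s1", "s2", "s2", "s3", "s3", "s3"),
  action = c("saveIntent", "check-shown-presave", "saveSuccess",
             "saveIntent", "check-shown-presave",
             "saveIntent", "check-shown-presave", "saveSuccess"),
  stringsAsFactors = FALSE
)

# Denominator: sessions shown at least one reference check
sessions_shown <- unique(events$editing_session[events$action == "check-shown-presave"])

# Numerator: those sessions that also logged a successful save
sessions_saved <- unique(events$editing_session[events$action == "saveSuccess"])

completion_rate <- mean(sessions_shown %in% sessions_saved)
completion_rate
```

In this toy log two of the three sessions shown a check end in a save, so the rate is two thirds.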

Code
# load data for assessing edit completion rate
edit_completion_rates_data <-
  read.csv(
    file = 'Queries/data/edit_completion_rate_data.tsv',
    header = TRUE,
    sep = "\t",
    stringsAsFactors = FALSE
  ) 
Code
# Set experience level group and factor levels
edit_completion_rates_data <- edit_completion_rates_data %>%
  mutate(
    experience_level_group = case_when(
     user_edit_count == 0 & user_status == 'registered' ~ 'Newcomer',
     user_edit_count == 0 & user_status == 'unregistered' ~ 'Unregistered',
      user_edit_count > 0 &  user_edit_count <= 100 ~ "Junior Contributor",
      user_edit_count >  100 ~ "Non-Junior Contributor"   
    ),
    experience_level_group = factor(experience_level_group,
         levels = c("Unregistered","Newcomer", "Non-Junior Contributor", "Junior Contributor")
   ))  

#rename experiment field to clarify
edit_completion_rates_data <- edit_completion_rates_data %>%
  mutate(test_group = factor(test_group,
         levels = c("2025-03-editcheck-multicheck-reference-control", "2025-03-editcheck-multicheck-reference-test"),
         labels = c("control (single check)", "test (multiple checks)")))
Code
#Set fields and factor levels to assess number of checks shown
#Note limited to 1 sidebar open as we're looking for cases where multiple checks presented in a single sidebar (vs user going back and forth)

edit_completion_rates_data <- edit_completion_rates_data %>%
  mutate(
    multiple_checks_shown = 
         ifelse(n_checks_shown > 1 &  n_sidebar_opens < 2, "multiple checks shown", "one check shown"),  
     multiple_checks_shown = factor( multiple_checks_shown ,
         levels = c("one check shown", "multiple checks shown")))
         
# note these buckets can be adjusted as needed based on distribution of data
edit_completion_rates_data <- edit_completion_rates_data %>%
  mutate(
    checks_shown_bucket = case_when(
     is.na(n_checks_shown) ~ '0',
     n_checks_shown == 1 | (n_checks_shown > 1 & n_sidebar_opens >= 2)  ~ '1', 
     n_checks_shown == 2 & n_sidebar_opens < 2 ~ '2',
     n_checks_shown > 2 & n_checks_shown <= 5 & n_sidebar_opens < 2 ~ "3-5",
     n_checks_shown > 5 & n_checks_shown <= 10 & n_sidebar_opens < 2 ~ "6-10", 
     n_checks_shown > 10 & n_checks_shown <= 15 & n_sidebar_opens < 2 ~ "11-15", 
     n_checks_shown > 15 & n_checks_shown <= 20 & n_sidebar_opens < 2 ~ "16-20", 
    n_checks_shown > 20 & n_sidebar_opens < 2 ~ "over 20" 
    ),
    checks_shown_bucket = factor(checks_shown_bucket ,
         levels = c("0","1","2", "3-5", "6-10","11-15" ,"16-20", "over 20")
   ))
Code
#Remove one abnormal instance of multiple checks being shown within control group
edit_completion_rates_data <- edit_completion_rates_data %>%
filter(!(test_group == 'control (single check)' & multiple_checks_shown == "multiple checks shown"))

Edit completion rate by experiment group

Code
edit_completion_rate_overall <- edit_completion_rates_data %>%
    filter(ref_check_shown == 1) %>% #limit to sessions where reference check was shown
    group_by(test_group) %>%
    summarise(n_edits = n_distinct(editing_session),
              n_saves = n_distinct(editing_session[saved_edit > 0])) %>%
    mutate(completion_rate = paste0(round(n_saves/n_edits * 100, 1), "%")) %>%   
    gt()  %>%
    tab_header(
    title = "Edit completion rate by experiment group"
      )  %>%
opt_stylize(5) %>%
  cols_label(
    test_group = "Experiment Group",
    n_edits = "Number of edit attempts shown reference check",
    n_saves = "Number of published edits",
    completion_rate = "Proportion of edits saved"
  ) %>%
    tab_source_note(
        gt::md('Limited to edit attempts shown at least one reference check')
    )



display_html(as_raw_html(edit_completion_rate_overall ))
Edit completion rate by experiment group
Experiment Group Number of edit attempts shown reference check Number of published edits Proportion of edits saved
control (single check) 3145 2342 74.5%
test (multiple checks) 3107 2338 75.2%
Limited to edit attempts shown at least one reference check

Edit completion rate by if multiple checks were shown

Code
edit_completion_rate_bymulti <- edit_completion_rates_data %>%
    filter(ref_check_shown == 1) %>%
    group_by(test_group, multiple_checks_shown) %>%
    summarise(n_edits = n_distinct(editing_session),
              n_saves = n_distinct(editing_session[saved_edit > 0])) %>%
    mutate(completion_rate = paste0(round(n_saves/n_edits * 100, 1), "%")) %>%   
    gt()  %>%
    tab_header(
    title = "Edit completion rate by if multiple checks were shown"
      )  %>%
    opt_stylize(5) %>%
  cols_label(
    test_group = "Experiment group",
    multiple_checks_shown = "Multiple checks shown",
    n_edits = "Number of edit attempts shown reference check",
    n_saves = "Number of published edits",
    completion_rate = "Proportion of edits saved"
  ) %>%
    tab_source_note(
        gt::md('Limited to edit attempts shown at least one reference check')
    )


display_html(as_raw_html(edit_completion_rate_bymulti ))
Edit completion rate by if multiple checks were shown
Multiple checks shown Number of edit attempts shown reference check Number of published edits Proportion of edits saved
control (single check)
one check shown 3145 2342 74.5%
test (multiple checks)
one check shown 2371 1778 75%
multiple checks shown 736 560 76.1%
Limited to edit attempts shown at least one reference check

Edit completion rate by number of checks shown

Code
edit_completion_rate_bynchecks <- edit_completion_rates_data %>%
    filter(ref_check_shown == 1) %>%
    group_by(test_group, checks_shown_bucket) %>%
    summarise(n_edits = n_distinct(editing_session),
              n_saves = n_distinct(editing_session[saved_edit > 0])) %>%
    mutate(completion_rate = paste0(round(n_saves/n_edits * 100, 1), "%")) %>%  
    ungroup()%>%  
    mutate(n_edits = ifelse(n_edits < 50, "<50", n_edits),
           n_saves = ifelse(n_saves < 50, "<50", n_saves))  %>% #sanitizing per data publication guidelines
    group_by(test_group) %>%  
    gt()  %>%
    tab_header(
    title = "Edit completion rate by the number of reference checks shown"
      )  %>%
    opt_stylize(5) %>%
  cols_label(
    checks_shown_bucket = "Number of checks shown",
    n_edits = "Number of edit attempts shown reference check",
    n_saves = "Number of published edits",
    completion_rate = "Proportion of edits saved"
  ) %>%
    tab_source_note(
        gt::md('Limited to edits shown at least one reference check')
    )


display_html(as_raw_html(edit_completion_rate_bynchecks ))
Edit completion rate by the number of reference checks shown
Number of checks shown Number of edit attempts shown reference check Number of published edits Proportion of edits saved
control (single check)
1 3145 2342 74.5%
test (multiple checks)
1 2371 1778 75%
2 269 220 81.8%
3-5 250 189 75.6%
6-10 129 96 74.4%
11-15 <50 <50 74.4%
16-20 <50 <50 58.8%
over 20 <50 <50 50%
Limited to edits shown at least one reference check

Edit completion rate by platform

Code
edit_completion_rate_byplatform <- edit_completion_rates_data %>%
    filter(ref_check_shown == 1) %>%
    group_by(platform, test_group) %>%
    summarise(n_edits = n_distinct(editing_session),
              n_saves = n_distinct(editing_session[saved_edit > 0])) %>%
    mutate(completion_rate = paste0(round(n_saves/n_edits * 100, 1), "%")) %>%   
    gt()  %>%
    tab_header(
    title = "Edit completion rate by experiment group and platform"
      )  %>%
    opt_stylize(5) %>%
  cols_label(
    test_group = "Experiment Group",
    platform = "Platform",
    n_edits = "Number of edit attempts shown reference check",
    n_saves = "Number of published edits",
    completion_rate = "Proportion of edits saved"
  ) %>%
    tab_source_note(
        gt::md('Limited to edit attempts shown at least one reference check')
    )


display_html(as_raw_html(edit_completion_rate_byplatform))
Edit completion rate by experiment group and platform
Experiment Group Number of edit attempts shown reference check Number of published edits Proportion of edits saved
desktop
control (single check) 1843 1427 77.4%
test (multiple checks) 1832 1444 78.8%
phone
control (single check) 1302 915 70.3%
test (multiple checks) 1275 894 70.1%
Limited to edit attempts shown at least one reference check

Edit completion rate by user experience

Code
edit_completion_rate_byuserstatus <- edit_completion_rates_data %>%
    filter(ref_check_shown == 1) %>%
    group_by(experience_level_group, test_group) %>%
    summarise(n_edits = n_distinct(editing_session),
              n_saves = n_distinct(editing_session[saved_edit > 0])) %>%
    mutate(completion_rate = paste0(round(n_saves/n_edits * 100, 1), "%")) %>%   
    gt()  %>%
    tab_header(
    title = "Edit completion rate by experiment group and editor experience"
      )  %>%
 opt_stylize(5) %>%
  cols_label(
    test_group = "Test Group",
    experience_level_group = "Editor Experience",
    n_edits = "Number of edit attempts shown reference check",
    n_saves = "Number of published edits",
    completion_rate = "Proportion of edits saved"
  ) %>%
    tab_source_note(
        gt::md('Limited to edit attempts shown at least one reference check')
    )


display_html(as_raw_html(edit_completion_rate_byuserstatus ))
Edit completion rate by experiment group and editor experience
Test Group Number of edit attempts shown reference check Number of published edits Proportion of edits saved
Unregistered
control (single check) 1783 1266 71%
test (multiple checks) 1811 1292 71.3%
Newcomer
control (single check) 433 317 73.2%
test (multiple checks) 397 289 72.8%
Junior Contributor
control (single check) 929 759 81.7%
test (multiple checks) 899 757 84.2%
Limited to edit attempts shown at least one reference check

Edit completion rate by partner Wikipedia

Code
edit_completion_rate_bywiki <- edit_completion_rates_data %>%
    filter(ref_check_shown == 1) %>%
    group_by(wiki, test_group) %>%
    summarise(n_edits = n_distinct(editing_session),
              n_saves = n_distinct(editing_session[saved_edit > 0])) %>%
    mutate(completion_rate = paste0(round(n_saves/n_edits * 100, 1), "%")) %>% 
    filter(n_saves >= 100) %>% 
    gt()  %>%
    tab_header(
    title = "Edit completion rate by experiment group and partner Wikipedia"
      )  %>%
 opt_stylize(5) %>%
  cols_label(
    test_group = "Test Group",
    wiki = "Wikipedia",
    n_edits = "Number of edit attempts shown edit check",
    n_saves = "Number of published edits",
    completion_rate = "Proportion of edits saved"
  ) %>%
    tab_source_note(
        gt::md('Limited to wikis with at least 100 published edits')
    )



display_html(as_raw_html(edit_completion_rate_bywiki))
Edit completion rate by experiment group and partner Wikipedia
Test Group Number of edit attempts shown edit check Number of published edits Proportion of edits saved
arwiki
control (single check) 258 146 56.6%
test (multiple checks) 207 112 54.1%
eswiki
control (single check) 643 468 72.8%
test (multiple checks) 712 517 72.6%
frwiki
control (single check) 826 671 81.2%
test (multiple checks) 865 687 79.4%
itwiki
control (single check) 727 549 75.5%
test (multiple checks) 684 538 78.7%
jawiki
control (single check) 277 210 75.8%
test (multiple checks) 240 186 77.5%
ptwiki
control (single check) 257 175 68.1%
test (multiple checks) 231 167 72.3%
Limited to wikis with at least 100 published edits

Key Insights

  • The overall edit completion rate for the test group (75.2%) is currently 0.7 percentage points higher than that of the control group (74.5%), suggesting that multiple checks are not causing significant disruption or confusion for editors.
    • We also directly compared editing sessions shown multiple reference checks to editing sessions shown only one reference check. The edit completion rate for sessions shown multiple checks was 76.1% compared to 75% for sessions shown only one check.
    • Edit completion rates stay around 75% for up to 15 checks shown within a single session. After that, the edit completion rate decreases to 58.8% for editing sessions shown 16 to 20 checks and 50% for sessions shown over 20. Note: There were fewer than 50 edit attempts overall that were shown 16 or more reference checks, so more data is needed to confirm the decrease at this threshold.
  • We also did not observe any significant differences in edit completion rate by platform, user experience level or wiki. More data will be needed to confirm any statistically significant changes in completion rates caused by multi-check.
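
To illustrate why the small buckets need more data, the sketch below computes an exact binomial confidence interval for a completion rate near 58.8%. The count of 10 completed edits out of 17 attempts is a hypothetical stand-in, chosen only because it rounds to 58.8%; the report publishes only that these buckets hold fewer than 50 attempts.

```r
# Hypothetical counts (NOT report data): 10 completed edits out of 17 attempts
completed <- 10
attempts  <- 17

# Exact (Clopper-Pearson) 95% confidence interval from base R
small_bucket_ci <- binom.test(completed, attempts)$conf.int

round(completed / attempts * 100, 1)  # observed rate in percent
round(small_bucket_ci * 100, 1)       # interval bounds in percent
```

With a sample this small, the interval is wide enough to include the roughly 75% rate seen in the lower buckets, which is why the decrease cannot yet be confirmed.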

Proportion of published new content edits wherein people elected to dismiss adding a new reference.

Methodology: We reviewed the proportion of published new content edits in which people elected to dismiss adding a new reference. An edit was counted when the user declined to add a reference at least once in the session (event.feature = 'editCheck-addReference' AND event.action = 'action-reject') and no new reference was included in the final published new content edit (determined using the revision tag editcheck-newreference).

We also reviewed the proportion of all individual reference checks that were dismissed.
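
The two measures described above can be sketched on a toy data frame. The rows below are invented for illustration; only the column names (editing_session, n_checks_shown, n_rejects, included_new_reference) mirror the fields used in this report, and the output names session_dismissal_rate and check_dismissal_rate are hypothetical.

```r
library(dplyr)

# Hypothetical sessions (invented values, not report data)
toy_sessions <- data.frame(
  editing_session        = c("a", "b", "c", "d"),
  n_checks_shown         = c(1, 1, 3, 2),
  n_rejects              = c(1, 0, 2, 1),
  included_new_reference = c(0, 0, 0, 1)
)

toy_summary <- toy_sessions %>%
  summarise(
    n_edits = n_distinct(editing_session),
    # session level: a check was rejected at least once AND
    # the published edit contains no new reference
    n_dismissed = n_distinct(editing_session[n_rejects > 0 & included_new_reference == 0]),
    session_dismissal_rate = n_dismissed / n_edits,
    # check level: share of all individual checks that were rejected
    check_dismissal_rate = sum(n_rejects) / sum(n_checks_shown)
  )

toy_summary
```

Session "d" rejects one check but still publishes with a new reference, so it counts toward the check-level rate and not toward the session-level rate. This is why the two measures can move in different directions.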

Code
# load data for assessing edit reject frequency
edit_check_reject_data <-
  read.csv(
    file = 'Queries/data/edit_check_rejects_data.tsv',
    header = TRUE,
    sep = "\t",
    stringsAsFactors = FALSE
  ) 
Code
# Set experience level group and factor levels
edit_check_reject_data <- edit_check_reject_data %>%
  mutate(
    experience_level_group = case_when(
     user_edit_count == 0 & user_status == 'registered' ~ 'Newcomer',
     user_edit_count == 0 & user_status == 'unregistered' ~ 'Unregistered',
      user_edit_count > 0 &  user_edit_count <= 100 ~ "Junior Contributor",
      user_edit_count >  100 ~ "Non-Junior Contributor"   
    ),
    experience_level_group = factor(experience_level_group,
         levels = c("Unregistered","Newcomer", "Non-Junior Contributor", "Junior Contributor")
   ))  

#rename experiment field to clarify
edit_check_reject_data <- edit_check_reject_data %>%
  mutate(test_group = factor(test_group,
         levels = c("2025-03-editcheck-multicheck-reference-control", "2025-03-editcheck-multicheck-reference-test"),
         labels = c("control (single check)", "test (multiple checks)")))
Code
#Set fields and factor levels to assess number of checks shown
#Note limited to 1 sidebar open as we're looking for cases where multiple checks presented in a single sidebar (vs user going back and forth)

edit_check_reject_data <- edit_check_reject_data %>%
  mutate(
    multiple_checks_shown = 
         ifelse(n_checks_shown > 1 &  n_sidebar_opens < 2, "multiple checks shown", "single check shown"),  
     multiple_checks_shown = factor( multiple_checks_shown ,
         levels = c("single check shown", "multiple checks shown")))
         
# note these buckets can be adjusted as needed based on distribution of data
edit_check_reject_data <- edit_check_reject_data %>%
  mutate(
    checks_shown_bucket = case_when(
     is.na(n_checks_shown) ~ '0',
     n_checks_shown == 1 | (n_checks_shown > 1 & n_sidebar_opens >= 2)  ~ '1', 
     n_checks_shown == 2 & n_sidebar_opens < 2 ~ '2',
     n_checks_shown > 2 & n_checks_shown <= 5 & n_sidebar_opens < 2 ~ "3-5",
     n_checks_shown > 5 & n_checks_shown <= 10 & n_sidebar_opens < 2 ~ "6-10", 
     n_checks_shown > 10 & n_checks_shown <= 15 & n_sidebar_opens < 2 ~ "11-15", 
     n_checks_shown > 15 & n_checks_shown <= 20 & n_sidebar_opens < 2 ~ "16-20", 
    n_checks_shown > 20 & n_sidebar_opens < 2 ~ "over 20" 
    ),
    checks_shown_bucket = factor(checks_shown_bucket ,
         levels = c("0","1","2", "3-5", "6-10","11-15" ,"16-20", "over 20")
   ))   
Code
#remove some small occurrences of abnormal data. Will investigate but <0.001% of data at moment so won't impact results.
#Remove one abnormal instance of multiple checks being shown within control group
edit_check_reject_data  <- edit_check_reject_data  %>%
filter(!(test_group == 'control (single check)' & multiple_checks_shown == "multiple checks shown"))

# remove one abnormal instance of multiple reject actions being logged with no instances of checks being shown
# Relabel n_rejects option
edit_check_reject_data  <- edit_check_reject_data  %>%
filter(!(is.na(n_checks_shown) & n_rejects > 0)) %>%
mutate(n_rejects = ifelse(n_checks_shown > 0 & is.na(n_rejects), 0, n_rejects))

Proportion of new content edits without a reference by experiment group

Code
edit_check_dismissal_overall <- edit_check_reject_data %>%
    filter(was_edit_check_shown == 1 & is_new_content == 1) %>% #limit to where shown
    group_by(test_group) %>%
    summarise(n_edits = n_distinct(editing_session),
              n_rejects = n_distinct(editing_session[n_rejects > 0 & included_new_reference == 0])) %>% #limit to new content edits without a reference
    mutate(dismissal_rate = paste0(round(n_rejects/n_edits * 100, 1), "%")) %>%   
    gt()  %>%
    tab_header(
    title = "Proportion of new content edits where reference checks was shown \n and no new reference was added"
      )  %>%
    opt_stylize(5) %>%
  cols_label(
    test_group = "Experiment Group",
    n_edits = "Number of edits shown reference check",
    n_rejects = "Number of edits that did not add at least one new reference",
    dismissal_rate = "Proportion of edits where people elected to not add a reference"
  ) %>%
    tab_source_note(
        gt::md('Limited to published new content edits where at least one reference check was shown')
    )


display_html(as_raw_html(edit_check_dismissal_overall ))
Proportion of new content edits where reference checks was shown and no new reference was added
Experiment Group Number of edits shown reference check Number of edits that did not add at least one new reference Proportion of edits where people elected to not add a reference
control (single check) 2313 1333 57.6%
test (multiple checks) 2307 1320 57.2%
Limited to published new content edits where at least one reference check was shown

Proportion of new content edits without a reference by whether multiple checks were shown

Code
edit_check_dismissal_bymultiple <- edit_check_reject_data %>%
    filter(was_edit_check_shown == 1 & is_new_content == 1) %>% #limit to where shown
    group_by(test_group,multiple_checks_shown) %>%
    summarise(n_edits = n_distinct(editing_session),
              n_rejects = n_distinct(editing_session[n_rejects > 0 & included_new_reference == 0])) %>% #limit to new content edits without a reference
    mutate(dismissal_rate = paste0(round(n_rejects/n_edits * 100, 1), "%")) %>%   
    gt()  %>%
    tab_header(
    title = "Proportion of new content edits without a reference by if multiple checks were shown"
      )  %>%
 opt_stylize(5) %>%
  cols_label(
    test_group = "Experiment Group",
     multiple_checks_shown = "Multiple Checks",
      n_edits = "Number of edits shown reference check",
    n_rejects = "Number of edits that did not add at least one new reference",
    dismissal_rate = "Proportion of edits where people elected to not add a reference"
  ) %>%
    tab_source_note(
        gt::md('Limited to published new content edits')
    )


display_html(as_raw_html(edit_check_dismissal_bymultiple ))
Proportion of new content edits without a reference by if multiple checks were shown
Multiple Checks Number of edits shown reference check Number of edits that did not add at least one new reference Proportion of edits where people elected to not add a reference
control (single check)
single check shown 2313 1333 57.6%
test (multiple checks)
single check shown 1760 1061 60.3%
multiple checks shown 547 259 47.3%
Limited to published new content edits

Proportion of new content edits without a reference by number of checks shown

Code
edit_check_dismissal_bynchecks <- edit_check_reject_data %>%
    filter(was_edit_check_shown == 1 & is_new_content == 1 & n_sidebar_opens < 2 ) %>% #limit to where shown
    group_by(test_group, checks_shown_bucket) %>%
    summarise(n_edits = n_distinct(editing_session),
              n_rejects = n_distinct(editing_session[n_rejects > 0 & included_new_reference == 0])) %>% #limit to new content edits without a reference
    mutate(dismissal_rate = paste0(round(n_rejects/n_edits * 100, 1), "%")) %>% 
    ungroup() %>%
      mutate(n_edits = ifelse(n_edits < 50, "<50", n_edits),
           n_rejects = ifelse(n_rejects < 50, "<50", n_rejects))  %>% #sanitizing per data publication guidelines
    group_by(test_group)   %>%
    gt()  %>%
    tab_header(
    title = "Proportion of new content edits without a reference by the number of checks shown"
      )  %>%
 opt_stylize(5) %>%
  cols_label(
    checks_shown_bucket = "Number of reference checks shown",
    n_edits = "Number of edits shown reference check",
    n_rejects = "Number of edits that did not add at least one new reference",
    dismissal_rate = "Proportion of edits where people elected to not add a reference"
  ) %>%
    tab_source_note(
        gt::md('Limited to published new content edits')
    )


display_html(as_raw_html(edit_check_dismissal_bynchecks))
Proportion of new content edits without a reference by the number of checks shown
Number of reference checks shown Number of edits shown reference check Number of edits that did not add at least one new reference Proportion of edits where people elected to not add a reference
control (single check)
1 2032 1226 60.3%
test (multiple checks)
1 1512 941 62.2%
2 213 107 50.2%
3-5 186 84 45.2%
6-10 94 <50 44.7%
11-15 <50 <50 46.4%
16-20 <50 <50 60%
over 20 <50 <50 43.8%
Limited to published new content edits

Overall reference check dismissal rate by experiment group

We also reviewed the total number of individual reference checks dismissed to determine whether a large portion of checks within a single session were being actively dismissed by users.

Code
edit_check_dismissal_totals <- edit_check_reject_data %>%
    filter(was_edit_check_shown == 1 & is_new_content == 1 ) %>% #limit to where shown
    group_by(test_group, multiple_checks_shown) %>%
    summarise(n_checks_shown = sum(n_checks_shown), #Note there are NAs for sessions that don't select. Need to replace with 0
              n_rejects = sum(n_rejects )) %>% #total number of individual checks rejected
    mutate(dismissal_rate = paste0(round(n_rejects/n_checks_shown * 100, 1), "%")) %>%   
    gt()  %>%
    opt_stylize(5) %>%
    tab_header(
    title = "Proportion of distinct reference checks shown that were dismissed"
      )  %>%
  cols_label(
    multiple_checks_shown = "Multiple Checks",
    n_checks_shown = "Number of checks shown",
    n_rejects = "Number of reference checks dismissed",
    dismissal_rate = "Proportion of reference checks dismissed"
  ) 


display_html(as_raw_html(edit_check_dismissal_totals))
Proportion of distinct reference checks shown that were dismissed
Multiple Checks Number of checks shown Number of reference checks dismissed Proportion of reference checks dismissed
control (single check)
single check shown 2823 1713 60.7%
test (multiple checks)
single check shown 4433 1804 40.7%
multiple checks shown 2903 2025 69.8%

Proportion of new content edits without a reference by platform

Code
edit_check_dismissal_byplatform <- edit_check_reject_data %>%
    filter(was_edit_check_shown == 1 & is_new_content == 1) %>% #limit to where shown
    group_by(platform,test_group) %>%
    summarise(n_edits = n_distinct(editing_session),
              n_rejects = n_distinct(editing_session[n_rejects > 0 & included_new_reference == 0])) %>% #limit to new content edits without a reference
    mutate(dismissal_rate = paste0(round(n_rejects/n_edits * 100, 1), "%")) %>%   
    gt()  %>%
    tab_header(
    title = "Proportion of new content edits without a reference by platform"
      )  %>%
    opt_stylize(5) %>%
  cols_label(
    test_group = "Experiment Group",
    platform = "Platform",
    n_edits = "Number of edits shown reference check",
    n_rejects = "Number of edits that did not add at least one new reference",
    dismissal_rate = "Proportion of edits where people elected to not add a reference"
  ) %>%
    tab_source_note(
        gt::md('Limited to published new content edits')
    )

display_html(as_raw_html(edit_check_dismissal_byplatform ))
Proportion of new content edits without a reference by platform
Experiment Group Number of edits shown reference check Number of edits that did not add at least one new reference Proportion of edits where people elected to not add a reference
desktop
control (single check) 1413 701 49.6%
test (multiple checks) 1419 705 49.7%
phone
control (single check) 900 632 70.2%
test (multiple checks) 888 615 69.3%
Limited to published new content edits

Proportion of new content edits without a reference by user experience

Code
edit_check_dismissal_byuserstatus <- edit_check_reject_data %>%
    filter(was_edit_check_shown == 1 & is_new_content == 1) %>% #limit to where shown
    group_by(experience_level_group, test_group) %>%
    summarise(n_edits = n_distinct(editing_session),
              n_rejects = n_distinct(editing_session[n_rejects > 0 & included_new_reference == 0])) %>% #limit to new content edits without a reference
    mutate(dismissal_rate = paste0(round(n_rejects/n_edits * 100, 1), "%")) %>%   
    gt()  %>%
    tab_header(
    title = "Proportion of new content edits without a reference by user experience"
      )  %>%
    opt_stylize(5) %>%
  cols_label(
    test_group = "Experiment Group",
    experience_level_group = "User Status",
   n_edits = "Number of edits shown reference check",
    n_rejects = "Number of edits that did not add at least one new reference",
    dismissal_rate = "Proportion of edits where people elected to not add a reference"
  ) %>%
    tab_source_note(
        gt::md('Limited to published new content edits')
    )


display_html(as_raw_html(edit_check_dismissal_byuserstatus))
Proportion of new content edits without a reference by user experience
Experiment Group Number of edits shown reference check Number of edits that did not add at least one new reference Proportion of edits where people elected to not add a reference
Unregistered
control (single check) 1255 832 66.3%
test (multiple checks) 1282 861 67.2%
Newcomer
control (single check) 309 156 50.5%
test (multiple checks) 283 143 50.5%
Junior Contributor
control (single check) 749 345 46.1%
test (multiple checks) 742 316 42.6%
Limited to published new content edits

Proportion of new content edits without a reference by partner Wikipedia

Code
edit_check_dismissal_bywiki <- edit_check_reject_data %>%
    filter(was_edit_check_shown == 1 & is_new_content == 1) %>% #limit to where shown
    group_by(wiki, test_group) %>%
    summarise(n_edits = n_distinct(editing_session),
              n_rejects = n_distinct(editing_session[n_rejects > 0 & included_new_reference == 0])) %>% #limit to new content edits without a reference
    mutate(dismissal_rate = paste0(round(n_rejects/n_edits * 100, 1), "%")) %>% 
    filter(n_rejects > 65) %>%  #remove wikis with too few edits
    gt()  %>%
    tab_header(
    title = "Proportion of new content edits without a reference by Wikipedia"
      )  %>%
 opt_stylize(5) %>%
  cols_label(
    test_group = "Experiment Group",
    wiki = "Wikipedia",
    n_edits = "Number of edits shown reference check",
    n_rejects = "Number of edits that did not add at least one new reference",
    dismissal_rate = "Proportion of edits where people elected to not add a reference"
  ) %>%
    tab_source_note(
        gt::md('Limited to wikis with at least 100 published edits')
    )


display_html(as_raw_html(edit_check_dismissal_bywiki))
Proportion of new content edits without a reference by Wikipedia
Experiment Group Number of edits shown reference check Number of edits that did not add at least one new reference Proportion of edits where people elected to not add a reference
eswiki
control (single check) 461 307 66.6%
test (multiple checks) 514 310 60.3%
frwiki
control (single check) 661 358 54.2%
test (multiple checks) 676 392 58%
itwiki
control (single check) 546 338 61.9%
test (multiple checks) 535 340 63.6%
jawiki
control (single check) 209 144 68.9%
test (multiple checks) 184 100 54.3%
ptwiki
control (single check) 172 70 40.7%
test (multiple checks) 164 72 43.9%
Limited to wikis with at least 100 published edits

Key Insights

  • Comparing overall rates in the test and control groups, there are no significant differences in the proportion of published new content edits where people elected not to add a new reference. In the test group, 57.2% of new content edits where a reference check was shown did not include a new reference, compared to 57.6% in the control group.
    • While we observed a slightly higher proportion of individual checks dismissed for edits shown multiple checks in the test group, sessions shown multiple checks are more likely to include at least one new reference in the final published edit than sessions shown just a single check. In the test group, 47.3% of all published edits shown multiple checks did not include at least one new reference compared to 60.3% of edits that were shown a single check.
  • Currently, the proportion of edits without a new reference appears to decrease slightly as the number of checks shown increases; however, more edits where multiple checks are presented are needed to confirm this.
  • These trends do not vary significantly by platform, user experience level, or wiki.
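
One way to check the overall comparison is a two-sample test of equal proportions on the counts from the experiment-group table in this section. This is a minimal sketch, not necessarily the method that will be used in the full analysis; it treats each editing session as an independent observation, which is an assumption.

```r
# Counts copied from the overall table in this section
no_reference <- c(control = 1333, test = 1320)  # edits without a new reference
shown        <- c(control = 2313, test = 2307)  # edits shown a reference check

# Two-sample test of equal proportions (chi-squared, with continuity correction)
dismissal_test <- prop.test(x = no_reference, n = shown)

dismissal_test$estimate  # observed proportion in each group
dismissal_test$p.value   # a large p-value gives no evidence of a difference
dismissal_test$conf.int  # 95% interval for control minus test
```

The same call can be repeated per platform, user experience level, or wiki by swapping in the counts from the corresponding table.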

Proportion of published new content edits that are reverted within 48 hours

Methodology: We reviewed the proportion of all new content edits where a reference check was shown that were reverted within 48 hours.

Code
# load data for assessing revert rates
edit_check_revert_data <-
  read.csv(
    file = 'Queries/data/edit_check_reverts_data.tsv',
    header = TRUE,
    sep = "\t",
    stringsAsFactors = FALSE
  ) 
Code
# Set experience level group and factor levels
edit_check_revert_data <- edit_check_revert_data %>%
  mutate(
    experience_level_group = case_when(
     user_edit_count == 0 & user_status == 'registered' ~ 'Newcomer',
     user_edit_count == 0 & user_status == 'unregistered' ~ 'Unregistered',
      user_edit_count > 0 &  user_edit_count <= 100 ~ "Junior Contributor",
      user_edit_count >  100 ~ "Non-Junior Contributor"   
    ),
    experience_level_group = factor(experience_level_group,
         levels = c("Unregistered","Newcomer", "Non-Junior Contributor", "Junior Contributor")
   ))  

#rename experiment field to clarify
edit_check_revert_data <- edit_check_revert_data %>%
  mutate(test_group = factor(test_group,
         levels = c("2025-03-editcheck-multicheck-reference-control", "2025-03-editcheck-multicheck-reference-test"),
         labels = c("control (single check)", "test (multiple checks)")))
Code
# set field to indicate if more than one check was shown in a single session. Note: This should only be applicable to the test group 

edit_check_revert_data <- edit_check_revert_data %>%
  mutate(
    multiple_checks_shown = 
         ifelse(n_checks_shown > 1 &  n_sidebar_opens < 2, "multiple checks shown", "single check shown"),  
     multiple_checks_shown = factor( multiple_checks_shown ,
         levels = c("single check shown", "multiple checks shown")))
         
# note these buckets can be adjusted as needed based on distribution of data
edit_check_revert_data <- edit_check_revert_data %>%
  mutate(
    checks_shown_bucket = case_when(
     is.na(n_checks_shown) ~ '0',
     n_checks_shown == 1 | (n_checks_shown > 1 & n_sidebar_opens >= 2)  ~ '1', 
     n_checks_shown == 2 & n_sidebar_opens < 2 ~ '2',
     n_checks_shown > 2 & n_checks_shown <= 5 & n_sidebar_opens < 2 ~ "3-5",
     n_checks_shown > 5 & n_checks_shown <= 10 & n_sidebar_opens < 2 ~ "6-10", 
     n_checks_shown > 10 & n_checks_shown <= 15 & n_sidebar_opens < 2 ~ "11-15", 
     n_checks_shown > 15 & n_checks_shown <= 20 & n_sidebar_opens < 2 ~ "16-20", 
    n_checks_shown > 20 & n_sidebar_opens < 2 ~ "over 20" 
    ),
    checks_shown_bucket = factor(checks_shown_bucket ,
         levels = c("0","1","2", "3-5", "6-10","11-15" ,"16-20", "over 20")
   ))  
Code
#Remove one abnormal instance of multiple checks being shown within control group
edit_check_revert_data <- edit_check_revert_data %>%
filter(!(test_group == 'control (single check)' & multiple_checks_shown == "multiple checks shown"))

Revert rate by experiment group

Code
edit_check_revert_overall <- edit_check_revert_data %>%
    filter(is_new_content == 1,
          was_edit_check_shown == 1) %>% #limit to where shown
    group_by(test_group) %>%
    summarise(n_edits = n_distinct(editing_session),
              n_reverts = n_distinct(editing_session[was_reverted == 1])) %>% #count edits that were reverted
    mutate(revert_rate = paste0(round(n_reverts/n_edits * 100, 1), "%")) %>%   
    gt()  %>%
    tab_header(
    title = "New content edit revert rate by experiment group"
      )  %>%
    opt_stylize(5) %>%
  cols_label(
    test_group = "Test Group",
    n_edits = "Number of published edits shown reference check",
    n_reverts = "Number of edits reverted",
    revert_rate = "Proportion of new content edits that were reverted"
  ) %>%
    tab_source_note(
        gt::md('Limited to published new content edits shown at least one reference check')
    )



display_html(as_raw_html(edit_check_revert_overall ))
New content edit revert rate by experiment group
Test Group Number of published edits shown reference check Number of edits reverted Proportion of new content edits that were reverted
control (single check) 2313 562 24.3%
test (multiple checks) 2307 604 26.2%
Limited to published new content edits shown at least one reference check

Revert rate by whether multiple checks were shown

Code
edit_check_revert_bymultiple <- edit_check_revert_data %>%
    filter(is_new_content == 1 & was_edit_check_shown == 1) %>% #limit to where shown
    group_by( multiple_checks_shown) %>%
    summarise(n_edits = n_distinct(editing_session),
              n_reverts = n_distinct(editing_session[was_reverted == 1])) %>% #count edits that were reverted
    mutate(revert_rate = paste0(round(n_reverts/n_edits * 100, 1), "%")) %>%   
    gt()  %>%
    tab_header(
    title = "New content edit revert rate by if multiple checks were shown"
      )  %>%
    opt_stylize(5) %>%
  cols_label(
    multiple_checks_shown = "Multiple Check",
    n_edits = "Number of published new content edits",
    n_reverts = "Number of edits reverted ",
    revert_rate = "Proportion of new content edits that were reverted"
  ) %>%
    tab_source_note(
        gt::md('Limited to published new content edits shown at least one reference check')
    )


display_html(as_raw_html(edit_check_revert_bymultiple ))
New content edit revert rate by if multiple checks were shown
Multiple Check Number of published new content edits Number of edits reverted Proportion of new content edits that were reverted
single check shown 4073 1073 26.3%
multiple checks shown 547 93 17%
Limited to published new content edits shown at least one reference check

Revert rate by the number of checks shown

Code
edit_check_revert_bynchecks <- edit_check_revert_data %>%
    filter(is_new_content == 1 & was_edit_check_shown == 1) %>% #limit to where shown
    group_by(test_group, checks_shown_bucket) %>%
    summarise(n_edits = n_distinct(editing_session),
              n_reverts = n_distinct(editing_session[was_reverted == 1])) %>% #count edits that were reverted
    mutate(revert_rate = paste0(round(n_reverts/n_edits * 100, 1), "%")) %>% 
    select(-c(n_edits, n_reverts)) %>% # removing count columns since data is too granular
    group_by(test_group)%>% 
    gt()  %>%
    tab_header(
    title = "New content edit revert rate by the number of checks shown"
      )  %>%
    opt_stylize(5) %>%
  cols_label(
    test_group = "Test Group",
    checks_shown_bucket = "Number of Checks Shown",
    revert_rate = "Proportion of new content edits that were reverted"
  ) 


display_html(as_raw_html(edit_check_revert_bynchecks))
New content edit revert rate by the number of checks shown
Number of Checks Shown Proportion of new content edits that were reverted
control (single check)
1 24.3%
test (multiple checks)
1 29%
2 20.2%
3-5 16.7%
6-10 9.6%
11-15 14.3%
16-20 30%
over 20 18.8%

New content edit revert rate by platform

Code
edit_check_revert_byplatform <- edit_check_revert_data %>%
    filter(is_new_content == 1 & was_edit_check_shown == 1) %>% #limit to where shown
    group_by( platform, test_group) %>%
    summarise(n_edits = n_distinct(editing_session),
              n_reverts = n_distinct(editing_session[was_reverted == 1])) %>% #count edits that were reverted
    mutate(revert_rate = paste0(round(n_reverts/n_edits * 100, 1), "%")) %>%   
    gt()  %>%
    tab_header(
    title = "New content edit revert rate by platform"
      )  %>%
  opt_stylize(5) %>%
  cols_label(
    test_group = "Test Group",
    platform = "Platform",
    n_edits = "Number of published new content edits",
    n_reverts = "Number of edits reverted",
    revert_rate = "Proportion of new content edits that were reverted"
  ) %>%
    tab_source_note(
        gt::md('Limited to published new content edits shown at least one reference check')
    )



display_html(as_raw_html(edit_check_revert_byplatform ))
New content edit revert rate by platform
Test Group Number of published new content edits Number of edits reverted Proportion of new content edits that were reverted
desktop
control (single check) 1413 232 16.4%
test (multiple checks) 1419 294 20.7%
phone
control (single check) 900 330 36.7%
test (multiple checks) 888 310 34.9%
Limited to published new content edits shown at least one reference check

New content edit revert rate by user experience

Code
edit_check_revert_byuserexp <- edit_check_revert_data %>%
    filter(is_new_content == 1 & was_edit_check_shown == 1) %>% #limit to where shown
    group_by(experience_level_group,test_group ) %>%
    summarise(n_edits = n_distinct(editing_session),
              n_reverts = n_distinct(editing_session[was_reverted == 1])) %>% #count edits that were reverted
    mutate(revert_rate = paste0(round(n_reverts/n_edits * 100, 1), "%")) %>%   
    gt()  %>%
    tab_header(
    title = "New content edit revert rate by user experience"
      )  %>%
   opt_stylize(5) %>%
  cols_label(
    test_group = "Experiment Group",
    experience_level_group  = "User Status",
     n_edits = "Number of published new content edits",
    n_reverts = "Number of edits reverted",
    revert_rate = "Proportion of new content edits that were reverted"
  ) %>%
    tab_source_note(
        gt::md('Limited to published new content edits shown at least one reference check')
    )



display_html(as_raw_html(edit_check_revert_byuserexp))
New content edit revert rate by user experience
Experiment Group Number of published new content edits Number of edits reverted Proportion of new content edits that were reverted
Unregistered
control (single check) 1255 357 28.4%
test (multiple checks) 1282 404 31.5%
Newcomer
control (single check) 309 78 25.2%
test (multiple checks) 283 67 23.7%
Junior Contributor
control (single check) 749 127 17%
test (multiple checks) 742 133 17.9%
Limited to published new content edits shown at least one reference check

New content edit revert rate by partner Wikipedia

Code
edit_check_revert_bywiki <- edit_check_revert_data %>%
    filter(is_new_content == 1 & was_edit_check_shown == 1) %>% #limit to where shown
    group_by( wiki, test_group) %>%
    summarise(n_edits = n_distinct(editing_session),
              n_reverts = n_distinct(editing_session[was_reverted == 1])) %>% #count edits that were reverted
    mutate(revert_rate = paste0(round(n_reverts/n_edits * 100, 1), "%")) %>%  
    filter(n_reverts > 50) %>% 
    gt()  %>%
    tab_header(
    title = "New content edit revert rate by partner Wikipedia"
      )  %>%
  opt_stylize(5) %>%
  cols_label(
    test_group = "Experiment Group",
    wiki  = "Wikipedia",
       n_edits = "Number of published new content edits",
    n_reverts = "Number of edits reverted",
    revert_rate = "Proportion of new content edits that were reverted"
  )  %>%
    tab_source_note(
        gt::md('Limited to wikis with at least 100 published edits')
    )


display_html(as_raw_html(edit_check_revert_bywiki))
New content edit revert rate by partner Wikipedia
Experiment Group Number of published new content edits Number of edits reverted Proportion of new content edits that were reverted
eswiki
control (single check) 461 188 40.8%
test (multiple checks) 514 197 38.3%
frwiki
control (single check) 661 155 23.4%
test (multiple checks) 676 166 24.6%
itwiki
control (single check) 546 134 24.5%
test (multiple checks) 535 155 29%
Limited to wikis with at least 100 published edits

Key Insights

  • There is no significant difference in the revert rate of new content edits between the control and test groups for editing sessions where a reference check was shown. The revert rate of new content edits was 24.3% in the control group and 26.2% in the test group.
    • The revert rate of new content edits shown multiple checks is 17%, compared to 26.3% for all sessions shown a single check (control and test combined) and 29% for single-check sessions in the test group alone.
  • There were no significant increases in revert rate based on the number of checks shown. Current trends indicate that edits shown 3 to 5 reference checks are less likely to be reverted than edits shown 2 reference checks (16.7% versus 20.2%). However, the number of edits shown more than 2 checks is still limited, and we need more multi-check editing sessions to confirm this trend.
  • We observed no significant differences in revert rate for new content edits published on mobile web. There is currently a slight increase in the new content edit revert rate on desktop for the test group (20.7% versus 16.4% in the control group); however, this increase was observed only for edits shown a single check, not for edits shown multiple checks. We will confirm impacts on revert rate in the full AB test analysis.
  • By user experience, the new content edit revert rate in the test group decreased for newcomers (25.2% to 23.7%) and increased slightly for junior contributors (17% to 17.9%) and for unregistered contributors (28.4% in the control group to 31.5% in the test group).
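
The size of the overall difference in revert rate can also be expressed with an approximate confidence interval. The sketch below uses the counts from the experiment-group table in this section and a simple normal-approximation (Wald) interval, which is an illustrative choice rather than the method planned for the full analysis.

```r
# Counts copied from the revert rate by experiment group table
reverted  <- c(control = 562, test = 604)
published <- c(control = 2313, test = 2307)

rate <- reverted / published

# Difference in revert rate (test minus control), in percentage points
diff_pp <- (rate[["test"]] - rate[["control"]]) * 100

# Standard error of the difference between two independent proportions
se <- sqrt(sum(rate * (1 - rate) / published))

# Approximate 95% interval for the difference, in percentage points
ci_pp <- diff_pp + c(-1, 1) * qnorm(0.975) * se * 100

round(diff_pp, 1)
round(ci_pp, 1)
```

The interval spans zero, which is consistent with treating the 1.9 percentage point gap as not yet distinguishable from no difference.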

Proportion of people blocked after publishing an edit where Multi Check was shown

Methodology: We gathered all edits where edit check was shown from the mediawiki_revision_change_tag table and joined with mediawiki_private_cu_changes to gather user name info. We then reviewed both global and local blocks made within 6 hours of the edit check event as identified in the logging table.

Note: We do not yet have block data for April dates, so the analysis is limited to blocks that occurred between 25 March 2025 and 31 March 2025.
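
The 6-hour window described in the methodology can be illustrated with a small base R sketch. The data frame, its column names (user, edit_ts, block_ts), and the timestamps are all hypothetical; the actual query joins the tables named above and is not reproduced here.

```r
# Hypothetical edit and block timestamps (UTC), invented for illustration
toy_blocks <- data.frame(
  user     = c("u1", "u2", "u3"),
  edit_ts  = as.POSIXct(c("2025-03-25 10:00:00", "2025-03-26 12:00:00",
                          "2025-03-27 08:00:00"), tz = "UTC"),
  block_ts = as.POSIXct(c("2025-03-25 13:30:00", "2025-03-27 12:00:00",
                          NA), tz = "UTC")
)

# Hours between the edit and the block (NA when the user was never blocked)
hours_to_block <- as.numeric(
  difftime(toy_blocks$block_ts, toy_blocks$edit_ts, units = "hours")
)

# Flag blocks that happened after the edit and no more than 6 hours later
toy_blocks$blocked_within_6h <-
  !is.na(hours_to_block) & hours_to_block >= 0 & hours_to_block <= 6

toy_blocks
```

Here only "u1" is flagged: "u2" was blocked a full day after the edit and "u3" was never blocked.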

Code
# load data for assessing blocks
edit_check_blocks <-
  read.csv(
    file = 'Queries/data/edit_check_eligible_users_blocked.csv',
    header = TRUE,
    sep = ",",
    stringsAsFactors = FALSE
  ) 
Code
#rename experiment field to clarify
edit_check_blocks <- edit_check_blocks%>%
  mutate(test_group = factor(bucket,
         levels = c("2025-03-editcheck-multicheck-reference-control", "2025-03-editcheck-multicheck-reference-test"),
         labels = c("control (single check)", "test (multiple checks)")))

Block rates by experiment group

Code
edit_check_local_blocks_overall <- edit_check_blocks %>%
    group_by(test_group) %>%
    summarise(blocked_users = n_distinct(cuc_ip[is_local_blocked == 'True' | is_global_blocked == 'True']),
              all_users = n_distinct(cuc_ip))  %>%  #look at blocks
    mutate(prop_blocks = paste0(round(blocked_users/all_users * 100, 1), "%")) %>%
    select(-c(blocked_users, all_users)) %>% #removing granular data columns 
    gt()  %>%
    tab_header(
    title = "Proportion of users blocked by experiment group"
      )  %>%
  opt_stylize(5) %>%
  cols_label(
    test_group = "Test Group",
    prop_blocks = "Proportion of users blocked"
  )  %>%
    tab_source_note(
        gt::md('Limited to users blocked within 6 hours of publishing an edit where reference check was shown')
    )


display_html(as_raw_html(edit_check_local_blocks_overall))
Proportion of users blocked by experiment group
Test Group Proportion of users blocked
control (single check) 3%
test (multiple checks) 4.1%
Limited to users blocked within 6 hours of publishing an edit where reference check was shown

Key Insights

  • 3.3% of users were blocked after publishing an edit where at least one reference check was shown. By experiment group, 4.1% of users were blocked in the test group compared to 3% in the control group. This difference is not statistically significant and is limited to edits by unregistered users in each group.
  • No global blocks were issued to any users that published an edit where at least one reference check was shown.