Pre and Post Deployment Analysis of Multi-Check Phase 1: Presenting single edit check in side rail

Author

Megan Neisler, Staff Data Scientist, Wikimedia Foundation

Modified

2025-02-24

Purpose

In T378777, the Editing team changed the desktop Reference Check experience by presenting the check in a side rail located adjacent to the editable content, rather than presenting the Check within the editable content.

This change was deployed on 12 December 2024 to all wikis where Reference Check is currently offered as default. At the time of deployment, this included all Wikipedias except English and German Wikipedia. See Edit check/Deployment Status.

The purpose of this analysis is to review any changes in constructive activation, as defined by the WE 1.2 KR, and in the identified Edit Check guardrail metrics by reviewing data two weeks before and after the change.

Research Questions

In the two weeks before and after this change was merged,

  • Do we notice a change in constructive activation rates?
  • Do we notice a change in the frequency and number of edit checks presented?
    • The number of editing sessions where edit check was shown
  • Do we notice a change in how likely edit check is to disrupt the person’s editing experience, as measured by:
    • Edit completion rate
    • False positive report rate
    • Revert rate
  • Do we notice a change in how likely edit check is to cause people to publish constructive edits as measured by:
    • Absolute number of people that made a change to address the policy violation edit check was alerting them of.
    • Proportion of all published new content edits where the edit check was shown that made a change to address the policy violation edit check was alerting them of.

Summary of Findings

Overall, we did not observe any sharp declines or increases in constructive activation or the identified guardrail metrics following the deployment of the change.

We did observe some slight positive changes indicating that moving reference check to the side rail may be increasing user engagement with the check. These positive changes include an increase in the proportion of edits that were shown reference check and added a reference, as well as a decrease in revert rate. However, as this is not a controlled experiment, external factors may contribute to some of these observed changes.

Constructive Activation Rate

  • During this reviewed timeframe, reference check was shown to 14% of all VisualEditor edits and 6.7% of all newcomers that created an account.
  • There were no significant changes in constructive activation rates. Constructive activation rate was 28.7% prior to the Multi-Check Phase 1 deployment and 28.9% after the deployment. This represents only a 0.7% relative increase, which is not statistically significant.

Edit Completion Rate

  • Overall, there were no changes in edit completion rate with the inclusion of edits that were reverted. 80% of edits that were shown reference check were successfully saved pre and post the move of edit checks to the side rail.
  • If we exclude reverted edits, there was a slight increase (3.2% increase [2 percentage points]) in edits completed. 68% of edits where reference check was presented were successfully saved and not reverted following the move of the check to the side rail, compared to 66% of edits successfully completed prior to the change.
  • We also observed similar increases in edit completion rate (excluding reverted edits) across all experience level groups (newcomers, junior contributors, and unregistered users) and wikis.

False Positive Rate

  • Overall, declines of reference check decreased by 4.7%. 51.2% of edit attempts included an explicit reason for declining a reference check following the change, compared to 53.7% of edit attempts prior to the change.
  • There was a 6.4% (0.5 percentage points, 7.8% pre to 8.3% post) increase in the proportion of edit attempts that indicated that the reference check presented was irrelevant; however, the rate of all other types of declines decreased.
  • There were no significant changes in decline rates by editor experience level.

Revert Rate

  • Of the guardrail metrics reviewed, revert rate showed the largest change pre and post the move of reference check to the side rail.
  • The revert rate of new content edits where reference check was presented decreased by 15.7% (20.4% pre to 17.2% post change).
  • We observed revert rate decreases across all experience level groups (unregistered, newcomer, and junior contributors) and across the majority of Wikipedias, except for Spanish Wikipedia which had a 2 percentage point increase in revert rate.

Total distinct users that included a reference after being shown reference check

  • Overall, there was a slight decrease in the absolute number of users that added a reference after being shown reference check (-166 users across all wikis); however, there were no significant changes around the date edit check was moved to the side rail.
  • Additionally, the lower number of users is also likely to be impacted by seasonal trends and changes in editing activity around the December holidays.

Proportion of edits that included a reference after being shown reference check

  • There was an 8% increase in the proportion of new content edits that included a reference following the change (34.8% pre change to 37.9% post change).
  • Increases were observed across all editor experience levels and most wikis.
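
The summary above quotes both relative changes and percentage point changes. As a rough sketch, the relative figures for several of the metrics can be reproduced from the rounded pre and post rates listed in this summary. The helper name `rate_change` and the data frame below are illustrative assumptions and are not part of the original analysis; because the inputs are rounded, results may differ slightly from values computed on the underlying data.

```r
# Relative (percent) change and percentage-point change between two rates.
# Inputs are the rounded pre/post rates quoted in the summary, so results
# may differ slightly from figures computed on unrounded data.
rate_change <- function(pre, post) {
  c(
    pct_point_change = round(post - pre, 1),
    relative_change_pct = round((post - pre) / pre * 100, 1)
  )
}

# Rounded rates copied from the summary bullets (illustrative subset).
summary_rates <- data.frame(
  metric = c("Constructive activation rate",
             "Decline rate (any reason)",
             "Declined as irrelevant",
             "Revert rate"),
  pre = c(28.7, 53.7, 7.8, 20.4),
  post = c(28.9, 51.2, 8.3, 17.2)
)

# Apply the helper row by row and attach the two change columns.
changes <- t(mapply(rate_change, summary_rates$pre, summary_rates$post))
summary_changes <- cbind(summary_rates, changes)
print(summary_changes)
```

Reporting both columns side by side makes it easier to see when a large relative change corresponds to a small absolute shift, as with the rate of checks declined as irrelevant.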

Constructive Activation Rate

For WE 1.2 KR, we defined constructive activation as: “The percentage of newcomers making at least one edit to an article in the main namespace of a Wikipedia project on a mobile device within 24 hours of registration (also on a mobile device) and that edit not being reverted within 48 hours of being published.”

For this analysis, we are reviewing constructive activation on desktop devices instead of mobile as this is where the Multi-Check Phase 1 change was deployed. However, reviewing the impacts from edit check changes on this metric will help us develop a better understanding of how we are tracking against the improvement targets and decide if any adjustments to the strategy are needed.

Methodology

We gathered desktop registrations from two weeks pre and post deployment of the change deployed on December 12th. For those registrations, we gathered data on edits to a main namespace completed on a desktop device within 24 hours of registration and the reverts of those edits. English and German Wikipedia were excluded as reference check was not available as default on those wikis at the time of this analysis. See Edit check/Deployment Status.

We reviewed changes to constructive activation rates overall as well as for edits where edit check was shown.

Show the code
shhh <- function(expr) suppressPackageStartupMessages(suppressWarnings(suppressMessages(expr)))
shhh({
    library(lubridate)
    library(ggplot2)
    library(dplyr)
    library(gt)
})
#set preferences
options(dplyr.summarise.inform = FALSE)
options(repr.plot.width = 15, repr.plot.height = 10)

Data Gathering and Cleaning

Show the code
# load data for assessing activations
all_users_edit_data <-
  read.csv(
    file = 'data/activation-edit-data.tsv',
    header = TRUE,
    sep = "\t",
    stringsAsFactors = FALSE
  ) 
Show the code

# Reformat user_id to include the wiki, since user ids are only unique within a wiki.
# Users do not have the same user_id on different wikis.
all_users_edit_data$user_id <-
  as.character(paste(all_users_edit_data$user_id,all_users_edit_data$wiki_db,sep ="-" ))
Show the code
# Check for duplicate user ids
length(unique(all_users_edit_data$user_id)) == nrow(all_users_edit_data)
TRUE
Show the code
# format registration timestamp to day
all_users_edit_data$user_registration_timestamp <- 
as.Date(all_users_edit_data$user_registration_timestamp, format = "%Y-%m-%d")
Show the code
#add column to calculate pre and post dates based on registration timestamp

all_users_edit_data <- all_users_edit_data %>%
  mutate(
    pre_post = case_when(
        user_registration_timestamp  >= '2024-11-27' & user_registration_timestamp  <= '2024-12-11' ~ "pre",
        user_registration_timestamp  >= '2024-12-12' & user_registration_timestamp  <= '2024-12-26' ~ "post"
        ),
     pre_post  = factor( pre_post ,
         levels = c("pre", "post")
   )) 

Number of edits completed by newcomers

We first want to take a quick look at the types of edits newcomers complete within 24 hours of registering to understand how many of these users are encountering reference check.

Show the code
all_users_edit_data %>%
filter(num_article_edits_24hrs_all > 0) %>%
summarise(n_users = n_distinct(user_id),
         n_api = n_distinct(user_id[reg_w_api == 1]),
         n_mobile = n_distinct(user_id[reg_on_mobile == 1]))
Show the code
# filter out users that registered on mobile web or mobile app

all_users_edit_desktop <- all_users_edit_data %>%
filter(reg_on_mobile == 0 & reg_w_api == 0)
Show the code
# Total Edits
all_users_edit_desktop %>%
# group_by(pre_post)  %>%
summarise(num_article_edits_24hrs_all = sum(num_article_edits_24hrs_all),
          num_article_edits_24hrs_visualeditor = sum(num_article_edits_24hrs_visualeditor),
          num_article_edits_24hrs_editcheck = sum(num_article_edits_24hrs_editcheck))
Show the code
# Proportion of newcomers shown edit check
edit_check_users <- all_users_edit_desktop %>%
    mutate(showneditcheck = ifelse(num_article_edits_24hrs_editcheck > 0, "ec", "no_ec")) %>%
    group_by(showneditcheck) %>%
    summarise(num_users = n()) %>%
    mutate(pct_users = paste0(round(num_users/sum(num_users) *100, 1), "%"))  

edit_check_users

Some insights:

  • 66% of all edits completed by newcomers within 24 hours of registering are completed with VisualEditor. About 14% of these edits were shown reference check.
  • 6.7% of all newcomers were shown edit check at least once.

Note: No changes were made in the Multi-Check Phase 1 deployment that would impact how frequently the check was shown. We did observe decreases in the absolute number of all types of edits following the change, which is likely related to seasonal changes around the end-of-December holidays.

Constructive Activation Rates

Show the code
## add column to define overall activation
all_users_edit_desktop <- all_users_edit_desktop %>%
    mutate(is_activated = ifelse(
        num_article_edits_24hrs_all > 0, 'is_activated', 'is_not_activated'))
Show the code
## add column to define overall constructive activation
all_users_edit_desktop  <- all_users_edit_desktop  %>%
    mutate(is_constr_activated = ifelse(
        (num_article_edits_24hrs_all - num_article_reverts_24hrs_all) > 0,
        'is_constr_activated', 'not_constr_activated'))
Show the code
## add column to define constructive activation by VE
all_users_edit_desktop  <- all_users_edit_desktop  %>%
    mutate(is_constr_activated_visualeditor = 
           case_when(
        (num_article_edits_24hrs_visualeditor - num_article_reverts_24hrs_visualeditor) > 0 ~ "constr_activation_visualeditor",
        TRUE ~ 'not_constr_activated'))
Show the code
## add column to define constructive activation by EditCheck
## Defining as a user shown edit check at least once and defined as making constructive edits
all_users_edit_desktop  <- all_users_edit_desktop  %>%
    mutate(is_constr_activated_editcheck = 
           case_when(num_article_edits_24hrs_editcheck > 0 &
        (num_article_edits_24hrs_all - num_article_reverts_24hrs_all) > 0 ~ "constr_activation_editcheck",
        TRUE ~ 'not_constr_activated'))
Show the code
# check that activation was defined appropriately
all_users_edit_desktop %>%
    filter(is_activated == 'is_activated',
         num_article_edits_24hrs_all > 0 ) %>%
    slice_head(n= 5)

Overall

Show the code
# Desktop
constructive_activation_desktop <- all_users_edit_desktop %>%
    #filter(num_article_edits_24hrs_visualeditor == 0) %>%
    group_by(pre_post, is_constr_activated) %>%
    summarise(num_users = n()) %>%
    mutate(pct_users = paste0(round(num_users/sum(num_users) *100, 1), "%"))  %>%
    filter(is_constr_activated == 'is_constr_activated') %>% 
    select(-2) %>% 
    ungroup()  %>%
    gt()  %>%
    opt_stylize(5) %>%
    tab_header(
    title = "Constructive Activation Rates Overall"
      )  %>%
  cols_label(
    pre_post = "Pre or post change",
    num_users = "Number of newcomers",
    pct_users = "Constructive Activation Rates"
  ) 


display_html(as_raw_html(constructive_activation_desktop))
Constructive Activation Rates Overall

Pre or post change | Number of newcomers | Constructive Activation Rates
pre                | 6202                | 28.7%
post               | 4959                | 28.9%
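
The report describes the difference between 28.7% and 28.9% as not statistically significant but does not state which test was used. One way to check a difference like this is a two-sample test of proportions. The sketch below uses base R's `prop.test` and is not necessarily the method used in the original analysis. The denominators are an assumption: they are back-calculated from the rounded rates in the table above, so they only approximate the true number of desktop registrations.

```r
# Counts of constructively activated newcomers, taken from the table above.
activated <- c(pre = 6202, post = 4959)

# ASSUMPTION: totals are back-calculated from the rounded rates (28.7%, 28.9%),
# so they only approximate the true number of desktop registrations.
rates <- c(pre = 0.287, post = 0.289)
registered <- round(activated / rates)

# Two-sided test of equal proportions (chi-squared with continuity correction).
test_result <- prop.test(x = activated, n = registered)
print(test_result$p.value)
```

A p-value well above 0.05 here would be consistent with the conclusion that the pre and post rates are not distinguishable at this sample size.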

Constructive Activation Rates for VisualEditor

Show the code
## VE
constructive_activation_visualeditor <- all_users_edit_desktop %>%
    group_by(pre_post, is_constr_activated_visualeditor) %>%
    summarise(num_users = n()) %>%
    mutate(pct_users = paste0(round(num_users/sum(num_users) *100, 1), "%")) %>%
    filter(is_constr_activated_visualeditor == 'constr_activation_visualeditor')%>% 
    select(-2) %>% 
    ungroup()  %>%
    gt()  %>%
    opt_stylize(5) %>%
    tab_header(
    title = "Constructive Activation Rates for Newcomers that used VisualEditor"
      )  %>%
  cols_label(
    pre_post = "Pre or post change",
    num_users = "Number of newcomers",
    pct_users = "Constructive Activation Rates"
  ) 


display_html(as_raw_html(constructive_activation_visualeditor))
Constructive Activation Rates for Newcomers that used VisualEditor

Pre or post change | Number of newcomers | Constructive Activation Rates
pre                | 4648                | 21.5%
post               | 3706                | 21.6%

Constructive Activation Rates for Newcomers presented at least one reference check

Definition: The percentage of all newcomers that were presented at least one Reference Check and made at least one edit to an article in the main namespace of a Wikipedia project on a desktop device within 24 hours of registration, with that edit not being reverted within 48 hours of being published.

Show the code
## EditCheck
constructive_activation_ec <- all_users_edit_desktop %>%
    group_by(pre_post, is_constr_activated_editcheck) %>%
    summarise(num_users = n()) %>%
    mutate(pct_users = round(num_users/sum(num_users) *100, 2))  %>%
    filter(is_constr_activated_editcheck== 'constr_activation_editcheck')%>% 
    select(-2) %>% 
    ungroup()  %>%
    gt()  %>%
    opt_stylize(5) %>%
    tab_header(
    title = "Constructive Activation Rates for Newcomers shown Reference Check"
      )  %>%
  cols_label(
    pre_post = "Pre or post change",
    num_users = "Number of newcomers",
    pct_users = "Constructive Activation Rates"
  ) 


display_html(as_raw_html(constructive_activation_ec))
Constructive Activation Rates for Newcomers shown Reference Check

Pre or post change | Number of newcomers | Constructive Activation Rates
pre                | 1205                | 5.57
post               | 1017                | 5.92

The low proportion here primarily reflects that only a small proportion of all users that created an account reach the stage where a reference check would be presented (when they attempt to save).

A newcomer would need to successfully transition through several stages after creating an account before reaching this stage. The work that will be completed in T385906 will help visualize the full constructive activation funnel and help better isolate the impact of edit check on this metric.

Constructive Activation Rates By Wiki

Show the code
# all desktop by wiki
constructive_activation_wiki <- all_users_edit_desktop %>%
    group_by(wiki_db, pre_post, is_constr_activated) %>%
    summarise(num_users = n_distinct(user_id)) %>%
    mutate(pct_users = paste0(round(num_users/sum(num_users) *100, 2), "%"))  %>%
   filter(is_constr_activated == 'is_constr_activated',
         num_users > 250)%>%   ## Limit to wiki and period rows with over 250 constructively activated newcomers
    group_by(wiki_db) %>%
    #select(-2) %>% 
    gt()  %>%
    opt_stylize(5) %>%
    tab_header(
    title = "Constructive Activation Rates by Wiki"
      )  %>%
  cols_label(
    wiki_db = "Wikipedia",
    pre_post = "Pre or post change",
    num_users = "Number of newcomers",
    pct_users = "Constructive Activation Rates"
  ) %>%
  tab_footnote(
    footnote = "Limited to wikis with at least 500 newcomers that created accounts during reviewed timeframe",
    locations = cells_column_labels(
      columns = "num_users"
    )
  ) 


display_html(as_raw_html(constructive_activation_wiki))
Show the code
## VE
constructive_activation_visualeditor_wiki <- all_users_edit_desktop %>%
    group_by(wiki_db, pre_post, is_constr_activated_visualeditor) %>%
    summarise(num_users = n()) %>%
    mutate(pct_users = paste0(round(num_users/sum(num_users) *100, 1), "%"))  %>%
    filter(is_constr_activated_visualeditor == 'constr_activation_visualeditor',
           num_users >250)  %>% # limit to wiki and period rows with over 250 constructively activated newcomers
    group_by(wiki_db) %>%
    #select(-2) %>% 
    gt()  %>%
    opt_stylize(5) %>%
    tab_header(
    title = "Constructive Activation Rates by Wiki for Newcomers that used VisualEditor"
      )  %>%
  cols_label(
    wiki_db = "Wikipedia",
    pre_post = "Pre or post change",
    num_users = "Number of newcomers",
    pct_users = "Constructive Activation Rates"
  ) %>%
  tab_footnote(
    footnote = "Limited to wikis with at least 500 newcomers that created accounts during reviewed timeframe",
    locations = cells_column_labels(
      columns = "num_users"
    )
  ) 


display_html(as_raw_html(constructive_activation_visualeditor_wiki ))
Constructive Activation Rates by Wiki for Newcomers that used VisualEditor

Wikipedia | Pre or post change | is_constr_activated_visualeditor | Number of newcomers [1] | Constructive Activation Rates
eswiki    | pre                | constr_activation_visualeditor   | 611                     | 19.8%
eswiki    | post               | constr_activation_visualeditor   | 365                     | 19.5%
frwiki    | pre                | constr_activation_visualeditor   | 696                     | 24.9%
frwiki    | post               | constr_activation_visualeditor   | 537                     | 24%
jawiki    | pre                | constr_activation_visualeditor   | 315                     | 28%
jawiki    | post               | constr_activation_visualeditor   | 294                     | 27.2%
ptwiki    | pre                | constr_activation_visualeditor   | 380                     | 22.2%
ptwiki    | post               | constr_activation_visualeditor   | 306                     | 25.3%
ruwiki    | pre                | constr_activation_visualeditor   | 317                     | 18.1%
ruwiki    | post               | constr_activation_visualeditor   | 303                     | 20.5%

[1] Limited to wikis with at least 500 newcomers that created accounts during reviewed timeframe

Constructive Edits

Reference check is not presented to newcomers until they attempt to save an edit, requiring them to successfully transition through several stages after creating an account prior to reaching this stage. To help isolate the impact of this intervention on newcomers, we also reviewed changes in overall constructive edit rates. This limits the analysis to newcomers that successfully published an edit.

For this analysis, we’re defining the constructive edit rate as the proportion of all edits completed by newcomers within 24 hours of registering that are not reverted within 48 hours. This is limited to users that were shown reference check at least once within 24 hours after registering.

Show the code
# constructive edits
constructive_edits_editcheck <- all_users_edit_desktop %>%
    filter(num_article_edits_24hrs_editcheck > 0) %>%  #limit to edits where ref check was shown at least once
    group_by(pre_post) %>%
    summarise(num_article_edits_total = sum(num_article_edits_24hrs_all),
              num_article_reverts_total = sum(num_article_reverts_24hrs_all)) %>%
    mutate(pct_const = paste0(round((num_article_edits_total-num_article_reverts_total)/num_article_edits_total * 100, 1), "%")) %>%
    gt()  %>%
    opt_stylize(5) %>%
    tab_header(
    title = "Proportion of constructive edits completed by newcomers shown Reference Check at least once"
      )  %>%
  cols_label(
    pre_post = "Pre or post change",
    num_article_edits_total = "Total number of edits published",
    num_article_reverts_total = "Total number of edits reverted",
    pct_const = "Constructive Edit Rate"
  ) %>%
  tab_footnote(
    footnote = "Defined as the proportion of all published edits that are not reverted within 48 hours",
    locations = cells_column_labels(
      columns = "pct_const"
    )
  ) 

display_html(as_raw_html(constructive_edits_editcheck))
Proportion of constructive edits completed by newcomers shown Reference Check at least once

Pre or post change | Total number of edits published | Total number of edits reverted | Constructive Edit Rate [1]
pre                | 5151                            | 924                            | 82.1%
post               | 4431                            | 662                            | 85.1%

[1] Defined as the proportion of all published edits that are not reverted within 48 hours

Key Findings:

  • Reference check is not presented to newcomers until they attempt to save an edit, requiring them to successfully transition through several stages after creating an account before reaching this stage. During this reviewed timeframe, reference check was shown to 14% of all VisualEditor edits and 6.7% of all newcomers that created an account.
  • There were no significant changes in constructive activation rates when reviewing overall edits or when limited to edits completed with VisualEditor. Constructive activation rate was 28.7% prior to the Multi-Check Phase 1 deployment and 28.9% after the deployment. This represents only a 0.7% relative increase, which is not statistically significant.
  • If we limit to only users that published an edit, there was a 3.7% increase in the constructive edit rate (82.1% pre to 85.1% post) following the Multi-Check Phase 1 deployment for newcomers that were presented with at least one reference check.

Note: The work that will be completed in T385906 will help visualize the full constructive activation funnel and help better isolate the impact of edit check on constructive activation rates.

Edit Check Guardrails

Methodology

We reviewed a sample of edits collected two weeks pre and post deployment of the change deployed on Dec 12th to present a single reference check in a side rail to people within visual editor on desktop.

Data was limited to main namespace edits completed on desktop by unregistered users or users with 100 or fewer edits, on all the wikis where Reference Check is deployed as default. See deployment status.

Data was collected from EditAttemptStep, VisualEditorFeatureUse and mediawiki_history. Note: For EditAttemptStep and VisualEditorFeatureUse, the logging of edit check events changed after the edit check was moved to the side rail. See instrumentation changes below:

  • Pre Change: event.feature = ‘editCheckReferences’ and event.action = ‘context-show’
  • Post Change: event.feature = ‘editCheckDialog’ OR event.action = ‘window-open-from-check’
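
Because the logging differs before and after the change, any query that spans both periods has to accept either event shape. A minimal sketch of that logic follows, assuming the event fields are available as character columns named `feature` and `action` (these column names, the helper `is_check_shown`, and the sample rows are hypothetical and are not taken from the original queries).

```r
# Returns TRUE for events that indicate reference check was shown, under
# either the pre-change or post-change instrumentation listed above.
is_check_shown <- function(feature, action) {
  pre_change  <- feature == "editCheckReferences" & action == "context-show"
  post_change <- feature == "editCheckDialog" | action == "window-open-from-check"
  pre_change | post_change
}

# Hypothetical events for illustration only (not real log data); the last two
# rows use made-up values that should not count as the check being shown.
events <- data.frame(
  feature = c("editCheckReferences", "editCheckDialog", "editCheckReferences", "link"),
  action  = c("context-show", "window-open-from-check", "action-reject", "dialog-done"),
  stringsAsFactors = FALSE
)
events$check_shown <- is_check_shown(events$feature, events$action)
print(events)
```

Keeping the two conditions in one helper means the pre and post periods are flagged consistently wherever the check-shown definition is needed.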

Number of editing sessions where reference check was shown

As this change did not increase the number of checks presented in a single session, we did not review the average number of checks presented in a single session. However, we did review the number of editing sessions where reference checks were presented to the users to confirm this change in edit check location did not change how frequently the edit check was activated.

Note: The query used to collect this data is resource intensive as it gathers all edit attempts, so we limited it to a sample of edit attempts completed from Dec 4 through Dec 19th (1 week pre and post the change) on a subset of wikis (gurwiki, fonwiki, gpewiki, hawiki, kgwiki, lnwiki, arwiki, afwiki, zhwiki, frwiki, itwiki, jawiki, ptwiki, eswiki, swwiki, viwiki, yowiki).

Show the code
# load frequency data
edit_check_frequency_data <-
  read.csv(
    file = 'data/edit_check_frequency_data.tsv',
    header = TRUE,
    sep = "\t",
    stringsAsFactors = FALSE
  ) 
Show the code
# data reformatting

edit_check_frequency_data$date <- as.Date(edit_check_frequency_data$date, format = "%Y-%m-%d")

# Set experience level group and factor levels
edit_check_frequency_data <- edit_check_frequency_data %>%
  mutate(
    experience_level_group = case_when(
     user_edit_count == 0 & user_status == 'registered' ~ 'Newcomer',
     user_edit_count == 0 & user_status == 'unregistered' ~ 'Unregistered',
      user_edit_count > 0 &  user_edit_count <= 100 ~ "Junior Contributor",
      user_edit_count >  100 ~ "Non-Junior Contributor"   
    ),
    experience_level_group = factor(experience_level_group,
         levels = c("Unregistered","Newcomer", "Non-Junior Contributor", "Junior Contributor")
   ))  

# add column to calculate pre and post dates
edit_check_frequency_data <- edit_check_frequency_data %>%
  mutate(
    pre_post = case_when(
        date >= '2024-12-04' & date <= '2024-12-11' ~ "pre",
        date >= '2024-12-12' & date <= '2024-12-19' ~ "post"
        ),
     pre_post  = factor( pre_post ,
         levels = c("pre", "post")
   )) 
Show the code
edit_session_num_overall <- edit_check_frequency_data %>%
    filter(was_edit_check_shown == 1) %>%
    group_by(pre_post) %>%
    summarise(n_editing_session = n_distinct(editing_session),
              n_users = n_distinct(user_id)) %>%
    gt()  %>%
    tab_header(
    title = "Number of editing sessions where reference check was shown pre and post edit check change"
      )  %>%
  cols_label(
    pre_post = "Pre or Post Change",
    n_editing_session = "Number of Editing Sessions",
    n_users = "Number of Users"
  ) 


display_html(as_raw_html(edit_session_num_overall))
Number of editing sessions where reference check was shown pre and post edit check change

Pre or Post Change | Number of Editing Sessions | Number of Users
pre                | 3343                       | 1236
post               | 3349                       | 1177
Show the code
edit_session_num_daily <- edit_check_frequency_data %>%
    filter(was_edit_check_shown == 1) %>%
    group_by(date) %>%
    summarise(n_editing_session = n_distinct(editing_session),
              n_users = n_distinct(user_id)) 
Show the code
# plot daily editing sessions
textaes <- data.frame(y = 475,
                      x = as.Date(c('2024-12-15')),
                      lab = c("Reference check moved to side rail"))


p <- edit_session_num_daily  %>%
        ggplot(aes(x = date, y = n_editing_session)) +
        geom_line(linewidth = 1.5, color = 'steelblue2') +
        geom_vline(xintercept = as.Date('2024-12-12'), linetype = 'dashed', linewidth = 1) +
        # annotate() draws the arrow once instead of once per data row
        annotate("segment", x = as.Date('2024-12-15'), y = 450, xend = as.Date('2024-12-12'), yend = 420,
                  arrow = arrow(length = unit(0.8, "cm")), linewidth = 1, color = "black") +
        geom_text(mapping = aes(y = y, x = x, label = lab), 
            data = textaes, inherit.aes = FALSE, size = 5) +
        scale_x_date(date_labels = "%b-%d", date_breaks = "1 week", minor_breaks = NULL) +
        scale_y_continuous(limits =c(0, 500))+
        labs(title = "Daily number of editing sessions where reference check was shown",
           y = "Number of distinct editing sessions") +
        theme_bw() +
        scale_color_manual(values= c("#000099", "#666666"), name = "Final state")  +
        theme(
            panel.grid.major = element_blank(),
            panel.grid.minor = element_blank(),
            panel.background = element_blank(),
            plot.title = element_text(hjust = 0.5),
            text = element_text(size=18),
            legend.position="bottom",
            axis.text.x = element_text(hjust=1),
            axis.line = element_line(colour = "black"))
p

There were no significant changes in the number of editing sessions where reference check was shown pre and post deployment of the change.

Edit Completion Rate

We reviewed the proportion of editing sessions by newcomers, junior contributors, and unregistered users in which reference check was shown and the edit was successfully published (event.action = saveSuccess). The analysis does not include all edits started; it is limited to edits that reached the save attempt step, where reference check is shown.

Show the code
# load data for assessing edit completion rate
edit_completion_rates_data <-
  read.csv(
    file = 'data/edit_completion_rate_data.tsv',
    header = TRUE,
    sep = "\t",
    stringsAsFactors = FALSE
  ) 
Show the code
# data reformatting

edit_completion_rates_data$date <- as.Date(edit_completion_rates_data$date, format = "%Y-%m-%d")

# Set experience level group and factor levels
edit_completion_rates_data <- edit_completion_rates_data %>%
  mutate(
    experience_level_group = case_when(
     user_edit_count == 0 & user_status == 'registered' ~ 'Newcomer',
     user_edit_count == 0 & user_status == 'unregistered' ~ 'Unregistered',
      user_edit_count > 0 &  user_edit_count <= 100 ~ "Junior Contributor",
      user_edit_count >  100 ~ "Non-Junior Contributor"   
    ),
    experience_level_group = factor(experience_level_group,
         levels = c("Unregistered","Newcomer", "Non-Junior Contributor", "Junior Contributor")
   ))  

# add column to calculate pre and post dates
# Need to update these
edit_completion_rates_data <- edit_completion_rates_data %>%
  mutate(
    pre_post = case_when(
        date >= '2024-11-27' & date <= '2024-12-11' ~ "pre",
        date >= '2024-12-12' & date <= '2024-12-26' ~ "post"
        ),
     pre_post  = factor( pre_post ,
         levels = c("pre", "post")
   )) 

Overall Edit Completion Rates

Includes Reverted Edits

Show the code
edit_completion_rate_overall <- edit_completion_rates_data %>%
    group_by(pre_post) %>%
    summarise(n_edits = n_distinct(editing_session),
              n_saves = n_distinct(editing_session[saved_edit > 0])) %>%
    mutate(completion_rate = paste0(round(n_saves/n_edits * 100, 1), "%")) %>%   
    gt()  %>%
    tab_header(
    title = "Edit Completion Rate Pre and Post Change (Includes Reverted Edits)"
      )  %>%
  cols_label(
    pre_post = "Pre or Post Change",
    n_edits = "Number of editing sessions shown reference check",
    n_saves = "Number of published edits",
    completion_rate = "Proportion of reference check sessions saved"
  ) %>%
    opt_stylize(style =2) 


display_html(as_raw_html(edit_completion_rate_overall ))
Edit Completion Rate Pre and Post Change (Includes Reverted Edits)

Pre or Post Change | Number of editing sessions shown reference check | Number of published edits | Proportion of reference check sessions saved
pre                | 15206                                            | 12158                     | 80%
post               | 13266                                            | 10674                     | 80.5%

Excludes Reverted Edits

Show the code
edit_completion_rate_overall <- edit_completion_rates_data %>%
    group_by(pre_post) %>%
    summarise(n_edits = n_distinct(editing_session),
              n_saves = n_distinct(editing_session[saved_edit > 0 & was_reverted == 0])) %>%
    mutate(completion_rate = paste0(round(n_saves/n_edits * 100, 1), "%")) %>%   
    gt()  %>%
    tab_header(
    title = "Edit Completion Rate Pre and Post Change (Excludes Reverted Edits)"
      )  %>%
  cols_label(
    pre_post = "Pre or Post Change",
    n_edits = "Number of editing sessions shown reference check",
    n_saves = "Number of published edits",
    completion_rate = "Proportion of reference check sessions saved"
  ) %>%
    opt_stylize(style =2) 


display_html(as_raw_html(edit_completion_rate_overall ))
Edit Completion Rate Pre and Post Change (Excludes Reverted Edits)

Pre or Post Change | Number of editing sessions shown reference check | Number of published edits | Proportion of reference check sessions saved
pre                | 15206                                            | 10031                     | 66%
post               | 13266                                            | 9036                      | 68.1%
Show the code
edit_completion_rate_overall_daily <- edit_completion_rates_data %>%
    group_by(date) %>%
    summarise(n_save_intent = n_distinct(editing_session),
              n_saves = n_distinct(editing_session[saved_edit > 0 & was_reverted == 0])) %>%
    mutate(completion_rate = n_saves/n_save_intent)
Show the code
# plot daily editing sessions
textaes <- data.frame(y = c(0.76),
                      x = as.Date(c('2024-12-17')),
                      lab = c("Reference check presented in side rail"))


p <- edit_completion_rate_overall_daily %>%
        ggplot(aes(x = date, y = completion_rate)) +
        geom_line(size = 1.5, color = '#0072B2') +
        geom_vline(xintercept = as.Date('2024-12-12'), linetype = 'dashed', size = 1) +
        geom_segment(aes(x = as.Date(c('2024-12-15')), y = 0.75, xend = as.Date('2024-12-12'), yend = 0.66),
                  arrow = arrow(length = unit(0.7, "cm")), size = 1, color = "black") +
        geom_text(mapping = aes(y = y, x = x, label = lab), 
            data = textaes, inherit.aes = FALSE, size = 5) +
        scale_y_continuous(labels = scales::percent, limits = c(0.4, 0.8)) +
        scale_x_date(date_labels = "%b-%d", date_breaks = "1 week", minor_breaks = NULL) +
        labs(title = "Daily edit completion rate for sessions shown Reference Check",
           y = "Proportion of editing sessions",
            caption = "excludes edits reverted within 48 hours") +
        theme_bw() +
        theme(
            panel.grid.major = element_blank(),
            panel.grid.minor = element_blank(),
            panel.background = element_blank(),
            plot.title = element_text(hjust = 0.5),
            text = element_text(size=18),
            legend.position="bottom",
            axis.text.x = element_text(hjust=1),
            axis.line = element_line(colour = "black"))
p
Warning message:
“Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
ℹ Please use `linewidth` instead.”
Warning message in geom_segment(aes(x = as.Date(c("2024-12-15")), y = 0.75, xend = as.Date("2024-12-12"), :
“All aesthetics have length 1, but the data has 30 rows.
ℹ Please consider using `annotate()` or provide this layer with data containing
  a single row.”
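
Both warnings come from the plotting style rather than the data: ggplot2 3.4.0 and later expect `linewidth` for line thickness, and a single arrow is better drawn with `annotate()` than with `geom_segment(aes(...))`. A minimal sketch of the same annotation pattern follows; the `toy` data frame and its values are hypothetical and only make the example runnable.

```r
library(ggplot2)

# Hypothetical daily rates, used only to make the sketch runnable
toy <- data.frame(
  date = seq(as.Date("2024-12-08"), as.Date("2024-12-16"), by = "day"),
  completion_rate = c(0.66, 0.67, 0.65, 0.66, 0.68, 0.67, 0.69, 0.68, 0.68)
)

p_sketch <- ggplot(toy, aes(x = date, y = completion_rate)) +
  # linewidth replaces the deprecated size argument for lines
  geom_line(linewidth = 1.5, color = "#0072B2") +
  geom_vline(xintercept = as.Date("2024-12-12"), linetype = "dashed", linewidth = 1) +
  # annotate() draws one arrow instead of one per row of data
  annotate("segment",
           x = as.Date("2024-12-15"), y = 0.75,
           xend = as.Date("2024-12-12"), yend = 0.66,
           arrow = arrow(length = unit(0.7, "cm")),
           linewidth = 1, color = "black") +
  annotate("text", x = as.Date("2024-12-15"), y = 0.76,
           label = "Reference check presented in side rail", size = 5) +
  scale_y_continuous(labels = scales::percent, limits = c(0.4, 0.8))

p_sketch
```

The same substitution would apply to the other daily plots in this report, which use the identical `geom_segment()` pattern.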

By User Edit Count

Show the code
edit_completion_rate_editcount <- edit_completion_rates_data %>%
    filter(experience_level_group != 'Non-Junior Contributor') %>% 
    group_by(experience_level_group, pre_post) %>%
    summarise(n_edits = n_distinct(editing_session),
              n_saves = n_distinct(editing_session[saved_edit > 0 & was_reverted == 0])) %>%
    mutate(completion_rate = paste0(round(n_saves/n_edits * 100, 1), '%')) %>%   
    gt()  %>%
    tab_header(
    title = "Edit Completion Rate By Editor Experience"
      )  %>%
  cols_label(
    experience_level_group = "Experience Level",
    pre_post = "Pre or Post Change",
    n_edits = "Number of editing sessions shown reference check",
    n_saves = "Number of published edits",
    completion_rate = "Proportion of reference check sessions saved"
  )  %>%
    opt_stylize(style =2) %>%
  tab_footnote(
    footnote = "Excludes edits reverted within 48 hours",
    locations = cells_column_labels(
      columns = "n_saves"
    )
  ) 
 

display_html(as_raw_html(edit_completion_rate_editcount))
Edit Completion Rate By Editor Experience
Experience Level | Pre or Post Change | Number of editing sessions shown reference check | Number of published edits [1] | Proportion of reference check sessions saved
Unregistered | pre | 8568 | 4905 | 57.2%
Unregistered | post | 7243 | 4299 | 59.4%
Newcomer | pre | 1742 | 1152 | 66.1%
Newcomer | post | 1469 | 995 | 67.7%
Junior Contributor | pre | 4896 | 3974 | 81.2%
Junior Contributor | post | 4554 | 3742 | 82.2%
[1] Excludes edits reverted within 48 hours

By Wiki

Show the code
edit_completion_rate_wiki <- edit_completion_rates_data %>%
    group_by(wiki, pre_post) %>%
    summarise(n_edits = n_distinct(editing_session),
              n_saves = n_distinct(editing_session[saved_edit > 0])) %>%
    mutate(completion_rate = paste0(round(n_saves/n_edits * 100, 1), '%')) %>%
    filter(n_saves > 500) %>%  ## limited to wikis with over 500 saved edits
    gt()  %>%
    tab_header(
    title = "Edit Completion Rate By Wiki"
      )  %>%
  cols_label(
    wiki = "Wiki",
    pre_post = "Pre or Post Change",
    n_edits = "Number of editing sessions shown reference check",
    n_saves = "Number of published edits",
    completion_rate = "Proportion of reference check sessions saved"
  ) %>%
  tab_footnote(
    footnote = "Limited to wikis with over 500 saved edits where edit check was shown",
    locations = cells_column_labels(
      columns = 'n_edits'
    )
  ) %>%
    opt_stylize(style =2)  %>%
  tab_footnote(
    footnote = "Excludes edits reverted within 48 hours",
    locations = cells_column_labels(
      columns = "n_saves"
    )
  ) 

display_html(as_raw_html(edit_completion_rate_wiki))
Edit Completion Rate By Wiki
Wiki | Pre or Post Change | Number of editing sessions shown reference check [1] | Number of published edits [2] | Proportion of reference check sessions saved
eswiki | pre | 1123 | 930 | 82.8%
eswiki | post | 779 | 660 | 84.7%
frwiki | pre | 1921 | 1700 | 88.5%
frwiki | post | 1748 | 1531 | 87.6%
itwiki | pre | 1270 | 1020 | 80.3%
itwiki | post | 1095 | 937 | 85.6%
nlwiki | pre | 966 | 621 | 64.3%
nlwiki | post | 690 | 506 | 73.3%
ruwiki | pre | 1546 | 1236 | 79.9%
ruwiki | post | 1447 | 1168 | 80.7%
[1] Limited to wikis with over 500 saved edits where edit check was shown
[2] Excludes edits reverted within 48 hours

Key Findings:

  • Including reverted edits, the edit completion rate was essentially unchanged: 80% of sessions where reference check was presented were saved before the check moved to the side rail, and 80.5% after.
  • Excluding reverted edits, there was a slight increase in edits completed (a 3.2% relative increase, or about 2 percentage points). 68.1% of sessions where reference check was presented were saved and not reverted following the move of the check to the side rail, compared to 66% prior to the change.
  • We observed similar increases in edit completion rate across all experience level groups (unregistered users, newcomers, and junior contributors) and across most of the wikis reviewed. French Wikipedia was the exception, with a slight decrease (88.5% to 87.6%).
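
The findings above quote both a percentage point difference and a relative change, which are easy to confuse. A minimal sketch of the arithmetic, using the two rates from the "Excludes Reverted Edits" table (the variable names are illustrative only):

```r
# Completion rates (%) taken from the "Excludes Reverted Edits" table
pre_rate  <- 66.0
post_rate <- 68.1

# Percentage point difference: simple subtraction of the two rates
pp_change <- post_rate - pre_rate

# Relative change: the difference expressed as a share of the pre-change rate
rel_change <- (post_rate - pre_rate) / pre_rate * 100

round(pp_change, 1)   # 2.1
round(rel_change, 1)  # 3.2
```

The same two calculations underlie the percentage changes quoted for the decline rate and revert rate later in this report.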

False Positive Rate

The proportion of edits where reference check was shown and the contributor dismissed adding a citation by explicitly indicating that the information they are adding does not need the specified change.

For this metric, we reviewed the proportion of edits where reference check was shown and the contributor dismissed adding a citation by explicitly indicating that the information they are adding does not violate the specified policy. This was determined in the data by reviewing published edits with the decline-irrelevant tag. We also reviewed the proportion of edits that explicitly added any reason for declining the reference check, to identify changes in the overall decline rate.

Note: this metric relies on users explicitly selecting an option. It does not account for instances where the reference check was shown in error and the user did not select one of the provided options for declining the check.

See available edit tags for documentation of decline options.

Data Gathering and Processing

Show the code
# load published edit tag data
published_edit_check_data  <-
  read.csv(
    file = 'data/published_edit_check_data.tsv',
    header = TRUE,
    sep = "\t",
    stringsAsFactors = FALSE
  ) 
Show the code
# data reformatting

published_edit_check_data$date <- as.Date(published_edit_check_data$date, format = "%Y-%m-%d")

# Set experience level group and factor levels
published_edit_check_data <- published_edit_check_data %>%
  mutate(
    experience_level_group = case_when(
    is.na(user_edit_count) ~ 'Unregistered',
     user_edit_count == 1 ~ 'Newcomer',
      user_edit_count > 1 &  user_edit_count <= 100 ~ "Junior Contributor",
      user_edit_count >  100 ~ "Non-Junior Contributor" 
    ),
    experience_level_group = factor(experience_level_group,
         levels = c("Unregistered", "Newcomer", "Non-Junior Contributor", "Junior Contributor")
   ))  


# rename is mobile edit
published_edit_check_data <- published_edit_check_data %>%
  mutate(
    platform = case_when(
     is_mobile_edit == 1  ~ 'phone',
    is_mobile_edit == 0 ~ 'desktop',
    ))  


# add column to calculate pre and post dates
published_edit_check_data<- published_edit_check_data %>%
  mutate(
    pre_post = case_when(
          date >= '2024-11-27' & date <= '2024-12-11' ~ "pre",
        date >= '2024-12-12' & date <= '2024-12-26' ~ "post"
        ),
     pre_post  = factor( pre_post ,
         levels = c("pre", "post")
   )) 
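
A minimal sketch of how this bucketing behaves at the window boundaries, using hypothetical dates (the `toy_dates` table is not part of the analysis data). It illustrates two points: dates outside both windows become `NA`, and each window as coded spans 15 calendar days.

```r
library(dplyr)

# Hypothetical dates sitting on and just outside the window boundaries
toy_dates <- tibble(
  date = as.Date(c("2024-11-26", "2024-11-27", "2024-12-11",
                   "2024-12-12", "2024-12-26", "2024-12-27"))
)

toy_dates <- toy_dates %>%
  mutate(
    pre_post = case_when(
      date >= as.Date("2024-11-27") & date <= as.Date("2024-12-11") ~ "pre",
      date >= as.Date("2024-12-12") & date <= as.Date("2024-12-26") ~ "post"
      # no fallback branch: any other date is left as NA
    ),
    pre_post = factor(pre_post, levels = c("pre", "post"))
  )

# Length of each window in calendar days (both ends inclusive)
pre_days  <- as.integer(as.Date("2024-12-11") - as.Date("2024-11-27")) + 1
post_days <- as.integer(as.Date("2024-12-26") - as.Date("2024-12-12")) + 1
```

If the loaded data ever contained dates outside these windows, `group_by(pre_post)` would produce an extra `NA` group, so filtering on `!is.na(pre_post)` before summarising would be a reasonable safeguard.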
 

Overall Declines

Show the code
edit_check_decline_overall <- published_edit_check_data %>%
    filter(is_edit_check_activated == 1) %>% #only edits where shown
    group_by(pre_post) %>%
    summarise(n_edits = n_distinct(revision_id),
             n_edits_decline = n_distinct(revision_id[decline_other == 1| decline_common_knowledge == 1| decline_irrelevant == 1| decline_uncertain == 1]),
             )  %>%  
     mutate(prop_users = paste0(round(n_edits_decline/n_edits * 100, 1), "%")) %>%     
    gt()  %>%
    tab_header(
    title = "Overall Edit Check Declines"
      )  %>%
  cols_label(
    pre_post = "Pre or Post Change",
    n_edits = "Number of published edits where reference check was shown",
    n_edits_decline = "Number of published edits that selected decline option",
    prop_users = "Proportion of edits that declined reference check"
  ) %>%
    opt_stylize(style =2) 


display_html(as_raw_html(edit_check_decline_overall))
Overall Edit Check Declines
Pre or Post Change | Number of published edits where reference check was shown | Number of published edits that selected decline option | Proportion of edits that declined reference check
pre | 16230 | 8719 | 53.7%
post | 13624 | 6977 | 51.2%
Show the code
edit_check_decline_overall_daily <- published_edit_check_data  %>%
    filter(is_edit_check_activated == 1) %>% #only edits where shown
    group_by(date, pre_post) %>%
    summarise(n_edits = n_distinct(revision_id),
             n_edits_decline = n_distinct(revision_id[decline_other == 1| decline_common_knowledge == 1| decline_irrelevant == 1| decline_uncertain == 1]),
             )  %>%  
     mutate(prop_edits = round(n_edits_decline/n_edits, 3)) 
Show the code
# plot daily overall decline rate
textaes <- data.frame(y = c(0.66),
                      x = as.Date(c('2024-12-16')),
                      lab = c("Reference check moved to side rail"))


p <- edit_check_decline_overall_daily %>%
        ggplot(aes(x = date, y = prop_edits)) +
        geom_line(size = 1.5, color = '#0072B2') +
        geom_vline(xintercept = as.Date('2024-12-12'), linetype = 'dashed', size = 1) +
        geom_segment(aes(x = as.Date(c('2024-12-15')), y = 0.65, xend = as.Date('2024-12-12'), yend = 0.55),
                  arrow = arrow(length = unit(0.8, "cm")), size = 1, color = "black") +
        geom_text(mapping = aes(y = y, x = x, label = lab), 
            data = textaes, inherit.aes = FALSE, size = 5) +
        scale_y_continuous(labels = scales::percent, limits = c(0.3, 0.7)) +
        scale_x_date(date_labels = "%b-%d", date_breaks = "1 week", minor_breaks = NULL) +
        labs(title = "Daily overall decline rate for sessions shown Reference Check",
           y = "Proportion of reference checks declined",
            caption = "includes all edits where a user added an explicit reason for declining reference check") +
        theme_bw() +
        theme(
            panel.grid.major = element_blank(),
            panel.grid.minor = element_blank(),
            panel.background = element_blank(),
            plot.title = element_text(hjust = 0.5),
            text = element_text(size=18),
            legend.position="bottom",
            axis.text.x = element_text(hjust=1),
            axis.line = element_line(colour = "black"))

p

Show the code

# overall by type
edit_check_decline_overall_bytype <- published_edit_check_data %>%
    filter(is_edit_check_activated == 1) %>% #only edits where shown
    group_by(pre_post)  %>%
    summarise(n_edits = n_distinct(revision_id),
            decline_uncertain = n_distinct(revision_id[decline_uncertain == 1]),
             decline_other = n_distinct(revision_id[decline_other == 1]),
             decline_common_knowledge = n_distinct(revision_id[decline_common_knowledge == 1]),
             decline_irrelevant = n_distinct(revision_id[decline_irrelevant == 1]),
             )  %>% 
    pivot_longer(cols = contains('decline'), names_to = "decline_reason", values_to = "n_decline_edits") %>% 
    mutate (prop_users = paste0(round(n_decline_edits/n_edits *100, 1), "%")) %>%  
    select(-c(2,4)) %>%  
    group_by(decline_reason) %>% 
    gt()  %>%
    tab_header(
    title = "Overall Edit Check Declines By Decline Reason"
      )   %>%
  cols_label(
    pre_post = "Pre or Post Change",
    decline_reason = "Decline Reason",
    prop_users = "Proportion of edits that declined reference check"
  ) %>%
    opt_stylize(style =2) 

display_html(as_raw_html(edit_check_decline_overall_bytype ))
Overall Edit Check Declines By Decline Reason
Decline Reason | Pre or Post Change | Proportion of edits that declined reference check
decline_uncertain | pre | 11.9%
decline_uncertain | post | 10.3%
decline_other | pre | 18.6%
decline_other | post | 17.8%
decline_common_knowledge | pre | 15.4%
decline_common_knowledge | post | 14.9%
decline_irrelevant | pre | 7.8%
decline_irrelevant | post | 8.3%

By Editor Experience

Show the code
edit_check_decline_userexp  <- published_edit_check_data%>%
    filter(
          is_edit_check_activated == 1) %>% # only edits where the check was shown
    group_by(experience_level_group, pre_post)  %>%
    summarise(n_edits = n_distinct(revision_id),
            decline_uncertain = n_distinct(revision_id[decline_uncertain == 1]),
             decline_other = n_distinct(revision_id[decline_other == 1]),
             decline_common_knowledge = n_distinct(revision_id[decline_common_knowledge == 1]),
             decline_irrelevant = n_distinct(revision_id[decline_irrelevant == 1]),
             )  %>% 
    pivot_longer(cols = contains('decline'), names_to = "decline_reason", values_to = "n_decline_edits") %>% 
    mutate(prop_edits = round(n_decline_edits/n_edits, 2))
Show the code
colorfriendly  <- c("#000000", "#E69F00", "#56B4E9", "#009E73", 
                       "#F0E442", "#0072B2", "#D55E00", "#CC79A7")

p <- edit_check_decline_userexp  %>%
    ggplot(aes(x= decline_reason, y = prop_edits, fill = decline_reason)) +
      geom_col(position = 'dodge') +
    facet_grid(vars(pre_post), vars(experience_level_group)) +
    scale_y_continuous(labels = scales::percent) +
     geom_text(aes(label = paste0(prop_edits * 100, "%"), fontface=2), vjust=1.2, size = 8, color = "white") +
    scale_fill_manual(values= colorfriendly) +
    labs (y = "Percent of reference checks declined ",
           x = "Decline citation reason",
          title = "Proportion of edits where reference check was shown \n and declined by editor experience")  +
    theme(
        panel.grid.minor = element_blank(),
        panel.background = element_blank(),
        plot.title = element_text(hjust = 0.5),
        text = element_text(size=18),
        axis.title.x=element_blank(),
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank(),
        legend.position= "bottom",
        axis.line = element_line(colour = "black"))
      
p

By Wiki

Show the code
false_positive_rate_wiki<- published_edit_check_data %>%
    filter(user_status == 'registered',
          is_edit_check_activated == 1) %>% # only edits where the check was shown
    group_by(wiki, pre_post) %>%
    summarise(n_edits = n_distinct(revision_id),
              n_false_positive = n_distinct(revision_id[decline_irrelevant > 0])) %>%
    mutate(false_positive_rate = paste0(round(n_false_positive/n_edits * 100, 1), "%")) %>%
    filter(n_edits > 500)  %>% 
     select(-c(3,4))  %>%
    gt()  %>%
    tab_header(
    title = "Reference Check False Positive Rate by Wiki"
      )  %>%
  cols_label(
    wiki = "Wiki",
    pre_post = "Pre or Post Change",
    false_positive_rate = "False positive rate"
  ) %>%
  tab_footnote(
    footnote = "Limited to wikis with over 500 edits; 
Determined in the data by reviewing published edits with `decline-irrelevant` tag",
    locations = cells_column_labels(
      columns = 'false_positive_rate'
    )
  )%>%
    opt_stylize(style =2) 

display_html(as_raw_html(false_positive_rate_wiki))
Reference Check False Positive Rate by Wiki
Wiki | Pre or Post Change | False positive rate [1]
eswiki | pre | 9.7%
eswiki | post | 13.1%
frwiki | pre | 3%
frwiki | post | 2.8%
ruwiki | pre | 3.9%
ruwiki | post | 6.3%
[1] Limited to wikis with over 500 edits; Determined in the data by reviewing published edits with `decline-irrelevant` tag

Key Findings:

  • Overall declines of reference check decreased by 4.7% (2.5 percentage points). 51.2% of published edits shown reference check included an explicit reason for declining it following the change, compared to 53.7% prior to the change.
  • There was a 6.4% increase (0.5 percentage points, 7.8% pre to 8.3% post) in the proportion of edits that indicated that the reference check presented was irrelevant; however, the rate of all other types of declines decreased.
  • There were no notable changes in decline rates by editor experience level.
  • Results vary by wiki. Among registered users, Spanish Wikipedia saw the largest percentage point increase in false positive rate (3.4 points, 9.7% pre change to 13.1% post change, a 35% relative increase). Russian Wikipedia also increased (3.9% to 6.3%), while French Wikipedia was roughly flat (3% to 2.8%).

It is important to note that this only considers users who explicitly selected a decline reason, not users who dismissed reference check without providing a reason.
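
One way to gauge whether a pre/post difference of this size exceeds sampling noise is a two-sample test of proportions. The sketch below applies base R's `prop.test()` to the counts from the "Overall Edit Check Declines" table. It is an illustration rather than part of the original analysis, and it assumes edits are independent observations, which is unlikely to hold exactly because one user can publish several edits. Since this is not a controlled experiment, a small p-value would not show that the side rail caused the change.

```r
# Counts taken from the "Overall Edit Check Declines" table
declines <- c(pre = 8719, post = 6977)    # published edits with a decline reason
shown    <- c(pre = 16230, post = 13624)  # published edits shown reference check

# Two-sample test for equality of proportions (with continuity correction)
res <- prop.test(x = declines, n = shown)

res$estimate  # observed decline proportions, pre then post
res$conf.int  # confidence interval for the difference (pre minus post)
res$p.value
```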

New Content Edit Revert Rate

Proportion of all new content edits (defined by editcheck-newcontent tag) where reference check was shown that were reverted within 48 hours of being published.

Show the code
revert_rate_overall <- published_edit_check_data %>%
    filter(is_edit_check_activated == 1,
          is_new_content == 1) %>%
    group_by(pre_post) %>%
    summarise(n_edits = n_distinct(revision_id),
              n_reverts = n_distinct(revision_id[was_reverted > 0])) %>%
    mutate(revert_rate = paste0(round(n_reverts/n_edits * 100, 1), "%"))  %>%
    gt()  %>%
    tab_header(
    title = "New content edit revert rate"
      )  %>%
  cols_label(
    pre_post = "Pre or Post Change",
    n_edits = "Number of published edits",
    n_reverts = "Number of edits reverted",
    revert_rate = "Revert Rate"
  ) %>%
  tab_footnote(
    footnote = "Limited to edits where edit check was shown and that were reverted within 48 hours",
    locations = cells_column_labels(
      columns = 'revert_rate'
    )
  )%>%
    opt_stylize(style =2) 

display_html(as_raw_html(revert_rate_overall))
New content edit revert rate
Pre or Post Change | Number of published edits | Number of edits reverted | Revert Rate [1]
pre | 13749 | 2807 | 20.4%
post | 11412 | 1965 | 17.2%
[1] Limited to edits where edit check was shown and that were reverted within 48 hours
Show the code
# daily revert rates to look for sudden change
revert_rate_overall_daily <-published_edit_check_data %>%
    filter(is_edit_check_activated == 1,
          is_new_content == 1) %>%
    group_by(date,pre_post) %>%
    summarise(n_edits = n_distinct(revision_id),
              n_reverts = n_distinct(revision_id[was_reverted > 0])) %>%
    mutate(revert_rate = paste0(round(n_reverts/n_edits * 100, 1), "%"))
Show the code
# plot daily revert rate
textaes <- data.frame(y = c(0.31),
                      x = as.Date(c('2024-12-16')),
                      lab = c("Reference check moves to side rail"))


p <- revert_rate_overall_daily %>%
        ggplot(aes(x = date, y = n_reverts/n_edits)) +
        geom_line(size = 1.5, color = '#0072B2') +
        geom_vline(xintercept = as.Date('2024-12-12'), linetype = 'dashed', size = 1) +
        geom_segment(aes(x = as.Date(c('2024-12-15')), y = 0.3, xend = as.Date('2024-12-12'), yend = 0.2),
                  arrow = arrow(length = unit(0.8, "cm")), size = 1, color = "black") +
        geom_text(mapping = aes(y = y, x = x, label = lab), 
            data = textaes, inherit.aes = FALSE, size = 5) +
        scale_y_continuous(labels = scales::percent, limits = c(0, 0.4)) +
        scale_x_date(date_labels = "%b-%d", date_breaks = "1 week", minor_breaks = NULL) +
        labs(title = "Daily revert rate for new content edits shown Reference Check",
           y = "Proportion of edits reverted") +
        theme_bw() +
        theme(
            panel.grid.major = element_blank(),
            panel.grid.minor = element_blank(),
            panel.background = element_blank(),
            plot.title = element_text(hjust = 0.5),
            text = element_text(size=18),
            legend.position="bottom",
            axis.text.x = element_text(hjust=1),
            axis.line = element_line(colour = "black"))
p

By Editor Experience Group

Show the code
revert_rate_editorexp <- published_edit_check_data %>%
    filter(
          is_edit_check_activated == 1) %>% # note: no is_new_content filter here, so all edits shown reference check are counted
    group_by(experience_level_group, pre_post) %>%
    summarise(n_edits = n_distinct(revision_id),
              n_reverts = n_distinct(revision_id[was_reverted > 0])) %>%
    mutate(revert_rate = paste0(round(n_reverts/n_edits * 100, 1), "%"))  %>%
    gt()  %>%
    tab_header(
    title = "New content edit revert rate by editor experience"
      )  %>%
  cols_label(
    experience_level_group = "Experience level group",
    pre_post = "Pre or Post Change",
    n_edits = "Number of published edits",
    n_reverts = "Number of edits reverted",
    revert_rate = "Revert Rate"
  ) %>%
  tab_footnote(
    footnote = "Limited to edits where reference check was shown and that were reverted within 48 hours",
    locations = cells_column_labels(
      columns = 'revert_rate'
    )
  )%>%
    opt_stylize(style =2) 


display_html(as_raw_html(revert_rate_editorexp))
New content edit revert rate by editor experience
Experience level group | Pre or Post Change | Number of published edits | Number of edits reverted | Revert Rate [1]
Unregistered | pre | 7803 | 2017 | 25.8%
Unregistered | post | 6296 | 1455 | 23.1%
Newcomer | pre | 1672 | 365 | 21.8%
Newcomer | post | 1298 | 231 | 17.8%
Junior Contributor | pre | 6755 | 751 | 11.1%
Junior Contributor | post | 6030 | 563 | 9.3%
[1] Limited to edits where reference check was shown and that were reverted within 48 hours

By Wiki

Show the code
revert_rate_wiki <- published_edit_check_data %>%
    filter(is_edit_check_activated == 1)  %>% # note: no is_new_content filter here, so all edits shown reference check are counted
    group_by(wiki, pre_post) %>%
    summarise(n_edits = n_distinct(revision_id),
              n_reverts = n_distinct(revision_id[was_reverted > 0])) %>%
    mutate(revert_rate = paste0(round(n_reverts/n_edits * 100, 1), "%")) %>%
    filter(n_edits > 500) %>%
    select(-c(3,4)) %>%
    gt()  %>%
    tab_header(
    title = "New content edit revert rate by wiki"
      )  %>%
  cols_label(
    wiki = "Wiki",
    pre_post = "Pre or Post Change",
    revert_rate = "Revert Rate"
  ) %>%
  tab_footnote(
    footnote = "Limited to wikis with more than 500 published edits \n and where reference check was available as default during reviewed timeframe",
    locations = cells_column_labels(
      columns = 'revert_rate'
    )
  )%>%
    opt_stylize(style =2) 

display_html(as_raw_html(revert_rate_wiki))
New content edit revert rate by wiki
Wiki | Pre or Post Change | Revert Rate [1]
eswiki | pre | 18.1%
eswiki | post | 20.1%
frwiki | pre | 20.2%
frwiki | post | 18.5%
itwiki | pre | 19.8%
itwiki | post | 19.2%
jawiki | pre | 7%
jawiki | post | 6.6%
nlwiki | pre | 31.4%
nlwiki | post | 17.6%
ruwiki | pre | 18.4%
ruwiki | post | 18.5%
[1] Limited to wikis with more than 500 published edits and where reference check was available as default during reviewed timeframe

Key Findings:

  • Of the metrics reviewed, revert rate showed the largest change before and after the move of reference check to the side rail.
  • The revert rate of new content edits where reference check was presented decreased by 15.7% (3.2 percentage points, 20.4% pre to 17.2% post change).
  • We observed revert rate decreases across all experience level groups (unregistered, newcomer, and junior contributors) and across the majority of Wikipedias reviewed. The exceptions were Spanish Wikipedia, which had a 2 percentage point increase in revert rate, and Russian Wikipedia, which was essentially unchanged (18.4% to 18.5%).
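
The revert rates in this section are computed as the number of distinct reverted revisions divided by the number of distinct revisions, using `n_distinct()` on a logically subset vector. A minimal sketch of that pattern on hypothetical rows follows (the `toy_edits` table is invented; only the column names mirror the analysis data):

```r
library(dplyr)

# Hypothetical rows; revision 3 appears twice to show why n_distinct() is used
toy_edits <- tibble(
  revision_id  = c(1, 2, 3, 3, 4, 5),
  pre_post     = c("pre", "pre", "pre", "pre", "post", "post"),
  was_reverted = c(0, 1, 1, 1, 0, 0)
)

toy_rates <- toy_edits %>%
  group_by(pre_post) %>%
  summarise(
    n_edits   = n_distinct(revision_id),                    # all distinct revisions
    n_reverts = n_distinct(revision_id[was_reverted > 0]),  # distinct reverted revisions
    .groups   = "drop"
  ) %>%
  mutate(revert_rate = n_reverts / n_edits)

toy_rates
```

Counting distinct `revision_id` values rather than rows keeps a revision that appears more than once in the input from inflating either the numerator or the denominator.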

Total distinct users that included a reference after being shown reference check

Total number of distinct users (limited to registered as we don’t track distinct anons) that included a new reference with their new content edit after being shown reference check.

Overall

Show the code
num_users_change <- published_edit_check_data %>%
    filter(is_edit_check_activated == 1,
        user_status == 'registered',  #only track unique registered users
          is_new_content == 1) %>% 
    group_by(pre_post) %>%
    summarise(n_users = n_distinct(user_id[includes_policy_change ==1 & was_reverted == 0])) %>%
    gt()  %>%
    tab_header(
    title = "Number of registered users that included a new reference after being shown reference check"
      )  %>%
  cols_label(
    pre_post = "Pre or Post Change",
    n_users = "Number of registered users"
  ) %>%
  tab_footnote(
    footnote = "Limited to registered users with 100 or fewer edits",
    locations = cells_column_labels(
      columns = 'n_users'
    )
  )%>%
    opt_stylize(style =2)

display_html(as_raw_html(num_users_change))
Number of registered users that included a new reference after being shown reference check
Pre or Post Change | Number of registered users [1]
pre | 2245
post | 2079
[1] Limited to registered users with 100 or fewer edits
Show the code
# plot daily change in users
num_users_change_daily <- published_edit_check_data %>%
    filter(is_edit_check_activated == 1,
        user_status == 'registered',  #only track unique registered users
          is_new_content == 1) %>% 
    group_by(date, pre_post) %>%
    summarise(n_users = n_distinct(user_id[includes_policy_change==1 & was_reverted == 0]))
Show the code
# plot daily number of users that included a reference
textaes <- data.frame(y = c(255),
                      x = as.Date(c('2024-12-16')),
                      lab = c("Reference check moves to side rail"))


p <- num_users_change_daily %>%
        ggplot(aes(x = date, y = n_users)) +
        geom_line(size = 1.5, color = '#0072B2') +
        geom_vline(xintercept = as.Date('2024-12-12'), linetype = 'dashed', size = 1) +
        geom_segment(aes(x = as.Date(c('2024-12-15')), y = 250, xend = as.Date('2024-12-12'), yend = 210),
                  arrow = arrow(length = unit(0.8, "cm")), size = 1, color = "black") +
        geom_text(mapping = aes(y = y, x = x, label = lab), 
            data = textaes, inherit.aes = FALSE, size = 5) +
        scale_y_continuous(limits = c(50, 300)) +
        scale_x_date(date_labels = "%b-%d", date_breaks = "1 week", minor_breaks = NULL) +
        labs(title = "Daily number of users that included a reference after being shown reference check",
           y = "Number of users") +
        theme_bw() +
        theme(
            panel.grid.major = element_blank(),
            panel.grid.minor = element_blank(),
            panel.background = element_blank(),
            plot.title = element_text(hjust = 0.5),
            text = element_text(size=18),
            legend.position="bottom",
            axis.text.x = element_text(hjust=1),
            axis.line = element_line(colour = "black"))

p

By Editor Experience

Show the code
num_users_change_userexp <- published_edit_check_data %>%
    filter(is_edit_check_activated == 1,
        user_status == 'registered',
          is_new_content == 1) %>% 
    group_by(experience_level_group, pre_post) %>%
    summarise(n_users = n_distinct(user_id[includes_policy_change ==1 & was_reverted == 0])) %>%
    gt()  %>%
    tab_header(
    title = "Number of users by editor experience \n that included a reference after being shown reference check"
      )  %>%
  cols_label(
    experience_level_group = "Editor experience",
    pre_post = "Pre or Post Change",
    n_users = "Number of registered users"
  ) %>%
    opt_stylize(style =2) 

display_html(as_raw_html(num_users_change_userexp  ))
Number of users by editor experience that included a reference after being shown reference check
Editor experience | Pre or Post Change | Number of registered users
Newcomer | pre | 597
Newcomer | post | 569
Junior Contributor | pre | 1714
Junior Contributor | post | 1580

By Wiki

Show the code
num_users_change_wiki <- published_edit_check_data %>%
    filter(
        user_status == 'registered',  #only track unique registered users
          is_new_content == 1) %>% 
    group_by(wiki, pre_post) %>%
    summarise(n_users = n_distinct(user_id[includes_policy_change ==1 & was_reverted == 0])) %>%
    filter(n_users > 150) %>%
    gt()  %>%
    tab_header(
    title = "Number of users by wiki \n that included a reference after being shown reference check"
      )  %>%
  cols_label(
    wiki = "Wiki",
    pre_post = "Pre or Post Change",
    n_users = "Number of registered users"
  ) %>%
  tab_footnote(
    footnote = "Limited to wikis with more than 150 users \n and where edit check was available as default during reviewed timeframe",
    locations = cells_column_labels(
      columns = 'pre_post'
    )
  )%>%
    opt_stylize(style =2)

display_html(as_raw_html(num_users_change_wiki ))
Number of users by wiki that included a reference after being shown reference check
Wiki | Pre or Post Change [1] | Number of registered users
eswiki | pre | 407
eswiki | post | 292
frwiki | pre | 578
frwiki | post | 537
ptwiki | pre | 238
ptwiki | post | 193
ruwiki | pre | 230
ruwiki | post | 206
[1] Limited to wikis with more than 150 users and where edit check was available as default during reviewed timeframe

Key Findings:

  • Overall, there was a slight decrease in the absolute number of users that added a reference after being shown reference check (166 fewer users across all wikis); however, there were no notable changes around the date edit check was moved to the side rail.
  • The lower number of users is also likely to be affected by seasonal trends and changes in editing activity around the December holidays. We also review the proportion of edits that included a reference, to provide more insight into any change in how often people respond to the edit checks presented.
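
The experience-level counts above do not sum to the overall totals (for example, 597 + 1714 = 2311 against 2245 overall in the pre period). One plausible explanation, stated here as an assumption rather than something verified against the data, is that a user whose edit count grows during the window is counted once in each group they pass through. A minimal sketch with hypothetical users shows how per-group distinct counts can exceed the overall distinct count:

```r
library(dplyr)

# Hypothetical users; user "a" appears in two experience groups within one period
toy_users <- tibble(
  user_id = c("a", "a", "b", "c"),
  experience_level_group = c("Newcomer", "Junior Contributor",
                             "Junior Contributor", "Newcomer")
)

# Overall distinct users
n_overall <- n_distinct(toy_users$user_id)

# Distinct users within each group, then summed across groups
by_group <- toy_users %>%
  group_by(experience_level_group) %>%
  summarise(n_users = n_distinct(user_id), .groups = "drop")
n_group_sum <- sum(by_group$n_users)

c(overall = n_overall, sum_of_groups = n_group_sum)  # 3 and 4
```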

Proportion of edits that included a reference after being shown reference check

Proportion of all published new content edits where reference check was shown that included a new reference.

Overall

Show the code
prop_w_change_overall <- published_edit_check_data %>%
    filter(is_edit_check_activated == 1,
           is_new_content == 1) %>%  #limit to only new content edits
    group_by(pre_post) %>%
    summarise(n_content_edits = n_distinct(revision_id),
              n_edits_w_change = n_distinct(revision_id[includes_policy_change ==1 & was_reverted == 0])) %>%
    mutate(activation_rate = paste0(round(n_edits_w_change/n_content_edits * 100, 1), "%"))  %>%
    gt() %>%
    tab_header(
    title = "Proportion of new content edits that included a new reference"
      )  %>%
  cols_label(
    pre_post = "Pre or Post Change",
    n_content_edits = "Number of new content edits",
    n_edits_w_change = "Number of new content edits with a reference",
    activation_rate = "Proportion of new content edits with a reference"
  ) %>%
  tab_footnote(
    footnote = "Limited to new content edits where reference check was shown",
    locations = cells_column_labels(
      columns = 'n_content_edits'
    )
  ) %>%
    opt_stylize(style =2)

display_html(as_raw_html(prop_w_change_overall))
Proportion of new content edits that included a new reference
Pre or Post Change | Number of new content edits [1] | Number of new content edits with a reference | Proportion of new content edits with a reference
pre | 13749 | 4790 | 34.8%
post | 11412 | 4327 | 37.9%
[1] Limited to new content edits where reference check was shown

By Editor Experience

Show the code
# Same rate, split by editor experience level
prop_w_change_userexp <- published_edit_check_data %>%
  filter(is_edit_check_activated == 1,
         is_new_content == 1) %>%  # limit to new content edits where the check was shown
  group_by(experience_level_group, pre_post) %>%
  # experience_level_group stays a grouping variable after summarise(),
  # so gt() renders it as row groups
  summarise(
    n_content_edits = n_distinct(revision_id),
    # edits that added a reference and were not reverted
    n_edits_w_change = n_distinct(revision_id[includes_policy_change == 1 & was_reverted == 0])
  ) %>%
  mutate(activation_rate = paste0(round(n_edits_w_change / n_content_edits * 100, 1), "%")) %>%
  gt() %>%
  tab_header(
    title = "Proportion of new content edits with a change to address policy violation by editor experience"
  ) %>%
  cols_label(
    experience_level_group = "Editor experience",
    pre_post = "Pre or Post Change",
    n_content_edits = "Number of new content edits",
    n_edits_w_change = "Number of new content edits that include a new reference",
    activation_rate = "Proportion of new content edits that include a new reference"
  ) %>%
  tab_footnote(
    footnote = "Limited to new content edits where edit check was shown",
    locations = cells_column_labels(columns = "n_content_edits")
  ) %>%
  opt_stylize(style = 2)

display_html(as_raw_html(prop_w_change_userexp))

Proportion of new content edits with a change to address policy violation by editor experience

| Editor experience | Pre or Post Change | Number of new content edits [1] | Number of new content edits that include a new reference | Proportion of new content edits that include a new reference |
|---|---|---|---|---|
| Unregistered | pre | 6906 | 1898 | 27.5% |
| Unregistered | post | 5551 | 1645 | 29.6% |
| Newcomer | pre | 1645 | 597 | 36.3% |
| Newcomer | post | 1278 | 569 | 44.5% |
| Junior Contributor | pre | 5198 | 2295 | 44.2% |
| Junior Contributor | post | 4583 | 2113 | 46.1% |

[1] Limited to new content edits where edit check was shown

By Wiki

Show the code
# Same rate, split by wiki
prop_w_change_wiki <- published_edit_check_data %>%
  filter(is_edit_check_activated == 1,
         is_new_content == 1) %>%  # limit to new content edits where the check was shown
  group_by(wiki, pre_post) %>%
  summarise(
    n_content_edits = n_distinct(revision_id),
    # edits that added a reference and were not reverted
    n_edits_w_change = n_distinct(revision_id[includes_policy_change == 1 & was_reverted == 0])
  ) %>%
  mutate(activation_rate = paste0(round(n_edits_w_change / n_content_edits * 100, 1), "%")) %>%
  filter(n_content_edits > 500) %>%  # keep only rows with more than 500 edits
  gt() %>%
  tab_header(
    title = "Proportion of new content edits with a new reference"
  ) %>%
  cols_label(
    wiki = "Wiki",
    pre_post = "Pre or Post Change",
    n_content_edits = "Number of new content edits",
    n_edits_w_change = "Number of new content edits with a new reference",
    activation_rate = "Proportion of new content edits with a new reference"
  ) %>%
  tab_footnote(
    footnote = "Limited to wikis with over 500 published edits and to new content edits where edit check was shown",
    locations = cells_column_labels(columns = "n_content_edits")
  ) %>%
  opt_stylize(style = 2)

display_html(as_raw_html(prop_w_change_wiki ))
Proportion of new content edits with a new reference

| Wiki | Pre or Post Change | Number of new content edits [1] | Number of new content edits with a new reference | Proportion of new content edits with a new reference |
|---|---|---|---|---|
| eswiki | pre | 1049 | 418 | 39.8% |
| eswiki | post | 696 | 264 | 37.9% |
| frwiki | pre | 2068 | 749 | 36.2% |
| frwiki | post | 1783 | 710 | 39.8% |
| itwiki | pre | 1139 | 310 | 27.2% |
| itwiki | post | 956 | 291 | 30.4% |
| nlwiki | pre | 698 | 200 | 28.7% |
| nlwiki | post | 544 | 195 | 35.8% |
| ruwiki | pre | 1408 | 476 | 33.8% |
| ruwiki | post | 1211 | 406 | 33.5% |

[1] Limited to wikis with over 500 published edits and to new content edits where edit check was shown
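
To make the direction and size of the change at each wiki easier to compare, the per-wiki counts can be turned into percentage-point differences. The following is an illustrative sketch only: the data frame and its column names are hypothetical, the counts are typed in from the table above instead of being read from `published_edit_check_data`, and base R is used so that the snippet runs on its own.

```r
# Counts copied from the by-wiki table; column names are illustrative
wiki_rates <- data.frame(
  wiki          = c("eswiki", "frwiki", "itwiki", "nlwiki", "ruwiki"),
  pre_edits     = c(1049, 2068, 1139, 698, 1408),
  pre_with_ref  = c(418, 749, 310, 200, 476),
  post_edits    = c(696, 1783, 956, 544, 1211),
  post_with_ref = c(264, 710, 291, 195, 406)
)

# Proportion of edits with a new reference in each period, as percentages
wiki_rates$pre_rate  <- wiki_rates$pre_with_ref / wiki_rates$pre_edits * 100
wiki_rates$post_rate <- wiki_rates$post_with_ref / wiki_rates$post_edits * 100

# Percentage-point change from pre to post
wiki_rates$pp_change <- round(wiki_rates$post_rate - wiki_rates$pre_rate, 1)

# Largest increase first
print(wiki_rates[order(-wiki_rates$pp_change), c("wiki", "pp_change")])
```

With these counts, three wikis (nlwiki, frwiki, itwiki) show an increase and two (eswiki, ruwiki) show a decrease. The per-wiki samples are small, so individual differences should be read with caution.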

Key Findings:

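  • The proportion of new content edits that included a reference rose from 34.8% before the change to 37.9% after it, an increase of 3.1 percentage points (a relative increase of about 9%).
  • Increases were observed across all editor experience levels, with the largest among newcomers (36.3% to 44.5%), and at three of the five wikis reviewed (frwiki, itwiki and nlwiki). eswiki and ruwiki showed slight decreases.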
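
The overall finding can be expressed either as a percentage-point difference or as a relative change, and the two are easy to confuse. The short sketch below recomputes both from the counts in the overall table; the variable names are illustrative and the arithmetic is a reader's check, not code from the original analysis.

```r
# Counts from the overall table
pre_rate  <- 4790 / 13749   # share of edits with a reference before the change
post_rate <- 4327 / 11412   # share of edits with a reference after the change

# Absolute difference, in percentage points
pp_change <- (post_rate - pre_rate) * 100

# Relative change, as a percentage of the pre-change rate
rel_change <- (post_rate - pre_rate) / pre_rate * 100

print(round(pp_change, 1))   # 3.1
print(round(rel_change, 1))  # 8.8
```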

Reuse