Missing data collection for interaction/engagement data: The collection notebook (collect_enwiki_refcheck_leading_indicators.ipynb) does not have queries for:
* Engagement/interaction data (what becomes reference_check_engagement_data.tsv) - optional
* Blocked users data (reference_check_eligible_users_blocked.csv)
Action needed: Add queries in the collection notebook.
Overview
The Editing team is evaluating the impact of Reference Check through an A/B test. The A/B test has one key performance indicator with two parts, two optional curiosities to explore if time allows, and four guardrails, the last of which contains two sub-parts.
KPI Hypothesis: The number of constructive edits newcomers publish will increase because a greater percentage of edits that add new content will include a reference or an explicit acknowledgement as to why these edits lack references.
KPI Metric(s) for evaluation:
1) Proportion of published edits that add new content and include a reference or explicit acknowledgement of why a citation was not added
2) Proportion of published edits that add new content (T333714) and are constructive (read: NOT reverted within 48 hours)
From the Edit Check Reference-Check AB Test report: when Reference Check was shown, edits were 2.2× more likely to include a new reference and be constructive (i.e. not reverted within 48 hours) than otherwise. The English Wikipedia Reference Check A/B test will look at how its numbers compare to this 2024 finding.
Here we review leading indicators prior to analyzing the full A/B test data.
Leading indicators:
⭐ Newcomers are not encountering Reference Check: Proportion of new content edits Reference Check is shown within (Frequency)
⭐ Newcomers are not understanding the feature: Proportion of contributors that are presented Reference Check and abandon their edits (Edit Completion Rate)
⭐ Reference Check is causing disruption: Proportion of published edits that add new content and are reverted within 48 hours (Revert Rate)
* Reference Check is causing disruption: (If time) Proportion of people blocked after publishing an edit where Reference Check was shown
* People deem Reference Check irrelevant: Proportion of edits wherein people elect NOT to cite the text they are attempting to add (Dismissal Rate)
Methodology
In this AB test, users in the test group will be shown Reference Check if attempting an edit that meets the requirements for the check to be shown in VisualEditor. The control group is provided the default editing experience where no Reference Check is shown.
We collected two weeks of AB test events logged between 7 November 2025 and 21 November 2025 on English Wikipedia.
We relied on events logged in EditAttemptStep, VisualEditorFeatureUse, and change tags recorded in the revision tags table.
Published edits eligible for Reference Check are identified by the editcheck-references revision tag.
To filter to new content edits, we use the editcheck-newcontent tag.
To identify edits where Reference Check was shown, we use VisualEditorFeatureUse events: event.feature = 'editCheck-addReference' AND event.action = 'check-shown-presave'
action-reject: editor dismissed Reference Check
edit-check-feedback-reason-*: Reason for dismissal
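As a minimal sketch of these event filters (using base R rather than the report's dplyr pipelines, and a hypothetical toy `events` data frame standing in for VisualEditorFeatureUse data), sessions where the check was shown versus dismissed can be separated like this:

# Hypothetical toy events; the real data comes from VisualEditorFeatureUse.
events <- data.frame(
  editing_session = c("s1", "s1", "s2", "s3"),
  feature = c("editCheck-addReference", "editCheck-addReference",
              "editCheck-addReference", "otherFeature"),
  action  = c("check-shown-presave", "action-reject",
              "check-shown-presave", "saveSuccess"),
  stringsAsFactors = FALSE
)

# Sessions shown Reference Check vs. sessions where it was dismissed
is_refcheck <- events$feature == "editCheck-addReference"
shown_sessions     <- unique(events$editing_session[is_refcheck & events$action == "check-shown-presave"])
dismissed_sessions <- unique(events$editing_session[is_refcheck & events$action == "action-reject"])

shown_sessions      # "s1" "s2"
dismissed_sessions  # "s1"

Note a session can appear in both sets: being shown the check and later dismissing it are separate events within the same editing session.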
For calculating Edit Completion Rate, we assume that all edits reaching saveIntent are eligible.
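Under that assumption, the Edit Completion Rate reduces to published saves over saveIntent sessions. A hedged sketch (the function name and counts here are illustrative, not taken from the report's pipeline):

# Edit Completion Rate under the stated assumption:
# every session reaching saveIntent counts as an eligible attempt.
edit_completion_rate <- function(n_save_intent_sessions, n_saved_sessions) {
  round(n_saved_sessions / n_save_intent_sessions * 100, 1)
}

# e.g. 871 saved sessions out of 1000 saveIntent sessions -> 87.1 (%)
edit_completion_rate(1000, 871)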
For calculating Revert Rate, published edits eligible for Reference Check are identified by the editcheck-references revision tag. See the instrumentation spec for more details.
Data was limited to mobile web and desktop edits completed in the main (article) namespace using VisualEditor on English Wikipedia. We also limited the data to edits completed by unregistered users and users with 100 or fewer edits, as those are the users who would be shown Reference Check under the default config settings.
For each leading indicator metric, we reviewed the following dimensions: by experiment group (test and control), by platform (mobile web or desktop), by user experience and status. We also reviewed some indicators such as edit completion rate by the number of checks shown within a single editing session.
Note: For the by user experience analysis, we split newer editors into three experience level groups: (1) unregistered, (2) newcomer (registered user making their first edit on Wikipedia), and (3) Junior Contributor (user that has made between 2 and 100 edits).
Method Note
Results are based on initial AB test data to check if any adjustments to the feature need to be prioritized. More event data will be needed to confirm statistical significance of these findings. We will review the complete AB test data as part of the analysis for T400101.
Early indicators suggest Reference Check appears fairly often, may slightly lower edit completion rates, and is associated with early reductions in revert rates across all user types.
Brief Summary
Early indicators suggest Reference Check is shown fairly often—more than Paste Check or Tone Check, but less frequently than earlier multi-check estimates—and it fires more on mobile web than desktop. While the check may slightly lower edit completion rates (especially when many checks appear), it is also associated with reduced revert rates across all user groups. Effects vary by experience level: newcomers see more checks, and although completion rates rise for unregistered and newcomer editors, they fall for junior contributors (93.8% → 87.8%, a 6.4% relative decrease).
Full Summary
Reference Check Frequency
Reference Check was shown at least once in 42.4% of all published new-content edits by newer editors in the test group. This is higher than observed in Paste Check (36%), higher than Paste Check’s initial estimates in T403861 for published edits, and much higher than Tone Check in the Leading Indicators Analysis (9%). This frequency is lower than trends observed in the Multi-Check Indicators Analysis, where Reference Check was presented in about 78% of published new-content edits.
By platform: A notably higher proportion of mobile web edits were shown Reference Check (76.3%) compared to desktop (38.2%). This contrasts with the Paste Check Leading Indicators report patterns, where desktop edits were more frequently shown Paste Check (39%) than mobile web edits (24%).
By whether multiple checks were shown: 12.3% of all published new-content edits were shown more than one Reference Check in a session. This is slightly higher than Paste Check (11.4%) and lower than the 2025 Multiple Reference Check A/B test where 27% of published new-content edits in which Reference Check activated displayed multiple Reference Checks.
By number of checks shown: The majority of sessions shown Reference Check saw only one check (71%). Two checks were shown in 12.3% of sessions; fewer than 4% received more than six checks. For comparison, out of all editing sessions shown Paste Check, in the Paste Check Leading Indicators Report analysis, 68% included only one Paste Check shown.
By user experience: Reference Check appears slightly more frequently for newcomers: Newcomer new content edits are 2.5% more likely to be shown Reference Check relative to unregistered users, and 38.8% more likely relative to junior contributors. In the 2025 Multiple Reference Check A/B test we observed a noticeably stronger effect for newcomers than for unregistered users; however, junior contributors in the treatment group showed the highest rates overall of adding references when exposed to multiple reference checks.
Edit Completion Rate
Edits shown Reference Check are completed at a lower rate (87.1%) than eligible edits not shown Reference Check (90.6%), a 4% relative decrease. This aligns with the 2024 Reference Check A/B test, where showing Reference Check produced a 10% decrease in edit completion rate relative to control.
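The relative decrease quoted here is simply (treatment − control) / control. A one-line R sketch, with the values copied from this section (the function name is ours, for illustration):

# Relative change between treatment and control, in percent
relative_change <- function(treatment, control) {
  round((treatment - control) / control * 100, 1)
}

# 87.1% completion when shown vs. 90.6% when eligible but not shown
relative_change(87.1, 90.6)  # about -3.9, i.e. roughly a 4% relative decrease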
By platform: Completion decreased modestly on both platforms. On mobile web there was a 1.5% relative decrease for the treatment group (79.7%) compared to the control (80.9%). Desktop saw a 5.9% relative decrease for the treatment group (89.2%) compared to the control (94.8%). In the 2024 Reference Check A/B test, the pattern was more dramatic on mobile (–24.3%) than desktop (–3.1%). Early trends here look milder by comparison but consistent in direction.
By whether multiple checks were shown: Sessions with multiple Reference Checks show lower completion rates than single-check sessions. This is the opposite of that seen in the Paste Check Leading Indicators Analysis and Tone Check Leading Indicators Analysis, where presenting multiple checks did not reduce completion rates. For reference, the 2025 Multiple Reference Check A/B test showed that the edit completion rate for users presented multiple reference checks was lower (74%) than that for users presented a single reference check (75%).
By number of checks shown: We don’t see a significant increase in edit abandonment rate when 2–5 Reference Checks are presented in a single session. However, the completion rate declines at 6–10 checks and falls substantially above 10. In the Paste Check Leading Indicators Analysis, we did not observe a notable increase in edit abandonment even when many (>3) Paste Checks were shown within a session.
By user experience: Edit completion rates increased for unregistered editors (control: 80.8%, treatment: 84.7%) and for newcomers (control: 84.3%, treatment: 87.1%). Junior contributors in the treatment group (87.8%) saw a 6.4% decrease relative to their control group (93.8%) counterparts. This variation across user experiences echoes differences seen in the 2024 Multi Check Leading Indicators Analysis, where unregistered editors were largely unaffected, newcomers showed small declines when multiple checks were presented, and junior contributors in the treatment group showed a slight relative increase (3%) compared to the control.
Revert Rate
Published new-content edits shown Reference Check are reverted less frequently, with a 13.7% relative decrease compared to eligible edits not shown the check (29.3% for the control and 25.3% for the treatment). This is steeper than the 8.6% relative decrease observed in the 2024 Reference Check A/B test and higher than revert rates in the 2025 multiple Edit Checks A/B test, where control and treatment estimates were around 22.5–23.6%. Important note: These revert rates include edits where the final published text may not include a reference. We plan to review the proportion of new content edits shown or eligible to be shown Reference Check that include a reference in the AB test analysis.
By platform: Both desktop (24.4%) and mobile web (29.1%) treatment groups show improved revert rates relative to their controls (desktop: 25.3%, mobile web: 45.8%). Mobile web editors in the treatment group saw a 36.5% relative decrease compared to the control group while desktop editors saw a 3.6% relative decrease. This is consistent with earlier findings: In the 2024 Reference Check A/B test, relative revert rates decreased on both platforms (desktop –9.4%, mobile –5.9%) and in the 2025 Multiple Reference Check A/B test, mobile web treatment group edits also tended to show higher revert rates than desktop treatment group edits.
By whether multiple checks were shown: Edits shown multiple Reference Checks experienced higher revert rates than those shown only one, increasing from 24.4% for single-check sessions to 27.6% for multiple-check sessions. This same pattern was visible in the Tone Check Leading Indicators Analysis (multiple tone checks → higher revert rates). In contrast, in the 2025 Multiple Reference Check A/B test, the multi-check treatment group had a substantially lower revert rate (–34.7%) relative to single-check edits—likely because edits that require multiple references differ systematically from those that only trigger one.
By user experience: Revert rates decreased across all user experience types. Unregistered editors saw a decrease from 36.8% in the control to 32.2% in the treatment, newcomers from 42.3% to 39.6%, and junior contributors from 25.1% to 20.4%. This pattern is consistent with the 2024 Reference Check AB Test and in part with the 2025 Multiple Reference Checks A/B test analyses where revert rates decreased for newcomers and slightly increased for junior contributors and unregistered editors when comparing treatment to control groups. The 2025 Multiple Reference Checks A/B test highlights, “Results vary slightly based on the type of user completing the edit but none of the observed changes were statistically significant.”
Question: Are newer editors encountering Reference Check?
Methodology: We reviewed the proportion of published new content edits where at least one Reference Check was shown during the editing session (event.feature = 'editCheck-addReference' AND event.action = 'check-shown-presave').
Published edits eligible for Reference Check are identified by the editcheck-references revision tag.
This analysis was specifically limited to edits that were successfully published and identified as new content edits with the tag editcheck-newcontent.
Code
#load frequency data
reference_check_frequency_data <- read.csv(
  file = 'data/1-reference_check_save_data.tsv',
  header = TRUE,
  sep = "\t",
  stringsAsFactors = FALSE
)
Code
# Cleaning up dataset and renaming fields to clarify meanings
# Set experience level group and factor levels
reference_check_frequency_data <- reference_check_frequency_data %>%
  mutate(
    experience_level_group = case_when(
      user_edit_count == 0 & user_status == 'registered' ~ 'Newcomer',
      user_edit_count == 0 & user_status == 'unregistered' ~ 'Unregistered',
      user_edit_count > 0 & user_edit_count <= 100 ~ "Junior Contributor",
      user_edit_count > 100 ~ "Non-Junior Contributor"
    ),
    experience_level_group = factor(
      experience_level_group,
      levels = c("Unregistered", "Newcomer", "Non-Junior Contributor", "Junior Contributor")
    )
  )

# rename experiment field to clarify
reference_check_frequency_data <- reference_check_frequency_data %>%
  mutate(test_group = factor(
    test_group,
    levels = c('2025-09-editcheck-addReference-control', '2025-09-editcheck-addReference-test'),
    labels = c("control (no Reference Check)", "test (reference check available)")
  ))

# rename platform from phone to mobile web to clarify meaning
reference_check_frequency_data <- reference_check_frequency_data %>%
  mutate(platform = factor(
    platform,
    levels = c('phone', 'desktop'),
    labels = c("mobile web", "desktop")
  ))
Code
# Set fields and factor levels to assess number of checks shown
reference_check_frequency_data <- reference_check_frequency_data %>%
  mutate(
    multiple_checks_shown = ifelse(n_checks_shown > 1, 1, 0),
    multiple_checks_shown = factor(multiple_checks_shown, levels = c(0, 1))
  )

# note these buckets can be adjusted as needed based on distribution of data
reference_check_frequency_data <- reference_check_frequency_data %>%
  mutate(
    checks_shown_bucket = case_when(
      is.na(n_checks_shown) ~ '0',
      n_checks_shown == 1 ~ '1',
      n_checks_shown == 2 ~ '2',
      n_checks_shown > 2 & n_checks_shown <= 5 ~ "3-5",
      n_checks_shown > 5 & n_checks_shown <= 10 ~ "6-10",
      n_checks_shown > 10 ~ "over 10"
    ),
    checks_shown_bucket = factor(
      checks_shown_bucket,
      levels = c("0", "1", "2", "3-5", "6-10", "over 10")
    )
  )
Overall
Code
reference_checks_shown_saved_overall <- reference_check_frequency_data %>%
  filter(
    test_group == "test (reference check available)"  # limit to test group edits
    & is_new_content == 1                             # limit to published new content edits
  ) %>%
  group_by(test_group) %>%
  summarise(
    n_editing_session = n_distinct(editing_session),
    n_editing_session_refcheck = n_distinct(editing_session[was_reference_check_shown == 1])
  ) %>%
  mutate(prop_check_shown = paste0(round(n_editing_session_refcheck / n_editing_session * 100, 1), "%")) %>%
  gt() %>%
  tab_header(title = "Published new content edits shown at least one Reference Check") %>%
  opt_stylize(5) %>%
  cols_label(
    test_group = "Experiment Group",
    n_editing_session = "Number of edits",
    n_editing_session_refcheck = "Number of edits shown Reference Check",
    prop_check_shown = "Proportion of edits shown Reference Check"
  ) %>%
  tab_source_note(
    gt::md('Limited to published new content edits by unregistered users and users with 100 or fewer edits')
  )

display_html(as_raw_html(reference_checks_shown_saved_overall))
Published new content edits shown at least one Reference Check

Experiment Group | Number of edits | Number of edits shown Reference Check | Proportion of edits shown Reference Check
test (reference check available) | 1827 | 774 | 42.4%

Limited to published new content edits by unregistered users and users with 100 or fewer edits
Reference Check was shown at least once in 42.4% of all published new content edits by newer editors in the test group. This is higher than rates observed for Paste Check (36%) in the Paste Check Leading Indicators Analysis report, higher than Paste Check’s initial estimates in T403861, and substantially higher than rates observed in the Tone Check Leading Indicators Analysis Report, where Tone Check was shown in 9% of all published new content edits.
In the Multi-Check Indicators Analysis, we observed that Reference Check was shown in nearly 80% of published new content edits.
By whether multiple checks were shown
Code
reference_checks_shown_saved_bymultiple <- reference_check_frequency_data %>%
  filter(
    test_group == "test (reference check available)" &  # limit to test group edits
    is_new_content == 1                                 # limit to published new content edits
  ) %>%
  group_by(test_group) %>%
  summarise(
    n_editing_session = n_distinct(editing_session),
    n_editing_session_multicheck = n_distinct(editing_session[was_reference_check_shown == 1 & multiple_checks_shown == 1])
  ) %>%
  mutate(prop_check_shown = paste0(round(n_editing_session_multicheck / n_editing_session * 100, 1), "%")) %>%
  gt() %>%
  tab_header(title = "Published new content edits shown multiple Reference Checks") %>%
  opt_stylize(5) %>%
  cols_label(
    test_group = "Experiment Group",
    n_editing_session = "Number of edits",
    n_editing_session_multicheck = "Number of edits shown multiple Reference Checks",
    prop_check_shown = "Proportion of edits shown multiple Reference Checks"
  ) %>%
  tab_source_note(
    gt::md('Limited to published new content edits by unregistered users and users with 100 or fewer edits')
  )

display_html(as_raw_html(reference_checks_shown_saved_bymultiple))
Published new content edits shown multiple Reference Checks

Experiment Group | Number of edits | Number of edits shown multiple Reference Checks | Proportion of edits shown multiple Reference Checks
test (reference check available) | 1827 | 225 | 12.3%

Limited to published new content edits by unregistered users and users with 100 or fewer edits
12.3% of all published new content edits were shown more than one Reference Check in a session. This is slightly higher than that observed in the Paste Check Leading Indicators Analysis report (11.4%) and lower than in the 2025 Multiple Reference Check A/B test, where, within the test group, 27% of published new-content edits in which Reference Check activated (1,697 edits) displayed multiple reference checks within a single editing session.
By number of checks shown
Code
reference_checks_shown_saved_bynchecks <- reference_check_frequency_data %>%
  filter(
    test_group == "test (reference check available)" &  # limit to test group edits
    is_new_content == 1 &                               # limit to published new content edits
    was_reference_check_shown == 1
  ) %>%
  mutate(total_sessions = n_distinct(editing_session)) %>%
  group_by(total_sessions, checks_shown_bucket) %>%
  summarise(n_editing_session_refcheck = n_distinct(editing_session)) %>%
  mutate(prop_check_shown = paste0(round(n_editing_session_refcheck / total_sessions * 100, 2), "%")) %>%
  ungroup() %>%
  select(-c(1, 3)) %>%  # removing count columns; sanitizing per data publication guidelines
  gt() %>%
  tab_header(title = "Published new content edits by number of Reference Checks shown") %>%
  opt_stylize(5) %>%
  cols_label(
    checks_shown_bucket = "Number of Reference Checks shown",
    prop_check_shown = "Proportion of edits"
  ) %>%
  tab_source_note(
    gt::md('Limited to published new content edits shown at least one Reference Check')
  )

display_html(as_raw_html(reference_checks_shown_saved_bynchecks))
Published new content edits by number of Reference Checks shown

Number of Reference Checks shown | Proportion of edits
1 | 70.93%
2 | 12.27%
3-5 | 12.14%
6-10 | 2.84%
over 10 | 1.81%

Limited to published new content edits shown at least one Reference Check
Reference Check was shown only once in the majority of all editing sessions shown Reference Check (71%). For sessions that display multiple Reference Checks, the largest share shows two checks (12.3%), while fewer than 4% present more than six.
reference_checks_shown_byplatform <- reference_check_frequency_data %>%
  filter(
    test_group == "test (reference check available)" &  # limit to test group edits
    is_new_content == 1                                 # limit to published new content edits
  ) %>%
  group_by(platform) %>%
  summarise(
    n_editing_session = n_distinct(editing_session),
    n_editing_session_refcheck = n_distinct(editing_session[was_reference_check_shown == 1])
  ) %>%
  mutate(prop_check_shown = paste0(round(n_editing_session_refcheck / n_editing_session * 100, 1), "%")) %>%
  mutate(n_editing_session_refcheck = ifelse(n_editing_session_refcheck < 50, "<50", n_editing_session_refcheck)) %>%  # sanitizing per data publication guideline
  gt() %>%
  tab_header(title = "Published new content edits shown Reference Check by platform") %>%
  opt_stylize(5) %>%
  cols_label(
    platform = "Platform",
    n_editing_session = "Number of edits",
    n_editing_session_refcheck = "Number of edits shown Reference Check",
    prop_check_shown = "Proportion of edits shown Reference Check"
  ) %>%
  tab_source_note(
    gt::md('Limited to published new content edits by unregistered users and users with 100 or fewer edits')
  )

display_html(as_raw_html(reference_checks_shown_byplatform))
Published new content edits shown Reference Check by platform

Platform | Number of edits | Number of edits shown Reference Check | Proportion of edits shown Reference Check
mobile web | 198 | 151 | 76.3%
desktop | 1629 | 623 | 38.2%

Limited to published new content edits by unregistered users and users with 100 or fewer edits
A higher proportion of edits on mobile web are shown Reference Check (76.3%) compared to desktop (38.2%), a 99.7% relative difference when comparing mobile web to desktop.
reference_checks_shown_byuser_status <- reference_check_frequency_data %>%
  filter(
    test_group == "test (reference check available)" &  # limit to test group edits
    is_new_content == 1                                 # limit to published new content edits
  ) %>%
  group_by(experience_level_group) %>%
  summarise(
    n_editing_session = n_distinct(editing_session),
    n_editing_session_refcheck = n_distinct(editing_session[was_reference_check_shown == 1])
  ) %>%
  mutate(prop_check_shown = paste0(round(n_editing_session_refcheck / n_editing_session * 100, 1), "%")) %>%
  select(-2) %>%  # removing total number of edits column to sanitize data for publication
  gt() %>%
  tab_header(title = "Published new content edits shown Reference Check by user experience") %>%
  opt_stylize(5) %>%
  cols_label(
    experience_level_group = "User Experience",
    n_editing_session_refcheck = "Number of edits shown Reference Check",
    prop_check_shown = "Proportion of edits shown Reference Check"
  ) %>%
  tab_source_note(
    gt::md('Limited to published new content edits by unregistered users and users with 100 or fewer edits')
  )

display_html(as_raw_html(reference_checks_shown_byuser_status))
Published new content edits shown Reference Check by user experience

User Experience | Number of edits shown Reference Check | Proportion of edits shown Reference Check
Unregistered | 149 | 59.1%
Newcomer | 106 | 60.6%
Junior Contributor | 519 | 37.1%

Limited to published new content edits by unregistered users and users with 100 or fewer edits
Reference Check appears slightly more frequently for newcomers.
Newcomer new content edits are 2.5% more likely to be shown Reference Check relative to unregistered users, and 38.8% more likely relative to junior contributors.
In the 2025 Multiple Reference Check A/B test we observed a noticeably stronger effect for newcomers than for unregistered users; however, junior contributors in the treatment group showed the highest rates overall of adding references when exposed to multiple reference checks.
B) Reference Check Edit Completion Rate
Question: Do newer editors understand the feature?
Methodology: We reviewed the proportion of edits where Reference Check was shown at least once during the edit session and that were successfully published (event.action = saveSuccess). These edits were compared to the completion rate of edits in the control group that were eligible but not shown Reference Check, as implemented in T402460.
The edit_completion_rate query filters to saveIntent events, per the comment: “the moment when reference check would be shown if eligible and in test group”.
Per the Tone Check methodology and Paste Check methodology, we compare:
Test group: edits where Reference Check was shown
Control group: edits that were eligible but not shown
Note: This analysis excludes edits that were abandoned prior to reaching the point where Reference Check was or would have been shown.
Code
# load data for assessing edit completion rate
edit_completion_rates <- read.csv(
  file = 'data/2-edit_completion_rate.tsv',
  header = TRUE,
  sep = "\t",
  stringsAsFactors = FALSE
)
Code
# Set experience level group and factor levels
edit_completion_rates <- edit_completion_rates %>%
  mutate(
    experience_level_group = case_when(
      user_edit_count == 0 & user_status == 'registered' ~ 'Newcomer',
      user_edit_count == 0 & user_status == 'unregistered' ~ 'Unregistered',
      user_edit_count > 0 & user_edit_count <= 100 ~ "Junior Contributor",
      user_edit_count > 100 ~ "Non-Junior Contributor"
    ),
    experience_level_group = factor(
      experience_level_group,
      levels = c("Unregistered", "Newcomer", "Non-Junior Contributor", "Junior Contributor")
    )
  )

# rename experiment field to clarify
edit_completion_rates <- edit_completion_rates %>%
  mutate(test_group = factor(
    test_group,
    levels = c('2025-09-editcheck-addReference-control', '2025-09-editcheck-addReference-test'),
    labels = c("control (not shown Reference Check)", "test (reference check available)")
  ))

# rename platform from phone to mobile web to clarify meaning
edit_completion_rates <- edit_completion_rates %>%
  mutate(platform = factor(
    platform,
    levels = c('phone', 'desktop'),
    labels = c("mobile web", "desktop")
  ))
Code
# Set fields and factor levels to assess number of checks shown
edit_completion_rates <- edit_completion_rates %>%
  mutate(
    multiple_checks_shown = ifelse(n_checks_shown > 1, "multiple checks shown", "one check shown"),
    multiple_checks_shown = factor(multiple_checks_shown, levels = c("one check shown", "multiple checks shown"))
  )

# note these buckets can be adjusted as needed based on distribution of data
edit_completion_rates <- edit_completion_rates %>%
  mutate(
    checks_shown_bucket = case_when(
      is.na(n_checks_shown) ~ '0',
      n_checks_shown == 1 ~ '1',
      n_checks_shown == 2 ~ '2',
      n_checks_shown > 2 & n_checks_shown <= 5 ~ "3-5",
      n_checks_shown > 5 & n_checks_shown <= 10 ~ "6-10",
      n_checks_shown > 10 ~ "over 10"
    ),
    checks_shown_bucket = factor(
      checks_shown_bucket,
      levels = c("0", "1", "2", "3-5", "6-10", "over 10")
    )
  )
Code
# define set of all eligible edits to review (eligible in control and shown in test)
edit_completion_rates <- edit_completion_rates %>%
  mutate(
    is_test_eligible = ifelse(
      (test_group == "test (reference check available)" & reference_check_shown == 1) |
        (test_group == "control (not shown Reference Check)"),
      'eligible', 'not eligible'
    ),
    is_test_eligible = factor(is_test_eligible, levels = c("eligible", "not eligible"))
  )
Overall
Code
edit_completion_rate_overall <- edit_completion_rates %>%
  filter(is_test_eligible == 'eligible') %>%  # limit to eligible edits
  group_by(test_group) %>%
  summarise(
    n_edits = n_distinct(editing_session),
    n_saves = n_distinct(editing_session[saved_edit > 0])
  ) %>%
  mutate(completion_rate = paste0(round(n_saves / n_edits * 100, 1), "%"))
Code
# plot visualization of overall edit completion rates
dodge <- position_dodge(width = 0.9)
p <- edit_completion_rate_overall %>%
  ggplot(aes(x = test_group, y = n_saves / n_edits)) +
  geom_col(position = 'dodge', fill = 'dodgerblue4') +
  scale_y_continuous(labels = scales::percent) +
  geom_text(aes(label = paste(completion_rate), fontface = 2), vjust = 1.2, size = 10, color = "white") +
  scale_fill_manual(values = cbPalette, name = "Reason") +
  labs(
    y = "Percent of edit attempts completed",
    x = "Experiment Group",
    title = "Reference Check edit completion rate",
    caption = "Limited to edit attempts shown or eligible to be shown at least one Reference Check"
  ) +
  theme(
    panel.grid.minor = element_blank(),
    panel.background = element_blank(),
    plot.title = element_text(hjust = 0.5),
    text = element_text(size = 24),
    legend.position = "none",
    axis.line = element_line(colour = "black")
  )
p
Edits shown Reference Check are completed at a lower rate (87.1%) than edits in the control group that are eligible but not shown Reference Check (90.6%). This represents a 4% relative decrease and aligns with the 2024 Reference Check A/B test, where there was an overall 10% decrease in edit completion rate for edits shown Reference Check compared to the control group.
By whether multiple checks were shown
Code
edit_completion_rate_bymulti <- edit_completion_rates %>%
  filter(reference_check_shown == 1 & test_group == 'test (reference check available)') %>%
  group_by(test_group, multiple_checks_shown) %>%
  summarise(
    n_edits = n_distinct(editing_session),
    n_saves = n_distinct(editing_session[saved_edit > 0])
  ) %>%
  mutate(completion_rate = paste0(round(n_saves / n_edits * 100, 1), "%")) %>%
  gt() %>%
  tab_header(title = "Reference Check edit completion rate by if multiple checks were shown") %>%
  opt_stylize(5) %>%
  cols_label(
    test_group = "Experiment group",
    multiple_checks_shown = "Multiple Reference Checks shown",
    n_edits = "Number of edit attempts shown Reference Check",
    n_saves = "Number of published edits",
    completion_rate = "Proportion of edits saved"
  ) %>%
  tab_source_note(
    gt::md('Limited to edit attempts shown or eligible to be shown at least one Reference Check')
  )

display_html(as_raw_html(edit_completion_rate_bymulti))
Reference Check edit completion rate by if multiple checks were shown

Experiment group: test (reference check available)

Multiple Reference Checks shown | Number of edit attempts shown Reference Check | Number of published edits | Proportion of edits saved
one check shown | 629 | 560 | 89%
multiple checks shown | 280 | 232 | 82.9%

Limited to edit attempts shown or eligible to be shown at least one Reference Check
The edit completion rate of edits shown multiple Reference Checks is lower than that of edits shown a single Reference Check. For comparison, the Paste Check Leading Indicators and Tone Check Leading Indicators analyses showed the opposite. Per the Paste Check Leading Indicators Analysis, “We currently don’t see any increase in edit abandonment rate even if a large number (>3) Paste Checks are shown in a single session”. Additionally, edits shown one and two Paste Checks showed similar proportions of edits saved: 50.1% for those shown one and 53.4% for those shown two. In the Tone Check Leading Indicators analysis, edit completion was 66.5% for edits shown one check and 66.9% for those shown multiple checks.
In the 2025 Multiple Reference Check A/B test the edit completion rate for users presented multiple reference checks was lower (74%) than that for users presented a single reference check (75%).
By number of checks shown
Code
edit_completion_rate_bynchecks <- edit_completion_rates %>%
  # limit to reference checks shown and test group
  filter(reference_check_shown == 1 & test_group == 'test (reference check available)') %>%
  group_by(test_group, checks_shown_bucket) %>%
  summarise(
    n_edits = n_distinct(editing_session),
    n_saves = n_distinct(editing_session[saved_edit > 0])
  ) %>%
  mutate(completion_rate = paste0(round(n_saves / n_edits * 100, 1), "%")) %>%
  ungroup() %>%
  # sanitize small counts per data publication guidelines
  mutate(
    n_edits = ifelse(n_edits < 50, "<50", n_edits),
    n_saves = ifelse(n_saves < 50, "<50", n_saves)
  ) %>%
  group_by(test_group) %>%
  gt() %>%
  tab_header(title = "Reference Check edit completion rate by the number of checks shown") %>%
  opt_stylize(5) %>%
  cols_label(
    checks_shown_bucket = "Number of Reference Checks shown",
    n_edits = "Number of edit attempts shown Reference Check",
    n_saves = "Number of published edits",
    completion_rate = "Proportion of edits saved"
  ) %>%
  tab_source_note(
    gt::md('Limited to edit attempts shown or eligible to be shown at least one Reference Check')
  )

display_html(as_raw_html(edit_completion_rate_bynchecks))
Reference Check edit completion rate by the number of checks shown
Number of Reference Checks shown
Number of edit attempts shown Reference Check
Number of published edits
Proportion of edits saved
test (reference check available)
1
629
560
89%
2
111
98
88.3%
3-5
107
97
90.7%
6-10
<50
<50
76.7%
over 10
<50
<50
43.8%
Limited to edit attempts shown or eligible to be shown at least one Reference Check
We don’t see a significant increase in edit abandonment when 2-5 Reference Checks are presented in a single session. However, the completion rate declines at 6-10 checks and falls substantially at over 10. In the Paste Check Leading Indicators Analysis report we saw no increase in the edit-abandonment rate even when more than three Paste Checks were shown in a single session.
By platform
Code
edit_completion_rate_byplatform <- edit_completion_rates %>%
  filter(is_test_eligible == 'eligible') %>% # limit to eligible edits
  group_by(platform, test_group) %>%
  summarise(
    n_edits = n_distinct(editing_session),
    n_saves = n_distinct(editing_session[saved_edit > 0])
  ) %>%
  mutate(completion_rate = paste0(round(n_saves / n_edits * 100, 1), "%")) %>%
  select(-c(3, 4)) %>% # drop granular count columns for publication
  gt() %>%
  tab_header(title = "Reference Check edit completion rate by platform") %>%
  opt_stylize(5) %>%
  cols_label(
    test_group = "Experiment Group",
    platform = "Platform",
    completion_rate = "Proportion of edits saved"
  ) %>%
  tab_source_note(
    gt::md('Limited to edit attempts shown or eligible to be shown at least one Reference Check')
  )

display_html(as_raw_html(edit_completion_rate_byplatform))
Reference Check edit completion rate by platform
Experiment Group
Proportion of edits saved
mobile web
control (not shown Reference Check)
80.9%
test (reference check available)
79.7%
desktop
control (not shown Reference Check)
94.8%
test (reference check available)
89.2%
Limited to edit attempts shown or eligible to be shown at least one Reference Check
We observed slight decreases in edit completion on both platforms: a 1.5% relative decrease for mobile web edits in the treatment group compared to the control, and a 5.9% relative decrease on desktop.
In the 2024 Reference Check A/B test we observed a significant decrease on mobile compared to desktop. On mobile, edit completion rate decreased by 24.3% (13.5pp) while on desktop it decreased by only 3.1% (2.3pp).
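Since the comparisons above mix relative changes with percentage-point (pp) changes, the arithmetic is worth pinning down. A minimal Python sketch (the analysis itself is in R; this is purely illustrative), checked against the mobile web and desktop completion rates in the table above:

```python
def change_summary(control_rate, test_rate):
    """Return (percentage-point change, relative change) of test vs. control.

    Rates are proportions in [0, 1]. Both values are returned as percentages,
    negative when the test group is lower than the control.
    """
    pp = (test_rate - control_rate) * 100                   # absolute change, in points
    rel = (test_rate - control_rate) / control_rate * 100   # relative change, in %
    return round(pp, 1), round(rel, 1)

# Mobile web completion rates from the table above: control 80.9%, test 79.7%
print(change_summary(0.809, 0.797))  # -1.2 pp, -1.5% relative
# Desktop: control 94.8%, test 89.2%
print(change_summary(0.948, 0.892))  # -5.6 pp, -5.9% relative
```

The same base rates therefore yield very different-looking headline numbers depending on which form is quoted, which is why the 2024 result above is reported both ways (24.3% relative, 13.5 pp).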
By user experience
Code
edit_completion_rate_byuserstatus <- edit_completion_rates %>%
  filter(is_test_eligible == 'eligible') %>% # limit to eligible edits
  group_by(experience_level_group, test_group) %>%
  summarise(
    n_edits = n_distinct(editing_session),
    n_saves = n_distinct(editing_session[saved_edit > 0])
  ) %>%
  mutate(completion_rate = paste0(round(n_saves / n_edits * 100, 1), "%")) %>%
  gt() %>%
  tab_header(title = "Reference Check edit completion rate by editor experience") %>%
  opt_stylize(5) %>%
  cols_label(
    test_group = "Test Group",
    experience_level_group = "User Experience",
    n_edits = "Number of edit attempts shown Reference Check",
    n_saves = "Number of published edits",
    completion_rate = "Proportion of edits saved"
  ) %>%
  tab_source_note(
    gt::md('Limited to edit attempts shown or eligible to be shown at least one Reference Check')
  )

display_html(as_raw_html(edit_completion_rate_byuserstatus))
Reference Check edit completion rate by editor experience
Test Group
Number of edit attempts shown Reference check
Number of published edits
Proportion of edits saved
Unregistered
control (not shown Reference Check)
6753
5458
80.8%
test (reference check available)
177
150
84.7%
Newcomer
control (not shown Reference Check)
3648
3074
84.3%
test (reference check available)
124
108
87.1%
Junior Contributor
control (not shown Reference Check)
27777
26043
93.8%
test (reference check available)
608
534
87.8%
Limited to edit attempts shown or eligible to be shown at least one Reference Check
Edit completion rate increased for unregistered editors and newcomers in the treatment group compared to the control.
Junior contributors in the treatment group saw a 6.4% relative decrease compared to those in the control. Differing completion rates by experience were also observed in the 2024 Multi Check Leading Indicators Analysis, where unregistered editors saw no notable difference, newcomers in the treatment group receiving multiple checks saw a slight decline, and junior contributors in the treatment group saw a 3% relative increase compared to the control.
C) Reference Check Revert Rate
Question: Is Reference Check causing any disruption?
Methodology: We reviewed the proportion of all published new content edits where Reference Check was shown at least once in an editing session and that were reverted within 48 hours. This was compared to the revert rate of edits in the control group identified as eligible but not shown Reference Check.
Code
# load save/publish data for assessing Reference Check published edits
edit_check_save_data <- read.csv(
  file = 'data/1-reference_check_save_data.tsv',
  header = TRUE,
  sep = "\t",
  stringsAsFactors = FALSE
)
Code
# Set experience level group and factor levels
edit_check_save_data <- edit_check_save_data %>%
  mutate(
    experience_level_group = case_when(
      user_edit_count == 0 & user_status == 'registered' ~ 'Newcomer',
      user_edit_count == 0 & user_status == 'unregistered' ~ 'Unregistered',
      user_edit_count > 0 & user_edit_count <= 100 ~ "Junior Contributor",
      user_edit_count > 100 ~ "Non-Junior Contributor"
    ),
    experience_level_group = factor(
      experience_level_group,
      levels = c("Unregistered", "Newcomer", "Non-Junior Contributor", "Junior Contributor")
    )
  )

# rename experiment field to clarify
edit_check_save_data <- edit_check_save_data %>%
  mutate(test_group = factor(
    test_group,
    levels = c('2025-09-editcheck-addReference-control', '2025-09-editcheck-addReference-test'),
    labels = c("control (Reference Check not shown)", "test (Reference Check shown)")
  ))

# rename platform from phone to mobile web to clarify meaning
edit_check_save_data <- edit_check_save_data %>%
  mutate(platform = factor(
    platform,
    levels = c('phone', 'desktop'),
    labels = c("mobile web", "desktop")
  ))
Code
# Flag whether more than one check was shown in a single session.
# Note: this should only be applicable to the test group.
edit_check_save_data <- edit_check_save_data %>%
  mutate(
    multiple_checks_shown = ifelse(n_checks_shown > 1, "multiple checks shown", "single check shown"),
    multiple_checks_shown = factor(
      multiple_checks_shown,
      levels = c("single check shown", "multiple checks shown")
    )
  )

# Note: these buckets can be adjusted as needed based on the distribution of the data
edit_check_save_data <- edit_check_save_data %>%
  mutate(
    checks_shown_bucket = case_when(
      is.na(n_checks_shown) ~ '0',
      n_checks_shown == 1 ~ '1',
      n_checks_shown == 2 ~ '2',
      n_checks_shown > 2 & n_checks_shown <= 5 ~ "3-5",
      n_checks_shown > 5 & n_checks_shown <= 10 ~ "6-10",
      n_checks_shown > 10 ~ "over 10"
    ),
    checks_shown_bucket = factor(
      checks_shown_bucket,
      levels = c("0", "1", "2", "3-5", "6-10", "over 10")
    )
  )
Code
# Define the set of all eligible edits to review
# (eligible in control, Reference Check actually shown in test)
edit_check_save_data <- edit_check_save_data %>%
  mutate(
    is_test_eligible = ifelse(
      (test_group == "test (Reference Check shown)" & was_reference_check_shown == 1) |
        (test_group == "control (Reference Check not shown)" & is_reference_check_eligible == 1),
      'eligible', 'not eligible'
    ),
    is_test_eligible = factor(is_test_eligible, levels = c("eligible", "not eligible"))
  )
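For illustration, the eligibility rule encoded above (test-group edits must have actually been shown the check; control-group edits must have been flagged eligible) can be expressed as a small standalone function. This is a Python sketch mirroring the R logic, not part of the notebook:

```python
def is_test_eligible(test_group: str, check_shown: int, check_eligible: int) -> str:
    """Mirror of the R eligibility rule: a test edit counts only if the check
    was actually shown; a control edit counts only if it was flagged eligible."""
    in_test = test_group == "test (Reference Check shown)" and check_shown == 1
    in_control = test_group == "control (Reference Check not shown)" and check_eligible == 1
    return "eligible" if (in_test or in_control) else "not eligible"

print(is_test_eligible("test (Reference Check shown)", 1, 0))        # eligible
print(is_test_eligible("control (Reference Check not shown)", 0, 1)) # eligible
print(is_test_eligible("test (Reference Check shown)", 0, 1))        # not eligible
```

Note the asymmetry: test edits that were eligible but never shown the check are excluded, which is why a naive `is_test_eligible == 'eligible'` check on the raw field (the commented-out "circular reference" in the chunk above) does not work.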
Overall
Code
edit_check_reverts_overall <- edit_check_save_data %>%
  filter(is_new_content == 1 & is_test_eligible == 'eligible') %>% # limit to eligible new content edits
  group_by(test_group) %>%
  summarise(
    n_edits = n_distinct(editing_session),
    n_reverts = n_distinct(editing_session[was_reverted == 1])
  ) %>%
  mutate(revert_rate = paste0(round(n_reverts / n_edits * 100, 1), "%"))
Code
# plot visualization of overall revert rates
dodge <- position_dodge(width = 0.9)

p <- edit_check_reverts_overall %>%
  ggplot(aes(x = test_group, y = n_reverts / n_edits)) +
  geom_col(position = 'dodge', fill = 'dodgerblue4') +
  scale_y_continuous(labels = scales::percent) +
  geom_text(aes(label = paste(revert_rate), fontface = 2), vjust = 1.2, size = 10, color = "white") +
  scale_fill_manual(values = cbPalette, name = "Reason") +
  labs(
    y = "Percent of edits reverted",
    x = "Experiment Group",
    title = "New content edit revert rate",
    caption = "Limited to published new content edits shown or eligible to be shown Reference Check"
  ) +
  theme(
    panel.grid.minor = element_blank(),
    panel.background = element_blank(),
    plot.title = element_text(hjust = 0.5),
    text = element_text(size = 24),
    legend.position = "none",
    axis.line = element_line(colour = "black")
  )

p
Overall, published new content edits shown Reference Check are reverted less frequently: a 13.7% relative decrease in the revert rate of published edits where Reference Check was shown compared to edits eligible but not shown Reference Check.
This is a steeper drop than the 8.6% relative decrease in the new content edit revert rate observed where Reference Check was available in the 2024 Reference Check A/B test.
Revert rates for both edits shown Reference Check (25.3%) and edits eligible but not shown it (29.3%) are higher than the revert rates observed in the 2024 Reference Check A/B test (25.6% for the control and 23.4% for the treatment) and in the 2025 multiple Edit Checks A/B test (22.5% for the control and 23.6% for the treatment).
Note: These revert rates include edits where the final published text may not include a reference. We plan to review the proportion of new content edits shown or eligible to be shown Reference Check that include a reference in the A/B test analysis.
By whether multiple checks were shown
Code
edit_check_revert_bymultiple <- edit_check_save_data %>%
  filter(
    is_new_content == 1 &
      was_reference_check_shown == 1 &
      test_group == 'test (Reference Check shown)'
  ) %>%
  group_by(multiple_checks_shown) %>%
  summarise(
    n_edits = n_distinct(editing_session),
    n_reverts = n_distinct(editing_session[was_reverted == 1])
  ) %>%
  mutate(revert_rate = paste0(round(n_reverts / n_edits * 100, 1), "%")) %>%
  select(-c(2, 3)) %>% # removing granular data columns for publication
  gt() %>%
  tab_header(title = "New content edit revert rate by whether multiple checks were shown") %>%
  opt_stylize(5) %>%
  cols_label(
    multiple_checks_shown = "Multiple Check",
    revert_rate = "Proportion of new content edits that were reverted"
  ) %>%
  tab_source_note(
    gt::md('Limited to published new content edits shown or eligible to be shown Reference Check')
  )

display_html(as_raw_html(edit_check_revert_bymultiple))
New content edit revert rate by whether multiple checks were shown
Multiple Check
Proportion of new content edits that were reverted
single check shown
24.4%
multiple checks shown
27.6%
Limited to published new content edits shown or eligible to be shown Reference Check
We observed an increase in revert rate for edits that were shown multiple Reference Checks. This was also observed in the Tone Check leading indicator analysis (edits shown multiple tone checks were reverted more frequently).
In the 2025 Multiple Reference Check A/B test we observed a 34.7% lower revert rate for edits presented multiple checks than for edits presented a single Reference Check. That decrease was noted as likely due, in part, to the types of edits that warrant multiple Reference Checks being less likely to be reverted than the types of edits that warrant only a single check.
By platform
Code
edit_check_revert_byplatform <- edit_check_save_data %>%
  filter(is_new_content == 1 & is_test_eligible == 'eligible') %>% # limit to eligible edits
  group_by(platform, test_group) %>%
  summarise(
    n_edits = n_distinct(editing_session),
    n_reverts = n_distinct(editing_session[was_reverted == 1])
  ) %>%
  mutate(revert_rate = paste0(round(n_reverts / n_edits * 100, 1), "%")) %>%
  select(-c(3, 4)) %>% # removing granular data columns for publication
  gt() %>%
  tab_header(title = "New content edit revert rate by platform") %>%
  opt_stylize(5) %>%
  cols_label(
    test_group = "Test Group",
    platform = "Platform",
    revert_rate = "Proportion of new content edits that were reverted"
  ) %>%
  tab_source_note(
    gt::md('Limited to published new content edits shown or eligible to be shown Reference Check')
  )

display_html(as_raw_html(edit_check_revert_byplatform))
New content edit revert rate by platform
Test Group
Proportion of new content edits that were reverted
mobile web
control (Reference Check not shown)
45.8%
test (Reference Check shown)
29.1%
desktop
control (Reference Check not shown)
25.3%
test (Reference Check shown)
24.4%
Limited to published new content edits shown or eligible to be shown Reference Check
Both mobile web and desktop editors in the treatment group saw lower revert rates than those in the control, with desktop treatment edits reverted less often (24.4%) than mobile web treatment edits (29.1%).
Mobile web editors in the treatment group saw a 36.5% relative decrease compared to the control group, while desktop editors saw a 3.6% relative decrease.
This is consistent with earlier findings: in the 2024 Reference Check A/B test, there was a slight decrease in the revert rate of new content on both desktop and mobile. The relative revert rate decreased by 9.4% (1.7 pp) on desktop and by 5.9% (2 pp) on mobile.
In the 2025 Multiple Reference Check A/B test, we also observed higher revert rates for mobile web treatment group edits (29.1%) compared to desktop treatment group edits (24.4%).
By user experience
Code
edit_check_revert_byuserexp <- edit_check_save_data %>%
  filter(is_new_content == 1 & is_test_eligible == 'eligible') %>% # limit to eligible edits
  group_by(experience_level_group, test_group) %>%
  summarise(
    n_edits = n_distinct(editing_session),
    n_reverts = n_distinct(editing_session[was_reverted == 1])
  ) %>%
  mutate(revert_rate = paste0(round(n_reverts / n_edits * 100, 1), "%")) %>%
  select(-c(3, 4)) %>% # removing granular data columns for publication
  gt() %>%
  tab_header(title = "New content edit revert rate by user experience") %>%
  opt_stylize(5) %>%
  cols_label(
    test_group = "Experiment Group",
    experience_level_group = "User Status",
    revert_rate = "Proportion of new content edits that were reverted"
  ) %>%
  tab_source_note(
    gt::md('Limited to published new content edits shown or eligible to be shown Reference Check')
  )

display_html(as_raw_html(edit_check_revert_byuserexp))
New content edit revert rate by user experience
Experiment Group
Proportion of new content edits that were reverted
Unregistered
control (Reference Check not shown)
36.8%
test (Reference Check shown)
32.2%
Newcomer
control (Reference Check not shown)
42.3%
test (Reference Check shown)
39.6%
Junior Contributor
control (Reference Check not shown)
25.1%
test (Reference Check shown)
20.4%
Limited to published new content edits shown or eligible to be shown Reference Check
Revert rates decreased across all user types. This is consistent with the 2024 Reference Check A/B test and partly consistent with the 2025 Multiple Reference Check A/B test analyses, where revert rates decreased for newcomers and slightly increased for junior contributors and unregistered editors when comparing treatment to control groups. The 2025 Multiple Reference Checks A/B test highlights, “Results vary slightly based on the type of user completing the edit but none of the observed changes were statistically significant.”
D) Reference Check Dismissal Rate (users who elect to keep text without adding a reference)
Question: Do people find Reference Check relevant?
Methodology: We reviewed the proportion of published edits shown Reference Check wherein people elected to keep the text they added (i.e. the Reference Check was dismissed). This was determined by edits where the user dismissed a Reference Check at least once in a session (event.feature = 'editCheck-addReference' AND event.action = 'action-reject').
The analysis includes splits by the reason the user selected for keeping the text.
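As a rough illustration of that session-level counting (in Python rather than the notebook's R, with hypothetical event tuples standing in for the actual event schema), a session counts as dismissed if it contains at least one reject action, no matter how many times the check fired within it:

```python
def dismissal_rate(events):
    """events: iterable of (session_id, feature, action) tuples.

    A session counts as 'shown' if it has any editCheck-addReference event,
    and as 'dismissed' if at least one of those events is action-reject.
    Each session is counted once, regardless of repeat rejections.
    """
    shown, dismissed = set(), set()
    for session, feature, action in events:
        if feature == "editCheck-addReference":
            shown.add(session)
            if action == "action-reject":
                dismissed.add(session)
    return len(dismissed) / len(shown) if shown else 0.0

# Hypothetical events: 'action-shown' is an illustrative placeholder action name
events = [
    ("s1", "editCheck-addReference", "action-shown"),
    ("s1", "editCheck-addReference", "action-reject"),
    ("s2", "editCheck-addReference", "action-shown"),
    ("s3", "editCheck-addReference", "action-reject"),  # rejected twice,
    ("s3", "editCheck-addReference", "action-reject"),  # counted once
]
print(dismissal_rate(events))  # 2 of 3 sessions dismissed
```

Deduplicating at the session level is what the `n_distinct(editing_session[n_rejects > 0])` calls in the chunks below accomplish.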
# rename experiment field to clarify
edit_check_reject_data <- edit_check_reject_data %>%
  mutate(test_group = factor(
    test_group,
    levels = c('2025-09-editcheck-addReference-control', '2025-09-editcheck-addReference-test'),
    labels = c("control (no Reference Check)", "test (shown Reference Check)")
  ))
# rename platform from phone to mobile web to clarify meaning
edit_check_reject_data <- edit_check_reject_data %>%
  mutate(platform = factor(
    platform,
    levels = c('phone', 'desktop'),
    labels = c("mobile web", "desktop")
  ))
# Set fields and factor levels to assess number of checks shown
edit_check_dismissal_overall <- edit_check_reject_data %>%
  filter(was_reference_check_shown == 1) %>% # limit to edits where a check was shown
  summarise(
    n_edits = n_distinct(editing_session),
    n_rejects = n_distinct(editing_session[n_rejects > 0])
  ) %>%
  mutate(dismissal_rate = paste0(round(n_rejects / n_edits * 100, 1), "%")) %>%
  gt() %>%
  tab_header(title = "Reference Check dismissal rate") %>%
  opt_stylize(5) %>%
  cols_label(
    n_edits = "Number of edits shown Reference Check",
    n_rejects = "Number of edits that dismissed Reference Check",
    dismissal_rate = "Proportion of edits where Reference Check was dismissed"
  ) %>%
  tab_source_note(
    gt::md('Limited to published edits where at least one Reference Check was shown')
  )
Users elected to keep the text without a reference when prompted in 62% of edits shown Reference Check.
This dismissal rate is higher than the rates observed for Paste Check (55%) and Tone Check (57%), and higher than in the 2024 Reference Check A/B test, where 51.8% of contributors dismissed the check by explicitly indicating that the information they were adding did not need a reference.
By dismissal reason
edit_check_dismissal_byreason_overall <- edit_check_reject_data %>%
  filter(was_reference_check_shown == 1 & n_rejects > 0) %>% # limit to where shown and user elected to keep text
  group_by(reject_reason) %>%
  summarise(n_edits_rejected = n_distinct(editing_session)) %>%
  mutate(select_rate = paste0(round(n_edits_rejected / sum(n_edits_rejected) * 100, 1), "%"))
# plot bar chart of reason selection
dodge <- position_dodge(width = 0.9)

p <- edit_check_dismissal_byreason_overall %>%
  # Reorder by frequency (descending)
  arrange(desc(n_edits_rejected)) %>%
  mutate(reject_reason = factor(reject_reason, levels = unique(reject_reason))) %>%
  ggplot(aes(x = reject_reason, y = n_edits_rejected / sum(n_edits_rejected))) +
  geom_col(position = 'dodge', fill = 'dodgerblue4') +
  scale_y_continuous(labels = scales::percent) +
  geom_text(aes(label = paste(select_rate, "", n_edits_rejected, "edits"), fontface = 2),
            vjust = 1.2, size = 10, color = "white") +
  scale_fill_manual(values = cbPalette, name = "Reason") +
  labs(
    y = "Percent of edits",
    x = "Selected reason",
    title = "Reasons users selected for keeping the text without adding a reference - Sorted",
    caption = "Limited to published edits where a user selected to keep the text without adding a reference"
  ) +
  theme(
    panel.grid.minor = element_blank(),
    panel.background = element_blank(),
    plot.title = element_text(hjust = 0.5),
    text = element_text(size = 24),
    legend.position = "none",
    axis.line = element_line(colour = "black")
  )

p
By whether multiple checks were shown
edit_check_dismissal_bymultiple <- edit_check_reject_data %>%
  filter(was_reference_check_shown == 1) %>% # limit to where shown
  group_by(multiple_checks_shown) %>%
  summarise(
    n_edits = n_distinct(editing_session),
    n_rejects = n_distinct(editing_session[n_rejects > 0])
  ) %>%
  mutate(dismissal_rate = paste0(round(n_rejects / n_edits * 100, 1), "%")) %>%
  gt() %>%
  tab_header(title = "Reference Check dismissal rate by whether multiple checks were shown") %>%
  opt_stylize(5) %>%
  cols_label(
    multiple_checks_shown = "Multiple Checks",
    n_edits = "Number of edits shown Reference Check",
    n_rejects = "Number of edits that dismissed Reference Check",
    dismissal_rate = "Proportion of edits where Reference Check was dismissed"
  ) %>%
  tab_source_note(
    gt::md('Limited to published edits where at least one Reference Check was shown')
  )
We see a higher dismissal rate if more checks are shown; this was not observed with Tone Check.
In the 2025 Multiple Reference Check A/B test, we observed a 66.3% dismissal rate for the single check group compared to 54.8% for the multiple checks treatment group.
By platform
edit_check_dismissal_byplatform <- edit_check_reject_data %>%
  filter(was_reference_check_shown == 1) %>% # limit to where shown
  group_by(platform) %>%
  summarise(
    n_edits = n_distinct(editing_session),
    n_rejects = n_distinct(editing_session[n_rejects > 0])
  ) %>%
  mutate(dismissal_rate = paste0(round(n_rejects / n_edits * 100, 1), "%")) %>%
  ungroup() %>%
  gt() %>%
  tab_header(title = "Reference Check dismissal rate by platform") %>%
  opt_stylize(5) %>%
  cols_label(
    platform = "Platform",
    n_edits = "Number of edits shown Reference Check",
    n_rejects = "Number of edits that dismissed Reference Check",
    dismissal_rate = "Proportion of edits where Reference Check was dismissed"
  ) %>%
  tab_source_note(
    gt::md('Limited to published edits where at least one Reference Check was shown')
  )
Users are slightly more likely to keep the text without adding a reference on mobile web: the check was dismissed in 62.8% of published mobile web edits where Reference Check was shown, compared to 61.9% of desktop edits, a 1.4% relative increase on mobile web. This is directionally similar to the 2024 Reference Check A/B test, where reference checks were declined at a higher rate on mobile (58%) than on desktop (44.3%).
The Paste Check Leading Indicators Analysis report showed the opposite: users elected to keep the pasted text in 48% of published mobile web edits where Paste Check was shown, compared to 56% of desktop edits, a 14% relative decrease on mobile compared to desktop.
Dismissal reason by platform
edit_check_dismissal_byreason_byplatform <- edit_check_reject_data %>%
  filter(was_reference_check_shown == 1 & n_rejects > 0) %>% # limit to where shown and user elected to keep text
  group_by(platform, reject_reason) %>%
  summarise(n_edits_rejected = n_distinct(editing_session)) %>%
  mutate(select_rate = round(n_edits_rejected / sum(n_edits_rejected), 2))
# plot bar chart of reason selection
dodge <- position_dodge(width = 0.9)

p <- edit_check_dismissal_byreason_byplatform %>%
  # Calculate total across platforms for sorting
  group_by(reject_reason) %>%
  mutate(total_edits = sum(n_edits_rejected)) %>%
  ungroup() %>%
  # Reorder by overall frequency (descending)
  arrange(desc(total_edits)) %>%
  mutate(reject_reason = factor(reject_reason, levels = unique(reject_reason))) %>%
  ggplot(aes(x = reject_reason, y = select_rate, fill = reject_reason)) +
  geom_col(position = 'dodge') +
  scale_y_continuous(labels = scales::percent) +
  geom_text(aes(label = paste0(select_rate * 100, "%"), fontface = 2),
            vjust = 1.2, size = 10, color = "white") +
  facet_grid(~ platform) +
  labs(
    y = "Percent of edits",
    x = "Selected reason",
    title = "Reasons users selected for keeping text without adding a reference - Sorted"
  ) +
  scale_fill_manual(values = cbPalette, name = "Reason") +
  theme(
    panel.grid.minor = element_blank(),
    panel.background = element_blank(),
    plot.title = element_text(hjust = 0.5),
    text = element_text(size = 24),
    legend.position = "bottom",
    axis.text.x = element_blank(),
    axis.ticks.x = element_blank(),
    axis.line = element_line(colour = "black")
  )

p
By user experience
edit_check_dismissal_byuserexp <- edit_check_reject_data %>%
  filter(was_reference_check_shown == 1) %>% # limit to where shown
  group_by(experience_level_group) %>%
  summarise(
    n_edits = n_distinct(editing_session),
    n_rejects = n_distinct(editing_session[n_rejects > 0])
  ) %>%
  mutate(dismissal_rate = paste0(round(n_rejects / n_edits * 100, 1), "%")) %>%
  ungroup() %>%
  # sanitize small counts per data publication guidelines
  mutate(
    n_edits = ifelse(n_edits < 50, "<50", n_edits),
    n_rejects = ifelse(n_rejects < 50, "<50", n_rejects)
  ) %>%
  gt() %>%
  tab_header(title = "Reference Check dismissal rate by user experience") %>%
  opt_stylize(5) %>%
  cols_label(
    experience_level_group = "User Experience",
    n_edits = "Number of edits shown Reference Check",
    n_rejects = "Number of edits that dismissed Reference Check",
    dismissal_rate = "Proportion of edits where Reference Check was dismissed"
  ) %>%
  tab_source_note(
    gt::md('Limited to published edits where at least one Reference Check was shown')
  )
Unlike in the Paste Check Leading Indicators Analysis report and the Tone Check analysis, unregistered users are dismissing Reference Check at higher rates than newcomers and junior contributors.
Dismissal reason by user experience
edit_check_dismissal_byreason_byuserexp <- edit_check_reject_data %>%
  filter(was_reference_check_shown == 1 & n_rejects > 0) %>% # limit to where shown and user elected to keep text
  group_by(experience_level_group, reject_reason) %>%
  summarise(n_edits_rejected = n_distinct(editing_session)) %>%
  mutate(select_rate = round(n_edits_rejected / sum(n_edits_rejected), 2))
Code
# plot bar chart of reason selection
dodge <- position_dodge(width = 0.9)

p <- edit_check_dismissal_byreason_byuserexp %>%
  # Calculate total across experience groups for sorting
  group_by(reject_reason) %>%
  mutate(total_edits = sum(n_edits_rejected)) %>%
  ungroup() %>%
  # Reorder by overall frequency (descending)
  arrange(desc(total_edits)) %>%
  mutate(reject_reason = factor(reject_reason, levels = unique(reject_reason))) %>%
  ggplot(aes(x = reject_reason, y = select_rate, fill = reject_reason)) +
  geom_col(position = 'dodge') +
  scale_y_continuous(labels = scales::percent) +
  geom_text(aes(label = paste0(select_rate * 100, "%"), fontface = 2),
            vjust = 1.2, size = 10, color = "white") +
  facet_grid(~ experience_level_group) +
  labs(
    y = "Percent of edits",
    x = "Selected reason",
    title = "Reasons users selected for keeping text without adding a reference - Sorted"
  ) +
  scale_fill_manual(values = cbPalette, name = "Reason") +
  theme(
    panel.grid.minor = element_blank(),
    panel.background = element_blank(),
    plot.title = element_text(hjust = 0.5),
    text = element_text(size = 24),
    legend.position = "bottom",
    axis.text.x = element_blank(),
    axis.ticks.x = element_blank(),
    axis.line = element_line(colour = "black")
  )

p