The Editing team is evaluating the impact of Tone Check through an A/B test.
Tone Check is an Edit check that uses a language model to prompt people adding promotional, derogatory, or otherwise subjective language to consider “neutralizing” the tone of what they are writing. Multiple tone checks can be shown within an editing session either during the edit or during the save process. You can find more details about this check on the Project Page.
The Tone Check A/B test was deployed on 3 September 2025 to French, Japanese, and Portuguese Wikipedias. Prior to completing the full analysis, we reviewed the following set of leading indicators 2 weeks after starting the Tone A/B Test:
Proportion of edits Tone Check is shown within
Proportion of contributors that are presented Tone Check and complete their edits
Proportion of edits wherein people elect to dismiss/not change the text they’ve added
Proportion of people blocked after publishing an edit where Tone Check was shown
Proportion of published edits that add new content and are reverted within 48 hours
Proportion of edits that are published before the model is able to return an evaluation.
Decision to be made: What – if any – adjustments/investigations will we prioritize for us to be confident moving forward with evaluating the Tone Check’s impact in T387918?
We collected two weeks of AB test events logged between 8 September 2025 and 22 September 2025 on French, Japanese, and Portuguese Wikipedia.
In this AB test, users in the test group will be shown tone check if attempting an edit that meets the requirements for the check to be shown in VisualEditor. The control group is provided the default editing experience where no tone check is shown.
For each leading indicator metric, we reviewed the following dimensions: by experiment group (test and control), by platform (mobile web or desktop), by user experience and status, and by partner Wikipedia. We also reviewed some indicators, such as completion rate, by the number of checks shown within a single editing session.
We relied on events logged in VisualEditorFeatureUse and change tags recorded in the revision tags table. See instrumentation spec.
Data was limited to mobile and desktop edits completed on a main page namespace using VisualEditor on one of the partner Wikipedias. We also limited to edits completed by unregistered users and users with 100 or fewer edits as those are the users that would be shown tone check under the default config settings.
Note
Results are based on initial AB test data to check if any adjustments to the feature need to be prioritized. More event data will be needed to confirm statistical significance for many of these findings, especially for any per user experience or per Wikipedia breakdowns. We will review the complete AB test data as part of the analysis in T387918.
Summary of results
Proportion of edits Tone Check is shown within
Tone Check has been shown within 421 editing sessions across all three partner Wikipedias over the reviewed two-week timeframe. This represents only about 0.1% of all edit attempts. When limited to saved edits, Tone Check has been shown in 9% of all published new content edits (125 of 1,377 published edits in the test group) since the AB test started.
It appears just slightly more frequently for desktop published edits compared to mobile. It has been shown at 9.5% of all published new content edits on desktop and 7.6% of all published new content edits on mobile.
Proportion of contributors that are presented Tone Check and complete their edits
We’ve observed a slight 2.3% relative decrease in the edit completion rate for edits shown tone check compared to eligible edits in the control group. In the test group, 66.7% of all edits shown tone check were successfully completed compared to 68.3% in the control group. So far, there have been no significant decreases in edit completion rate by experience level, Wikipedia, or for editing sessions where multiple tone checks were shown.
Proportion of edits wherein people elect to dismiss/not change the text they’ve added
A little over half of all published edits where tone check was shown (57%) included at least one tone check that the user dismissed. This is similar to the rates observed for Reference Check.
Tone checks are dismissed more frequently on desktop compared to mobile. 63.8% of all published desktop edits where tone check was shown include at least one check that was dismissed compared to 39% of all published mobile edits.
Proportion of published edits that add new content and are reverted within 48 hours
There have been no significant changes in the revert rate of all new content edits overall or by platform or Wikipedia. However, we’ve observed decreases in revert rate when limiting to edits where tone check was shown or eligible to be shown.
We’ve observed a 5.3% decrease in the revert rate of desktop edits and a 19% decrease in the revert rate of mobile edits for edits where tone check was shown at least once in an editing session compared to eligible edits in the control group.
For edits shown tone check and where text was revised to address the issue, we observed almost a 2x decrease in revert rate compared to eligible control edits. More published edit data is needed to confirm impacts on a per Wikipedia and per platform basis.
Proportion of people blocked after publishing an edit where Tone Check was shown
Less than 1% of users have been blocked after publishing an edit where at least one tone check was shown.
Proportion of edits that are published before the model is able to return an evaluation
Only about 0.6% of all published edits (264 edits) in the AB test were saved before the model returned an evaluation. The majority of these edits occurred in the control group and on desktop.
Proportion of published edits shown at least one Tone Check
Question: Are newcomers encountering Tone Check?
Methodology: Reviewed the number of published new content edits where at least one Tone Check was shown during the editing session (event.feature = 'editCheck-Tone' AND event.action IN ('check-shown-midedit', 'check-shown-presave')).
This analysis was specifically limited to edits that were successfully published and identified as new content edits with the tag editcheck-newcontent.
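The session-level flag described in the methodology can be sketched in base R as follows. This is a minimal illustration on a made-up event log, not the query used for the report: the column names `editing_session`, `feature`, and `action` are hypothetical stand-ins for the real event schema.

```r
# Toy event log; column names are hypothetical stand-ins for the real schema
events <- data.frame(
  editing_session = c("a", "a", "b", "c", "c"),
  feature = c("editCheck-Tone", "other", "other", "editCheck-Tone", "editCheck-Tone"),
  action = c("check-shown-midedit", "saveSuccess", "saveSuccess",
             "action-dismiss", "check-shown-presave"),
  stringsAsFactors = FALSE
)

# An event counts as "tone check shown" only when both conditions hold
shown_actions <- c("check-shown-midedit", "check-shown-presave")
events$is_shown <- events$feature == "editCheck-Tone" &
  events$action %in% shown_actions

# Collapse to one flag per session: TRUE if any event in the session qualifies
was_tone_check_shown <- tapply(events$is_shown, events$editing_session, any)
print(was_tone_check_shown)
```

Sessions "a" and "c" are flagged because each has at least one qualifying event; session "b" is not.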
Code
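# packages assumed by the code in this report
library(dplyr); library(gt); library(IRdisplay)

# load frequency data
tone_check_frequency_data <- read.csv(
  file = 'Queries/data/tone_check_frequency_data.tsv',
  header = TRUE,
  sep = "\t",
  stringsAsFactors = FALSE
)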
Code
# Set experience level group and factor levels
tone_check_frequency_data <- tone_check_frequency_data %>%
  mutate(
    experience_level_group = case_when(
      user_edit_count == 0 & user_status == 'registered' ~ 'Newcomer',
      user_edit_count == 0 & user_status == 'unregistered' ~ 'Unregistered',
      user_edit_count > 0 & user_edit_count <= 100 ~ "Junior Contributor",
      user_edit_count > 100 ~ "Non-Junior Contributor"
    ),
    experience_level_group = factor(
      experience_level_group,
      levels = c("Unregistered", "Newcomer", "Non-Junior Contributor", "Junior Contributor")
    )
  )

# rename experiment field to clarify
tone_check_frequency_data <- tone_check_frequency_data %>%
  mutate(test_group = factor(
    test_group,
    levels = c('2025-09-editcheck-tone-control', '2025-09-editcheck-tone-test'),
    labels = c("control (no tone check)", "test (tone check available)")
  ))
Code
# Set fields and factor levels to assess number of checks shown
tone_check_frequency_data <- tone_check_frequency_data %>%
  mutate(
    multiple_checks_shown = ifelse(n_checks_shown > 1, 1, 0),
    multiple_checks_shown = factor(multiple_checks_shown, levels = c(0, 1))
  )

# note these buckets can be adjusted as needed based on distribution of data
tone_check_frequency_data <- tone_check_frequency_data %>%
  mutate(
    checks_shown_bucket = case_when(
      is.na(n_checks_shown) ~ '0',
      n_checks_shown == 1 ~ '1',
      n_checks_shown == 2 ~ '2',
      n_checks_shown > 2 & n_checks_shown <= 5 ~ "3-5",
      n_checks_shown > 5 & n_checks_shown <= 10 ~ "6-10",
      n_checks_shown > 10 ~ "over 10"
    ),
    checks_shown_bucket = factor(
      checks_shown_bucket,
      levels = c("0", "1", "2", "3-5", "6-10", "over 10")
    )
  )
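As a cross-check of the bucket boundaries, the same grouping can be sketched with base R's `cut()`. This is an alternative formulation on made-up counts, not the code used in the analysis. One assumed difference: a literal count of 0 lands in the "0" bucket here, whereas the `case_when()` version would leave it as `NA`.

```r
# Made-up check counts, including a session with no recorded checks (NA)
n_checks_shown <- c(NA, 1, 2, 3, 5, 6, 10, 11)

# Right-closed intervals: (0,1], (1,2], (2,5], (5,10], (10,Inf]
bucket <- cut(
  n_checks_shown,
  breaks = c(0, 1, 2, 5, 10, Inf),
  labels = c("1", "2", "3-5", "6-10", "over 10")
)

# Values outside the intervals (NA here) go in the "0" bucket
bucket <- factor(
  ifelse(is.na(bucket), "0", as.character(bucket)),
  levels = c("0", "1", "2", "3-5", "6-10", "over 10")
)
print(table(bucket))
```

The boundary values 5 and 10 fall in "3-5" and "6-10" respectively, matching the `<=` comparisons in the report's code.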
Overall proportion of all edit attempts
We first looked at the proportion of all VE edit attempts to determine how frequently across all editing sessions this check was appearing. Note: This includes a large number of edits that were abandoned prior to reaching a point where tone check would be shown.
Code
tone_checks_shown_overall <- tone_check_frequency_data %>%
  filter(test_group == "test (tone check available)") %>% # limit to test group edits
  group_by(test_group) %>%
  summarise(
    n_editing_session = n_distinct(editing_session),
    n_editing_session_refcheck = n_distinct(editing_session[was_tone_check_shown == 1])
  ) %>%
  mutate(prop_check_shown = paste0(round(n_editing_session_refcheck / n_editing_session * 100, 1), "%")) %>%
  gt() %>%
  tab_header(title = "Edit attempts shown at least one tone check by experiment group") %>%
  opt_stylize(5) %>%
  cols_label(
    test_group = "Experiment Group",
    n_editing_session = "Number of edit attempts",
    n_editing_session_refcheck = "Number of edit attempts shown tone check",
    prop_check_shown = "Proportion of edit attempts shown tone check"
  ) %>%
  tab_source_note(
    gt::md('Limited to edits by unregistered users and users with 100 or fewer edits')
  )

display_html(as_raw_html(tone_checks_shown_overall))
Edit attempts shown at least one tone check by experiment group
Experiment Group | Number of edit attempts | Number of edit attempts shown tone check | Proportion of edit attempts shown tone check
test (tone check available) | 701484 | 421 | 0.1%
Limited to edits by unregistered users and users with 100 or fewer edits
Overall proportion of saved new content edits
We then limited to saved new content edits to check the prevalence of tone checks in published edits.
Code
tone_checks_shown_saved_overall <- tone_check_frequency_data %>%
  filter(test_group == "test (tone check available)" & # limit to test group edits
           was_saved == 1 & is_new_content == 1) %>% # limit to published new content edits
  group_by(test_group) %>%
  summarise(
    n_editing_session = n_distinct(editing_session),
    n_editing_session_refcheck = n_distinct(editing_session[was_tone_check_shown == 1])
  ) %>%
  mutate(prop_check_shown = paste0(round(n_editing_session_refcheck / n_editing_session * 100, 1), "%")) %>%
  gt() %>%
  tab_header(title = "Published new content edits shown at least one tone check by experiment group") %>%
  opt_stylize(5) %>%
  cols_label(
    test_group = "Experiment Group",
    n_editing_session = "Number of edits",
    n_editing_session_refcheck = "Number of edits shown tone check",
    prop_check_shown = "Proportion of edits shown tone check"
  ) %>%
  tab_source_note(
    gt::md('Limited to published new content edits by unregistered users and users with 100 or fewer edits')
  )

display_html(as_raw_html(tone_checks_shown_saved_overall))
Published new content edits shown at least one tone check by experiment group
Experiment Group | Number of edits | Number of edits shown tone check | Proportion of edits shown tone check
test (tone check available) | 1377 | 125 | 9.1%
Limited to published new content edits by unregistered users and users with 100 or fewer edits
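Because these are early results, it can help to put an interval around the 9.1% figure. The sketch below uses the counts from the table above with base R's `binom.test()`. The exact binomial interval and the treatment of edits as independent trials are assumptions made for illustration, not part of the original analysis.

```r
# Counts taken from the table above
shown <- 125    # published new content edits shown at least one tone check
total <- 1377   # all published new content edits in the test group

# Exact (Clopper-Pearson) 95% confidence interval for the proportion
ci <- binom.test(shown, total)$conf.int

cat(sprintf(
  "%.1f%% shown tone check (95%% CI %.1f%% to %.1f%%)\n",
  100 * shown / total, 100 * ci[1], 100 * ci[2]
))
```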
By whether multiple checks were shown
Code
tone_checks_shown_saved_bymultiple <- tone_check_frequency_data %>%
  filter(test_group == "test (tone check available)" & # limit to test group edits
           was_saved == 1 & is_new_content == 1) %>% # limit to published new content edits
  group_by(test_group) %>%
  summarise(
    n_editing_session = n_distinct(editing_session),
    n_editing_session_multicheck = n_distinct(editing_session[was_tone_check_shown == 1 & multiple_checks_shown == 1])
  ) %>%
  mutate(prop_check_shown = paste0(round(n_editing_session_multicheck / n_editing_session * 100, 1), "%")) %>%
  gt() %>%
  tab_header(title = "Published new content edits shown multiple tone checks in the test group") %>%
  opt_stylize(5) %>%
  cols_label(
    test_group = "Experiment Group",
    n_editing_session = "Number of edits",
    n_editing_session_multicheck = "Number of edits shown multiple tone checks",
    prop_check_shown = "Proportion of edits shown multiple tone checks"
  ) %>%
  tab_source_note(
    gt::md('Limited to published new content edits by unregistered users and users with 100 or fewer edits')
  )

display_html(as_raw_html(tone_checks_shown_saved_bymultiple))
Published new content edits shown multiple tone checks in the test group
Experiment Group | Number of edits | Number of edits shown multiple tone checks | Proportion of edits shown multiple tone checks
test (tone check available) | 1377 | 93 | 6.8%
Limited to published new content edits by unregistered users and users with 100 or fewer edits
By number of checks shown
Code
tone_checks_shown_saved_bynchecks <- tone_check_frequency_data %>%
  filter(test_group == "test (tone check available)" & # limit to test group edits
           was_saved == 1 & is_new_content == 1) %>% # limit to published new content edits
  # filter(was_tone_check_shown == 1) %>% # if you want to limit to just edits shown tone check
  mutate(total_sessions = n_distinct(editing_session)) %>%
  group_by(total_sessions, checks_shown_bucket) %>%
  summarise(n_editing_session_tonecheck = n_distinct(editing_session)) %>%
  mutate(prop_check_shown = paste0(round(n_editing_session_tonecheck / total_sessions * 100, 2), "%")) %>%
  ungroup() %>%
  select(-c(1, 3)) %>%
  # mutate(n_editing_session_refcheck = ifelse(n_editing_session_refcheck < 50, "<50", n_editing_session_refcheck)) %>% # sanitizing per data publication guidelines
  gt() %>%
  tab_header(title = "Published new content edits by total number of tone checks shown") %>%
  opt_stylize(5) %>%
  cols_label(
    checks_shown_bucket = "Number of tone checks shown",
    # n_editing_session_tonecheck = "Number of edits",
    prop_check_shown = "Proportion of edits"
  ) %>%
  tab_source_note(
    gt::md('Limited to published new content edits by unregistered users and users with 100 or fewer edits')
  )

display_html(as_raw_html(tone_checks_shown_saved_bynchecks))
Published new content edits by total number of tone checks shown
Number of tone checks shown | Proportion of edits
0 | 90.92%
1 | 2.32%
2 | 2.69%
3-5 | 2.69%
6-10 | 0.73%
over 10 | 0.65%
Limited to published new content edits by unregistered users and users with 100 or fewer edits
By platform
Code
tone_checks_shown_byplatform <- tone_check_frequency_data %>%
  filter(test_group == "test (tone check available)" & # limit to test group edits
           was_saved == 1 & is_new_content == 1) %>% # limit to published new content edits
  group_by(platform) %>%
  summarise(
    n_editing_session = n_distinct(editing_session),
    n_editing_session_tonecheck = n_distinct(editing_session[was_tone_check_shown == 1])
  ) %>%
  mutate(prop_check_shown = paste0(round(n_editing_session_tonecheck / n_editing_session * 100, 1), "%")) %>%
  mutate(n_editing_session_tonecheck = ifelse(n_editing_session_tonecheck < 50, "<50", n_editing_session_tonecheck)) %>% # sanitizing per data publication guidelines
  select(-2) %>% # removing total number of edits column to sanitize data for publication
  gt() %>%
  tab_header(title = "Published new content edits shown at least one tone check by platform") %>%
  opt_stylize(5) %>%
  cols_label(
    platform = "Platform",
    # n_editing_session = "Number of edits",
    n_editing_session_tonecheck = "Number of edits shown tone check",
    prop_check_shown = "Proportion of edits shown tone check"
  ) %>%
  tab_source_note(
    gt::md('Limited to published new content edits by unregistered users and users with 100 or fewer edits')
  )

display_html(as_raw_html(tone_checks_shown_byplatform))
Published new content edits shown at least one tone check by platform
Platform | Number of edits shown tone check | Proportion of edits shown tone check
desktop | 99 | 9.5%
phone | <50 | 7.6%
Limited to published new content edits by unregistered users and users with 100 or fewer edits
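The "<50" entries come from masking small counts before publication. A minimal sketch of that step as a reusable helper is shown below. The function name `mask_small_counts` and the input values are hypothetical; only the threshold of 50 comes from the code in this report.

```r
# Hypothetical helper: replace counts below a threshold with a "<threshold" label.
# The result is always character, which is also what ifelse() produces in the
# report's code once a string and a number are mixed.
mask_small_counts <- function(n, threshold = 50) {
  ifelse(n < threshold, paste0("<", threshold), as.character(n))
}

# Illustrative values only, not counts from the experiment
print(mask_small_counts(c(120, 7)))
```

Proportions computed from masked counts should be calculated before masking, as the report's code does, since the masked column is no longer numeric.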
By user experience
Code
tone_checks_shown_byuser_status <- tone_check_frequency_data %>%
  filter(test_group == "test (tone check available)" & # limit to test group edits
           was_saved == 1 & is_new_content == 1) %>% # limit to published new content edits
  group_by(experience_level_group) %>%
  summarise(
    n_editing_session = n_distinct(editing_session),
    n_editing_session_tonecheck = n_distinct(editing_session[was_tone_check_shown == 1])
  ) %>%
  mutate(prop_check_shown = paste0(round(n_editing_session_tonecheck / n_editing_session * 100, 1), "%")) %>%
  mutate(n_editing_session_tonecheck = ifelse(n_editing_session_tonecheck < 50, "<50", n_editing_session_tonecheck)) %>% # sanitizing per data publication guidelines
  select(-2) %>% # removing total number of edits column to sanitize data for publication
  gt() %>%
  tab_header(title = "Published new content edits shown at least one tone check by user experience") %>%
  opt_stylize(5) %>%
  cols_label(
    experience_level_group = "User Experience",
    # n_editing_session = "Number of edits",
    n_editing_session_tonecheck = "Number of edits shown tone check",
    prop_check_shown = "Proportion of edits shown tone check"
  ) %>%
  tab_source_note(
    gt::md('Limited to published new content edits by unregistered users and users with 100 or fewer edits')
  )

display_html(as_raw_html(tone_checks_shown_byuser_status))
Published new content edits shown at least one tone check by user experience
User Experience | Number of edits shown tone check | Proportion of edits shown tone check
Unregistered | <50 | 14.2%
Newcomer | <50 | 13.8%
Junior Contributor | 51 | 6%
Limited to published new content edits by unregistered users and users with 100 or fewer edits
By Partner Wikipedia
Code
tone_checks_shown_bywiki <- tone_check_frequency_data %>%
  filter(test_group == "test (tone check available)" & # limit to test group edits
           was_saved == 1 & is_new_content == 1) %>% # saved test group edits
  group_by(wiki) %>%
  summarise(
    n_editing_session = n_distinct(editing_session),
    n_editing_session_tonecheck = n_distinct(editing_session[was_tone_check_shown == 1])
  ) %>%
  mutate(prop_check_shown = paste0(round(n_editing_session_tonecheck / n_editing_session * 100, 1), "%")) %>%
  mutate(n_editing_session_tonecheck = ifelse(n_editing_session_tonecheck < 50, "<50", n_editing_session_tonecheck)) %>% # sanitizing per data publication guidelines
  select(-2) %>% # removing total number of edits column to sanitize data for publication
  gt() %>%
  tab_header(title = "Published new content edits shown at least one tone check by Wikipedia") %>%
  opt_stylize(5) %>%
  cols_label(
    wiki = "Wikipedia",
    # n_editing_session = "Number of edits",
    n_editing_session_tonecheck = "Number of edits shown tone check",
    prop_check_shown = "Proportion of edits shown tone check"
  ) %>%
  tab_source_note(
    gt::md('Limited to published new content edits by unregistered users and users with 100 or fewer edits')
  )

display_html(as_raw_html(tone_checks_shown_bywiki))
Published new content edits shown at least one tone check by Wikipedia
Wikipedia | Number of edits shown tone check | Proportion of edits shown tone check
frwiki | 93 | 11.4%
jawiki | <50 | 6.3%
ptwiki | <50 | 5.1%
Limited to published new content edits by unregistered users and users with 100 or fewer edits
Key Insights
Tone Check has been shown within 9% of all published new content edits (125 of 1,377 published edits in the test group) since the AB test started.
The majority of new content edits shown tone check (about 74%, 93 of 125) were shown more than one tone check in an editing session (6.8% of all published new content edits).
Only 1.4% of all published new content edits were shown 6 or more tone checks within a session.
It has been shown at 9.5% of all published new content edits on desktop and 7.6% of all published new content edits on mobile.
Tone Check appears more frequently in published new content edits by newcomers and unregistered users compared to Junior Contributors.
Frequency appears to slightly vary by partner Wikipedia. 11% of all published new content edits at French Wikipedia have been shown Tone Check compared to 6% at Japanese and 5% at Portuguese Wikipedia.
Overall, Tone Check appears much less frequently than Reference Check, which was shown in close to 80% of all editing sessions.
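The shares quoted in these insights can be recomputed from the counts and bucket percentages in the tables above. The short sketch below does that arithmetic in base R. The variable names are illustrative, and the bucket percentages are the rounded values as published.

```r
# Counts from the tables above (test group, published new content edits)
published      <- 1377
shown_any      <- 125
shown_multiple <- 93

# Rounded bucket shares from the "number of tone checks shown" table
bucket_pct <- c("0" = 90.92, "1" = 2.32, "2" = 2.69,
                "3-5" = 2.69, "6-10" = 0.73, "over 10" = 0.65)

# Share of edits shown tone check that saw more than one check
share_multiple_of_shown <- round(100 * shown_multiple / shown_any, 1)

# Share of all published new content edits that saw more than one check
share_multiple_of_published <- round(100 * shown_multiple / published, 1)

# Share of all published new content edits in the two highest buckets
share_six_or_more <- round(sum(bucket_pct[c("6-10", "over 10")]), 2)

print(c(share_multiple_of_shown, share_multiple_of_published, share_six_or_more))
```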
Proportion of contributors that are presented tone check and complete their edits
Question: Do newcomers understand the feature?
Methodology: We reviewed the proportion of edits where tone check was shown at least once during the edit session and that were successfully published (event.action = saveSuccess). These edits were compared to the completion rate of edits in the control group that were eligible but not shown tone check, as implemented in T394952.
Note: This analysis excludes edits that were abandoned prior to reaching the point where tone check was or would have been shown.
Code
# load data for assessing edit completion rate
edit_completion_rates <- read.csv(
  file = 'Queries/data/edit_completion_rate.tsv',
  header = TRUE,
  sep = "\t",
  stringsAsFactors = FALSE
)
Code
# Set experience level group and factor levels
edit_completion_rates <- edit_completion_rates %>%
  mutate(
    experience_level_group = case_when(
      user_edit_count == 0 & user_status == 'registered' ~ 'Newcomer',
      user_edit_count == 0 & user_status == 'unregistered' ~ 'Unregistered',
      user_edit_count > 0 & user_edit_count <= 100 ~ "Junior Contributor",
      user_edit_count > 100 ~ "Non-Junior Contributor"
    ),
    experience_level_group = factor(
      experience_level_group,
      levels = c("Unregistered", "Newcomer", "Non-Junior Contributor", "Junior Contributor")
    )
  )

# rename experiment field to clarify
edit_completion_rates <- edit_completion_rates %>%
  mutate(test_group = factor(
    test_group,
    levels = c('2025-09-editcheck-tone-control', '2025-09-editcheck-tone-test'),
    labels = c("control (no tone check)", "test (tone check available)")
  ))
Code
# Set fields and factor levels to assess number of checks shown
edit_completion_rates <- edit_completion_rates %>%
  mutate(
    multiple_checks_shown = ifelse(n_checks_shown > 1, "multiple checks shown", "one check shown"),
    multiple_checks_shown = factor(
      multiple_checks_shown,
      levels = c("one check shown", "multiple checks shown")
    )
  )

# note these buckets can be adjusted as needed based on distribution of data
edit_completion_rates <- edit_completion_rates %>%
  mutate(
    checks_shown_bucket = case_when(
      is.na(n_checks_shown) ~ '0',
      n_checks_shown == 1 ~ '1',
      n_checks_shown == 2 ~ '2',
      n_checks_shown > 2 & n_checks_shown <= 5 ~ "3-5",
      n_checks_shown > 5 & n_checks_shown <= 10 ~ "6-10",
      n_checks_shown > 10 ~ "over 10"
    ),
    checks_shown_bucket = factor(
      checks_shown_bucket,
      levels = c("0", "1", "2", "3-5", "6-10", "over 10")
    )
  )
Overall by experiment group
Code
edit_completion_rate_overall <- edit_completion_rates %>%
  filter(tone_check_shown == 1) %>% # limit to sessions where tone check was shown
  group_by(test_group) %>%
  summarise(
    n_edits = n_distinct(editing_session),
    n_saves = n_distinct(editing_session[saved_edit > 0])
  ) %>%
  mutate(completion_rate = paste0(round(n_saves / n_edits * 100, 1), "%")) %>%
  gt() %>%
  tab_header(title = "Edit completion rate by experiment group") %>%
  opt_stylize(5) %>%
  cols_label(
    test_group = "Experiment Group",
    n_edits = "Number of edit attempts",
    n_saves = "Number of published edits",
    completion_rate = "Proportion of edits saved"
  ) %>%
  tab_source_note(
    gt::md('Limited to edit attempts shown or eligible to be shown at least one tone check')
  )

display_html(as_raw_html(edit_completion_rate_overall))
Edit completion rate by experiment group
Experiment Group | Number of edit attempts | Number of published edits | Proportion of edits saved
control (no tone check) | 284 | 194 | 68.3%
test (tone check available) | 430 | 287 | 66.7%
Limited to edit attempts shown or eligible to be shown at least one tone check
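To gauge whether a gap of this size is distinguishable from noise, the counts in the table above can be passed to a two-sample test of proportions. The sketch below uses base R's `prop.test()`. The choice of test is an assumption made for illustration and is not necessarily the method planned for the full analysis in T387918.

```r
# Counts from the table above
saved    <- c(control = 194, test = 287)   # published edits
attempts <- c(control = 284, test = 430)   # edit attempts shown or eligible

# Completion rate per group and the relative change of test vs control
rates <- saved / attempts
relative_change <- (rates[["test"]] - rates[["control"]]) / rates[["control"]]

# Two-sample test for equality of proportions (continuity-corrected by default)
result <- prop.test(saved, attempts)

cat(sprintf("control: %.1f%%, test: %.1f%%, relative change: %.1f%%\n",
            100 * rates[["control"]], 100 * rates[["test"]], 100 * relative_change))
cat(sprintf("p-value: %.3f\n", result$p.value))
```

With these sample sizes the p-value is well above 0.05, which is consistent with describing the decrease as slight and not yet statistically confirmed.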
By whether multiple checks were shown for test group
Code
edit_completion_rate_bymulti <- edit_completion_rates %>%
  filter(tone_check_shown == 1 & test_group == 'test (tone check available)') %>%
  group_by(test_group, multiple_checks_shown) %>%
  summarise(
    n_edits = n_distinct(editing_session),
    n_saves = n_distinct(editing_session[saved_edit > 0])
  ) %>%
  mutate(completion_rate = paste0(round(n_saves / n_edits * 100, 1), "%")) %>%
  gt() %>%
  tab_header(title = "Edit completion rate by if multiple checks were shown") %>%
  opt_stylize(5) %>%
  cols_label(
    test_group = "Experiment group",
    multiple_checks_shown = "Multiple tone checks shown",
    n_edits = "Number of edit attempts shown tone check",
    n_saves = "Number of published edits",
    completion_rate = "Proportion of edits saved"
  ) %>%
  tab_source_note(
    gt::md('Limited to edit attempts shown or eligible to be shown at least one tone check')
  )

display_html(as_raw_html(edit_completion_rate_bymulti))
Edit completion rate by if multiple checks were shown
Experiment group | Multiple tone checks shown | Number of edit attempts shown tone check | Number of published edits | Proportion of edits saved
test (tone check available) | one check shown | 158 | 105 | 66.5%
test (tone check available) | multiple checks shown | 272 | 182 | 66.9%
Limited to edit attempts shown or eligible to be shown at least one tone check
By number of checks shown for test group
Code
edit_completion_rate_bynchecks <- edit_completion_rates %>%
  filter(tone_check_shown == 1 & test_group == 'test (tone check available)') %>% # limit to tone checks shown and test group
  group_by(test_group, checks_shown_bucket) %>%
  summarise(
    n_edits = n_distinct(editing_session),
    n_saves = n_distinct(editing_session[saved_edit > 0])
  ) %>%
  mutate(completion_rate = paste0(round(n_saves / n_edits * 100, 1), "%")) %>%
  ungroup() %>%
  mutate(
    n_edits = ifelse(n_edits < 50, "<50", n_edits),
    n_saves = ifelse(n_saves < 50, "<50", n_saves)
  ) %>% # sanitizing per data publication guidelines
  group_by(test_group) %>%
  gt() %>%
  tab_header(title = "Edit completion rate by the number of tone checks shown") %>%
  opt_stylize(5) %>%
  cols_label(
    checks_shown_bucket = "Number of tone checks shown",
    n_edits = "Number of edit attempts shown tone check",
    n_saves = "Number of published edits",
    completion_rate = "Proportion of edits saved"
  ) %>%
  tab_source_note(
    gt::md('Limited to edit attempts shown or eligible to be shown at least one tone check')
  )

display_html(as_raw_html(edit_completion_rate_bynchecks))
Edit completion rate by the number of tone checks shown
Experiment Group | Number of tone checks shown | Number of edit attempts shown tone check | Number of published edits | Proportion of edits saved
test (tone check available) | 1 | 158 | 105 | 66.5%
test (tone check available) | 2 | 132 | 91 | 68.9%
test (tone check available) | 3-5 | 91 | 63 | 69.2%
test (tone check available) | 6-10 | <50 | <50 | 50%
test (tone check available) | over 10 | <50 | <50 | 66.7%
Limited to edit attempts shown or eligible to be shown at least one tone check
By Platform
Code
edit_completion_rate_byplatform <- edit_completion_rates %>%
  filter(tone_check_shown == 1) %>%
  group_by(platform, test_group) %>%
  summarise(
    n_edits = n_distinct(editing_session),
    n_saves = n_distinct(editing_session[saved_edit > 0])
  ) %>%
  mutate(completion_rate = paste0(round(n_saves / n_edits * 100, 1), "%")) %>%
  # mutate(n_saves = ifelse(n_saves < 50, "<50", n_saves)) %>% # sanitizing per data publication guidelines
  select(-c(3, 4)) %>%
  gt() %>%
  tab_header(title = "Edit completion rate by experiment group and platform") %>%
  opt_stylize(5) %>%
  cols_label(
    test_group = "Experiment Group",
    platform = "Platform",
    # n_edits = "Number of edit attempts shown tone check",
    # n_saves = "Number of published edits",
    completion_rate = "Proportion of edits saved"
  ) %>%
  tab_source_note(
    gt::md('Limited to edit attempts shown or eligible to be shown at least one tone check')
  )

display_html(as_raw_html(edit_completion_rate_byplatform))
Edit completion rate by experiment group and platform
Platform | Experiment Group | Proportion of edits saved
desktop | control (no tone check) | 69.4%
desktop | test (tone check available) | 65.8%
phone | control (no tone check) | 64.5%
phone | test (tone check available) | 69.4%
Limited to edit attempts shown or eligible to be shown at least one tone check
By user experience
Code
edit_completion_rate_byuserstatus <- edit_completion_rates %>%
  filter(tone_check_shown == 1) %>%
  group_by(experience_level_group, test_group) %>%
  summarise(
    n_edits = n_distinct(editing_session),
    n_saves = n_distinct(editing_session[saved_edit > 0])
  ) %>%
  mutate(completion_rate = paste0(round(n_saves / n_edits * 100, 1), "%")) %>%
  select(-c(3, 4)) %>% # data sanitizing for publication
  gt() %>%
  tab_header(title = "Edit completion rate by experiment group and editor experience") %>%
  opt_stylize(5) %>%
  cols_label(
    test_group = "Test Group",
    experience_level_group = "User Experience", # was mislabeled "Experiment Group"
    # n_edits = "Number of edit attempts shown tone check",
    # n_saves = "Number of published edits",
    completion_rate = "Proportion of edits saved"
  ) %>%
  tab_source_note(
    gt::md('Limited to edit attempts shown or eligible to be shown at least one tone check')
  )

display_html(as_raw_html(edit_completion_rate_byuserstatus))
Edit completion rate by experiment group and editor experience
User Experience | Test Group | Proportion of edits saved
Unregistered | control (no tone check) | 55.8%
Unregistered | test (tone check available) | 57.1%
Newcomer | control (no tone check) | 64.7%
Newcomer | test (tone check available) | 63.9%
Junior Contributor | control (no tone check) | 78.3%
Junior Contributor | test (tone check available) | 75.6%
Limited to edit attempts shown or eligible to be shown at least one tone check
By Partner Wikipedia
Code
edit_completion_rate_bywiki <- edit_completion_rates %>%
  filter(tone_check_shown == 1) %>%
  group_by(wiki, test_group) %>%
  summarise(
    n_edits = n_distinct(editing_session),
    n_saves = n_distinct(editing_session[saved_edit > 0])
  ) %>%
  mutate(completion_rate = paste0(round(n_saves / n_edits * 100, 1), "%")) %>%
  select(-c(3, 4)) %>% # data sanitizing for publication
  gt() %>%
  tab_header(title = "Edit completion rate by experiment group and Wikipedia") %>%
  opt_stylize(5) %>%
  cols_label(
    test_group = "Test Group",
    wiki = "Wikipedia",
    # n_edits = "Number of edit attempts shown tone check",
    # n_saves = "Number of published edits",
    completion_rate = "Proportion of edits saved"
  ) %>%
  tab_source_note(
    gt::md('Limited to edit attempts shown or eligible to be shown at least one tone check')
  )

display_html(as_raw_html(edit_completion_rate_bywiki))
Edit completion rate by experiment group and Wikipedia
Wikipedia | Test Group | Proportion of edits saved
frwiki | control (no tone check) | 71.9%
frwiki | test (tone check available) | 70.2%
jawiki | control (no tone check) | 65%
jawiki | test (tone check available) | 63.5%
ptwiki | control (no tone check) | 56.4%
ptwiki | test (tone check available) | 51%
Limited to edit attempts shown or eligible to be shown at least one tone check
Key Insights
We’ve only observed a slight decrease in the edit completion rate for edits shown tone check compared to eligible edits in the control group. In the test group, 66.7% of all edits shown tone check were successfully completed compared to 68.3% in the control group (a 2.3% relative decrease).
We have not seen any significant decreases if multiple checks were shown. Edit completion rate was about 66% even if over 6 tone checks were shown in an editing session (Note: There have been very few editing sessions where more than 6 checks have been shown, so more data will be needed to verify impacts to completion rates at this level).
We have not observed any significant decreases in edit completion rate by platform or user experience type.
Currently, the decrease in edit completion has primarily been observed on desktop edits. On mobile, the edit completion rate for edits shown tone check has increased compared to the control (64.5% (control) → 69.4% (test); +7.6% increase)
We did not observe any significant changes in completion rate by experience group. The highest relative decrease was observed for Junior Contributors (-3.4%).
Results vary by wiki. We observed a 2.3% relative decrease in edit completion rate at Japanese Wikipedia (65% → 63.5%) and close to a 10% decrease at Portuguese Wikipedia (56.4% → 51%). Each of these wikis currently has a small sample of tone check edits to review (<100 per test group), so more data is needed to confirm trends.
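The relative changes quoted in these insights can be recomputed from the published percentages in the platform and Wikipedia tables above. The sketch below is one way to do that in base R. The helper name `relative_change_pct` is illustrative, and because the inputs are already rounded, results may differ slightly from values computed on raw counts.

```r
# Relative change of test vs control, in percent, rounded to one decimal
relative_change_pct <- function(control, test) {
  round(100 * (test - control) / control, 1)
}

# Completion rates (percent) copied from the tables above
completion <- data.frame(
  breakdown = c("desktop", "phone", "frwiki", "jawiki", "ptwiki"),
  control   = c(69.4, 64.5, 71.9, 65.0, 56.4),
  test      = c(65.8, 69.4, 70.2, 63.5, 51.0)
)

completion$relative_change <- relative_change_pct(completion$control, completion$test)
print(completion)
```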
Proportion of edits wherein people elect to dismiss/not change the text they’ve added
Question: Do people find Tone Check relevant?
Methodology: We reviewed the proportion of published edits shown tone check wherein people elected to dismiss changing the text they added. This was determined by edits where the user dismissed a tone check at least once in a session (event.feature = 'editCheck-tone' AND event.action = 'action-dismiss').
We also reviewed the proportion of all saved edits that the model still identified as having non-neutral language even after being shown tone check at least once. These edits are tagged with both editcheck-tone and editcheck-tone-shown.
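The two-tag condition described above can be sketched on a made-up revision tag table as follows. The data frame layout (`rev_id`, `tag`) is a hypothetical simplification of the revision tags table; only the tag names come from this report.

```r
# Toy revision-tag pairs; layout is a hypothetical simplification
rev_tags <- data.frame(
  rev_id = c(1, 1, 2, 3, 3, 4),
  tag = c("editcheck-tone", "editcheck-tone-shown",
          "editcheck-tone-shown",
          "editcheck-tone", "editcheck-newcontent",
          "editcheck-tone-shown"),
  stringsAsFactors = FALSE
)

# Collect the set of tags attached to each revision
tags_by_rev <- split(rev_tags$tag, rev_tags$rev_id)

# Revisions where tone check was shown at least once
shown <- vapply(tags_by_rev,
                function(tags) "editcheck-tone-shown" %in% tags,
                logical(1))

# Revisions shown tone check that still carry the tone-issue tag at save
still_flagged <- vapply(tags_by_rev,
                        function(tags) all(c("editcheck-tone", "editcheck-tone-shown") %in% tags),
                        logical(1))

cat(sprintf("%d of %d shown edits still flagged\n", sum(still_flagged), sum(shown)))
```

In this toy data, revision 3 carries the tone-issue tag but was never shown the check, so it is excluded from both counts.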
Code
# load data for assessing edit reject frequency
edit_check_reject_data <- read.csv(
  file = 'Queries/data/edit_check_rejects_data.tsv',
  header = TRUE,
  sep = "\t",
  stringsAsFactors = FALSE
)
Code
# Set experience level group and factor levels
edit_check_reject_data <- edit_check_reject_data %>%
  mutate(
    experience_level_group = case_when(
      user_edit_count == 0 & user_status == 'registered' ~ 'Newcomer',
      user_edit_count == 0 & user_status == 'unregistered' ~ 'Unregistered',
      user_edit_count > 0 & user_edit_count <= 100 ~ "Junior Contributor",
      user_edit_count > 100 ~ "Non-Junior Contributor"
    ),
    experience_level_group = factor(
      experience_level_group,
      levels = c("Unregistered", "Newcomer", "Non-Junior Contributor", "Junior Contributor")
    )
  )

# rename experiment field to clarify
edit_check_reject_data <- edit_check_reject_data %>%
  mutate(test_group = factor(
    test_group,
    levels = c('2025-09-editcheck-tone-control', '2025-09-editcheck-tone-test'),
    labels = c("control (no tone check)", "test (tone check available)")
  ))
Code
# Set fields and factor levels to assess number of checks shown
# Note limited to 1 sidebar open as we're looking for cases where multiple checks presented in a single sidebar (vs user going back and forth)
edit_check_reject_data <- edit_check_reject_data %>%
  mutate(
    multiple_checks_shown = ifelse(n_checks_shown > 1, "multiple checks shown", "single check shown"),
    multiple_checks_shown = factor(
      multiple_checks_shown,
      levels = c("single check shown", "multiple checks shown")
    )
  )

# note these buckets can be adjusted as needed based on distribution of data
edit_check_reject_data <- edit_check_reject_data %>%
  mutate(
    checks_shown_bucket = case_when(
      is.na(n_checks_shown) ~ '0',
      n_checks_shown == 1 ~ '1',
      n_checks_shown == 2 ~ '2',
      n_checks_shown > 2 & n_checks_shown <= 5 ~ "3-5",
      n_checks_shown > 5 & n_checks_shown <= 10 ~ "6-10",
      n_checks_shown > 10 ~ "over 10"
    ),
    checks_shown_bucket = factor(
      checks_shown_bucket,
      levels = c("0", "1", "2", "3-5", "6-10", "over 10")
    )
  )
Overall published edits where tone check was dismissed
Code
edit_check_dismissal_overall <- edit_check_reject_data %>%
  filter(was_edit_check_shown == 1) %>% # limit to where shown
  summarise(
    n_edits = n_distinct(editing_session),
    n_rejects = n_distinct(editing_session[n_rejects > 0]) # sessions with at least one dismissal
  ) %>%
  mutate(dismissal_rate = paste0(round(n_rejects / n_edits * 100, 1), "%")) %>%
  gt() %>%
  tab_header(title = "Proportion of edits where at least one tone check was dismissed") %>%
  opt_stylize(5) %>%
  cols_label(
    n_edits = "Number of edits shown tone check",
    n_rejects = "Number of edits that dismissed tone check",
    dismissal_rate = "Proportion of edits where tone check was dismissed"
  )

display_html(as_raw_html(edit_check_dismissal_overall))
Proportion of edits where at least one tone check was dismissed
| Number of edits shown tone check | Number of edits that dismissed tone check | Proportion of edits where tone check was dismissed |
| --- | --- | --- |
| 287 | 164 | 57.1% |
Overall published edits that still have tone issue at save after being shown tone check
Code
tone_check_issues_remain_overall <- edit_check_reject_data %>%
  filter(was_edit_check_shown == 1) %>% # limit to where shown
  summarise(
    n_edits = n_distinct(editing_session),
    n_rejects = n_distinct(editing_session[is_tone_check_eligible == 1]) # edits identified as having tone issue at time of save
  ) %>%
  mutate(dismissal_rate = paste0(round(n_rejects / n_edits * 100, 1), "%")) %>%
  gt() %>%
  tab_header(title = "Proportion of edits still identified as having non-neutral language after being shown tone check") %>%
  opt_stylize(5) %>%
  cols_label(
    n_edits = "Number of edits shown tone check",
    n_rejects = "Number of edits still identified as eligible",
    dismissal_rate = "Proportion of edits with non-neutral language"
  )

display_html(as_raw_html(tone_check_issues_remain_overall))
Proportion of edits still identified as having non-neutral language after being shown tone check
| Number of edits shown tone check | Number of edits still identified as eligible | Proportion of edits with non-neutral language |
| --- | --- | --- |
| 287 | 184 | 64.1% |
By whether multiple checks were shown
Code
edit_check_dismissal_bymultiple <- edit_check_reject_data %>%
  filter(was_edit_check_shown == 1) %>% # limit to where shown
  group_by(multiple_checks_shown) %>%
  summarise(
    n_edits = n_distinct(editing_session),
    n_rejects = n_distinct(editing_session[n_rejects > 0]) # sessions with at least one dismissal
  ) %>%
  mutate(dismissal_rate = paste0(round(n_rejects / n_edits * 100, 1), "%")) %>%
  gt() %>%
  tab_header(title = "Proportion of edits where at least one tone check was dismissed by whether multiple checks were shown") %>%
  opt_stylize(5) %>%
  cols_label(
    multiple_checks_shown = "Multiple Checks",
    n_edits = "Number of edits shown tone check",
    n_rejects = "Number of edits that dismissed tone check",
    dismissal_rate = "Proportion of edits where tone check was dismissed"
  ) %>%
  tab_source_note(gt::md('Limited to published edits where at least one tone check was shown'))

display_html(as_raw_html(edit_check_dismissal_bymultiple))
Proportion of edits where at least one tone check was dismissed by whether multiple checks were shown
| Multiple Checks | Number of edits shown tone check | Number of edits that dismissed tone check | Proportion of edits where tone check was dismissed |
| --- | --- | --- | --- |
| single check shown | 104 | 71 | 68.3% |
| multiple checks shown | 183 | 94 | 51.4% |
Limited to published edits where at least one tone check was shown
By platform
Code
edit_check_dismissal_byplatform <- edit_check_reject_data %>%
  filter(was_edit_check_shown == 1) %>% # limit to where shown
  group_by(platform) %>%
  summarise(
    n_edits = n_distinct(editing_session),
    n_rejects = n_distinct(editing_session[n_rejects > 0]) # sessions with at least one dismissal
  ) %>%
  mutate(dismissal_rate = paste0(round(n_rejects / n_edits * 100, 1), "%")) %>%
  ungroup() %>%
  mutate(
    n_edits = ifelse(n_edits < 50, "<50", n_edits),
    n_rejects = ifelse(n_rejects < 50, "<50", n_rejects)
  ) %>% # sanitizing per data publication guidelines
  select(-2) %>%
  gt() %>%
  tab_header(title = "Proportion of edits where at least one tone check was dismissed by platform") %>%
  opt_stylize(5) %>%
  cols_label(
    platform = "Platform",
    # n_edits = "Number of edits shown tone check",
    n_rejects = "Number of edits that dismissed tone check",
    dismissal_rate = "Proportion of edits where tone check was dismissed"
  )

display_html(as_raw_html(edit_check_dismissal_byplatform))
Proportion of edits where at least one tone check was dismissed by platform
| Platform | Number of edits that dismissed tone check | Proportion of edits where tone check was dismissed |
| --- | --- | --- |
| desktop | 136 | 63.8% |
| phone | <50 | 39.2% |
By user experience
Code
edit_check_dismissal_byuserexp <- edit_check_reject_data %>%
  filter(was_edit_check_shown == 1) %>% # limit to where shown
  group_by(experience_level_group) %>%
  summarise(
    n_edits = n_distinct(editing_session),
    n_rejects = n_distinct(editing_session[n_rejects > 0]) # sessions with at least one dismissal
  ) %>%
  mutate(dismissal_rate = paste0(round(n_rejects / n_edits * 100, 1), "%")) %>%
  ungroup() %>%
  mutate(
    n_edits = ifelse(n_edits < 50, "<50", n_edits),
    n_rejects = ifelse(n_rejects < 50, "<50", n_rejects)
  ) %>% # sanitizing per data publication guidelines
  select(-2) %>%
  gt() %>%
  tab_header(title = "Proportion of edits where at least one tone check was dismissed by user experience") %>%
  opt_stylize(5) %>%
  cols_label(
    experience_level_group = "User Experience",
    # n_edits = "Number of edits shown tone check",
    n_rejects = "Number of edits that dismissed tone check",
    dismissal_rate = "Proportion of edits where tone check was dismissed"
  )

display_html(as_raw_html(edit_check_dismissal_byuserexp))
Proportion of edits where at least one tone check was dismissed by user experience
| User Experience | Number of edits that dismissed tone check | Proportion of edits where tone check was dismissed |
| --- | --- | --- |
| Unregistered | <50 | 55.7% |
| Newcomer | <50 | 62.3% |
| Junior Contributor | 84 | 57.5% |
By partner Wikipedia
Code
edit_check_dismissal_bywiki <- edit_check_reject_data %>%
  filter(was_edit_check_shown == 1) %>% # limit to where shown
  group_by(wiki) %>%
  summarise(
    n_edits = n_distinct(editing_session),
    n_rejects = n_distinct(editing_session[n_rejects > 0]) # sessions with at least one dismissal
  ) %>%
  mutate(dismissal_rate = paste0(round(n_rejects / n_edits * 100, 1), "%")) %>%
  ungroup() %>%
  mutate(
    n_edits = ifelse(n_edits < 50, "<50", n_edits),
    n_rejects = ifelse(n_rejects < 50, "<50", n_rejects)
  ) %>% # sanitizing per data publication guidelines
  select(-2) %>%
  gt() %>%
  tab_header(title = "Proportion of edits where at least one tone check was dismissed by partner Wikipedia") %>%
  opt_stylize(5) %>%
  cols_label(
    wiki = "Wikipedia",
    # n_edits = "Number of edits shown tone check",
    n_rejects = "Number of edits that dismissed tone check",
    dismissal_rate = "Proportion of edits where tone check was dismissed"
  )

display_html(as_raw_html(edit_check_dismissal_bywiki))
Proportion of edits where at least one tone check was dismissed by partner Wikipedia
| Wikipedia | Number of edits that dismissed tone check | Proportion of edits where tone check was dismissed |
| --- | --- | --- |
| frwiki | 135 | 63.1% |
| jawiki | <50 | 40.4% |
| ptwiki | <50 | 46.2% |
Key Insights
A little over half of all published edits where tone check was shown (57%) included at least one check that the user dismissed. This is similar to the rates observed for Reference Check.
64% of published edits shown tone check were still identified as having non-neutral language at the time they were published.
Tone checks are dismissed more frequently on desktop than on mobile. 63.8% of all published desktop edits where tone check was shown include at least one check that was dismissed, compared to 39.2% of all published mobile edits.
Newcomers are slightly more likely to dismiss a tone check than unregistered users or Junior Contributors. 62.3% of all published edits by newcomers included at least one dismissal of a tone check, compared to 57.5% of edits by Junior Contributors and 55.7% of edits by unregistered users.
Dismissal rates are also currently higher at French Wikipedia (63.1%) than at Japanese (40.4%) and Portuguese (46.2%) Wikipedia.
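Because several of these breakdowns rest on fewer than 50 edits, it can help to look at an interval around each rate rather than the point estimate alone. The sketch below is not part of the original analysis: it uses base R's `prop.test()` without continuity correction, which returns a Wilson score interval. The overall counts (164 of 287) come from the table earlier in this section; the small subgroup counts are made up for illustration, and `wilson_ci` is a hypothetical helper name.

```r
# Wilson score interval for a proportion, returned in percent.
# prop.test() with correct = FALSE yields the Wilson interval.
wilson_ci <- function(successes, n, conf_level = 0.95) {
  ci <- prop.test(successes, n, conf.level = conf_level, correct = FALSE)$conf.int
  round(as.numeric(ci) * 100, 1)
}

# Overall counts reported above: 164 dismissals out of 287 edits shown tone check.
overall_ci <- wilson_ci(164, 287)

# Hypothetical small subgroup with a similar rate (counts invented for illustration).
small_ci <- wilson_ci(23, 40)

# The smaller sample gives a wider interval, so its rate is less certain.
overall_width <- diff(overall_ci)
small_width <- diff(small_ci)
print(list(overall = overall_ci, small_subgroup = small_ci))
```

Comparing interval widths is one way to judge whether a per-wiki or per-platform difference is large enough to act on before more data arrives.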
Proportion of published new content edits that are reverted within 48 hours
Question: Is tone check causing any disruption?
Methodology: We reviewed the proportion of all published new content edits where tone check was shown at least once in an editing session (identified by the editCheck-tone-shown tag) and that were reverted within 48 hours. This was compared to the revert rate of edits in the control group identified as eligible for tone check (identified by the editcheck-tone tag).
Code
# load data for assessing tone check published data
edit_check_save_data <- read.csv(
  file = 'Queries/data/edit_check_saves_data.tsv',
  header = TRUE,
  sep = "\t",
  stringsAsFactors = FALSE
)
Code
# Set experience level group and factor levels
edit_check_save_data <- edit_check_save_data %>%
  mutate(
    experience_level_group = case_when(
      user_edit_count == 0 & user_status == 'registered' ~ 'Newcomer',
      user_edit_count == 0 & user_status == 'unregistered' ~ 'Unregistered',
      user_edit_count > 0 & user_edit_count <= 100 ~ "Junior Contributor",
      user_edit_count > 100 ~ "Non-Junior Contributor"
    ),
    experience_level_group = factor(
      experience_level_group,
      levels = c("Unregistered", "Newcomer", "Non-Junior Contributor", "Junior Contributor")
    )
  )

# rename experiment field to clarify
edit_check_save_data <- edit_check_save_data %>%
  mutate(test_group = factor(
    test_group,
    levels = c('2025-09-editcheck-tone-control', '2025-09-editcheck-tone-test'),
    labels = c("control (no tone check shown)", "test (tone check shown)")
  ))
Code
# set field to indicate if more than one check was shown in a single session.
# Note: This should only be applicable to the test group
edit_check_save_data <- edit_check_save_data %>%
  mutate(
    multiple_checks_shown = ifelse(n_checks_shown > 1, "multiple checks shown", "single check shown"),
    multiple_checks_shown = factor(
      multiple_checks_shown,
      levels = c("single check shown", "multiple checks shown")
    )
  )

# note these buckets can be adjusted as needed based on distribution of data
edit_check_save_data <- edit_check_save_data %>%
  mutate(
    checks_shown_bucket = case_when(
      is.na(n_checks_shown) ~ '0',
      n_checks_shown == 1 ~ '1',
      n_checks_shown == 2 ~ '2',
      n_checks_shown > 2 & n_checks_shown <= 5 ~ "3-5",
      n_checks_shown > 5 & n_checks_shown <= 10 ~ "6-10",
      n_checks_shown > 10 ~ "over 10"
    ),
    checks_shown_bucket = factor(
      checks_shown_bucket,
      levels = c("0", "1", "2", "3-5", "6-10", "over 10")
    )
  )
Code
# define set of all eligible edits to review (eligible in control and activated in test)
edit_check_save_data <- edit_check_save_data %>%
  mutate(
    is_test_eligible = ifelse(
      (test_group == 'test (tone check shown)' & was_tone_check_shown_tag == 1) |
        (test_group == 'control (no tone check shown)' & is_tone_check_eligible == 1),
      'eligible', 'not eligible'
    ),
    is_test_eligible = factor(
      is_test_eligible,
      levels = c("eligible", "not eligible")
    )
  )
Code
# use tone check eligible tag to define edits that were detected as having non-neutral language
edit_check_save_data <- edit_check_save_data %>%
  mutate(
    is_tone_check_eligible = ifelse(is_tone_check_eligible == 1, 'non-neutral language detected', 'tone check addressed'),
    is_tone_check_eligible = factor(
      is_tone_check_eligible,
      levels = c("non-neutral language detected", "tone check addressed")
    )
  )
Overall by experiment group
Code
edit_check_save_overall <- edit_check_save_data %>%
  filter(is_new_content == 1 & is_test_eligible == 'eligible') %>% # limit to eligible edits
  group_by(test_group) %>%
  summarise(
    n_edits = n_distinct(editing_session),
    n_reverts = n_distinct(editing_session[was_reverted == 1]) # sessions whose edit was reverted
  ) %>%
  mutate(revert_rate = paste0(round(n_reverts / n_edits * 100, 1), "%")) %>%
  select(-c(2, 3)) %>% # removing granular data columns for publication
  gt() %>%
  tab_header(title = "New content edit revert rate by experiment group") %>%
  opt_stylize(5) %>%
  cols_label(
    test_group = "Test Group",
    # n_edits = "Number of published edits shown tone check",
    # n_reverts = "Number of edits reverted",
    revert_rate = "Proportion of new content edits that were reverted"
  ) %>%
  tab_source_note(gt::md('Limited to published new content edits shown or eligible to be shown tone check'))

display_html(as_raw_html(edit_check_save_overall))
New content edit revert rate by experiment group
| Test Group | Proportion of new content edits that were reverted |
| --- | --- |
| control (no tone check shown) | 24.8% |
| test (tone check shown) | 21.7% |
Limited to published new content edits shown or eligible to be shown tone check
By whether multiple checks were shown
Code
edit_check_revert_bymultiple <- edit_check_save_data %>%
  # limit to eligible edits and remove 2 abnormal test instances tagged as eligible but not shown a check
  filter(is_new_content == 1 & is_test_eligible == 'eligible' & test_group == 'test (tone check shown)' & !is.na(multiple_checks_shown)) %>%
  group_by(multiple_checks_shown) %>%
  summarise(
    n_edits = n_distinct(editing_session),
    n_reverts = n_distinct(editing_session[was_reverted == 1]) # sessions whose edit was reverted
  ) %>%
  mutate(revert_rate = paste0(round(n_reverts / n_edits * 100, 1), "%")) %>%
  select(-c(2, 3)) %>% # removing granular data columns for publication
  gt() %>%
  tab_header(title = "New content edit revert rate by whether multiple checks were shown") %>%
  opt_stylize(5) %>%
  cols_label(
    multiple_checks_shown = "Multiple Check",
    # n_edits = "Number of published new content edits",
    # n_reverts = "Number of edits reverted",
    revert_rate = "Proportion of new content edits that were reverted"
  ) %>%
  tab_source_note(gt::md('Limited to published new content edits shown or eligible to be shown tone check'))

display_html(as_raw_html(edit_check_revert_bymultiple))
New content edit revert rate by whether multiple checks were shown
| Multiple Check | Proportion of new content edits that were reverted |
| --- | --- |
| single check shown | 13.8% |
| multiple checks shown | 24.7% |
Limited to published new content edits shown or eligible to be shown tone check
By Platform
Code
edit_check_revert_byplatform <- edit_check_save_data %>%
  filter(is_new_content == 1 & is_test_eligible == 'eligible') %>%
  group_by(platform, test_group) %>%
  summarise(
    n_edits = n_distinct(editing_session),
    n_reverts = n_distinct(editing_session[was_reverted == 1]) # sessions whose edit was reverted
  ) %>%
  mutate(revert_rate = paste0(round(n_reverts / n_edits * 100, 1), "%")) %>%
  select(-c(3, 4)) %>% # removing granular data columns for publication
  gt() %>%
  tab_header(title = "New content edit revert rate by platform") %>%
  opt_stylize(5) %>%
  cols_label(
    test_group = "Test Group",
    platform = "Platform",
    # n_edits = "Number of published new content edits",
    # n_reverts = "Number of edits reverted",
    revert_rate = "Proportion of new content edits that were reverted"
  ) %>%
  tab_source_note(gt::md('Limited to published new content edits shown or eligible to be shown tone check'))

display_html(as_raw_html(edit_check_revert_byplatform))
New content edit revert rate by platform
| Platform | Test Group | Proportion of new content edits that were reverted |
| --- | --- | --- |
| desktop | control (no tone check shown) | 20.7% |
| desktop | test (tone check shown) | 19.6% |
| phone | control (no tone check shown) | 35.5% |
| phone | test (tone check shown) | 28.6% |
Limited to published new content edits shown or eligible to be shown tone check
By user experience
Code
edit_check_revert_byuserexp <- edit_check_save_data %>%
  filter(is_new_content == 1 & is_test_eligible == 'eligible') %>%
  group_by(experience_level_group, test_group) %>%
  summarise(
    n_edits = n_distinct(editing_session),
    n_reverts = n_distinct(editing_session[was_reverted == 1]) # sessions whose edit was reverted
  ) %>%
  mutate(revert_rate = paste0(round(n_reverts / n_edits * 100, 1), "%")) %>%
  select(-c(3, 4)) %>% # removing granular data columns for publication
  gt() %>%
  tab_header(title = "New content edit revert rate by user experience") %>%
  opt_stylize(5) %>%
  cols_label(
    test_group = "Experiment Group",
    experience_level_group = "User Status",
    # n_edits = "Number of published new content edits",
    # n_reverts = "Number of edits reverted",
    revert_rate = "Proportion of new content edits that were reverted"
  ) %>%
  tab_source_note(gt::md('Limited to published new content edits shown or eligible to be shown tone check'))

display_html(as_raw_html(edit_check_revert_byuserexp))
New content edit revert rate by user experience
| User Status | Experiment Group | Proportion of new content edits that were reverted |
| --- | --- | --- |
| Unregistered | control (no tone check shown) | 23.9% |
| Unregistered | test (tone check shown) | 23.3% |
| Newcomer | control (no tone check shown) | 11.8% |
| Newcomer | test (tone check shown) | 30.8% |
| Junior Contributor | control (no tone check shown) | 30% |
| Junior Contributor | test (tone check shown) | 15.7% |
Limited to published new content edits shown or eligible to be shown tone check
By partner Wikipedia
Code
edit_check_revert_bywiki <- edit_check_save_data %>%
  filter(is_new_content == 1 & is_test_eligible == 'eligible') %>%
  group_by(wiki, test_group) %>%
  summarise(
    n_edits = n_distinct(editing_session),
    n_reverts = n_distinct(editing_session[was_reverted == 1]) # sessions whose edit was reverted
  ) %>%
  mutate(revert_rate = paste0(round(n_reverts / n_edits * 100, 1), "%")) %>%
  select(-c(3, 4)) %>% # removing granular data columns for publication
  gt() %>%
  tab_header(title = "New content edit revert rate by partner Wikipedia") %>%
  opt_stylize(5) %>%
  cols_label(
    test_group = "Experiment Group",
    wiki = "Wikipedia",
    # n_edits = "Number of published new content edits",
    # n_reverts = "Number of edits reverted",
    revert_rate = "Proportion of new content edits that were reverted"
  ) %>%
  tab_source_note(gt::md('Limited to published new content edits shown or eligible to be shown tone check'))

display_html(as_raw_html(edit_check_revert_bywiki))
New content edit revert rate by partner Wikipedia
| Wikipedia | Experiment Group | Proportion of new content edits that were reverted |
| --- | --- | --- |
| frwiki | control (no tone check shown) | 24.2% |
| frwiki | test (tone check shown) | 21.3% |
| jawiki | control (no tone check shown) | 13.3% |
| jawiki | test (tone check shown) | 22.2% |
| ptwiki | control (no tone check shown) | 57.1% |
| ptwiki | test (tone check shown) | 23.1% |
Limited to published new content edits shown or eligible to be shown tone check
Key Insights
There have been no significant changes in the revert rate of new content edits overall, by platform, or by Wikipedia. However, revert rates were lower in the test group when limiting to edits where tone check was shown or eligible to be shown.
Overall, the revert rate of published new content edits where tone check was shown (21.7%) was about 3 percentage points lower than that of eligible edits in the control group (24.8%).
In relative terms, the revert rate of desktop edits where tone check was shown was 5.3% lower than in the control group (19.6% compared to 20.7%), and the revert rate of mobile edits was 19% lower (28.6% compared to 35.5%).
More data is needed to confirm per-Wikipedia and per-experience-level trends.
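A change in revert rate can be stated either as an absolute difference in percentage points or as a relative change, and the two can look quite different for the same pair of rates. The short sketch below computes both from the revert rates reported in the tables of this section; the data frame and its column names are illustrative only.

```r
# Revert rates (%) taken from the overall and by-platform tables in this section.
rates <- data.frame(
  segment = c("overall", "desktop", "phone"),
  control = c(24.8, 20.7, 35.5),
  test    = c(21.7, 19.6, 28.6)
)

# Absolute difference in percentage points (test minus control).
rates$abs_diff_pp <- round(rates$test - rates$control, 1)

# Relative change in percent, using the control rate as the baseline.
rates$rel_change_pct <- round((rates$test - rates$control) / rates$control * 100, 1)

print(rates)
```

Stating which of the two measures a figure refers to avoids confusion when comparing segments with very different baseline rates.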
Revert rate of published edits identified by the model as having non-neutral language
In the revert rate analysis above, we reviewed the overall revert rate of all published edits shown tone check, but did not consider how many of those edits were revised to address problematic language before publishing.
In this analysis, we review the revert rate of edits shown tone check by whether the saved edit still included non-neutral language. This tests our hypothesis that addressing tone issues identified in the text will decrease the likelihood that newcomers' edits are reverted. To complete this analysis, we used the revision tag created in T388716 to identify when the model detects non-neutral language within a new content edit.
Overall
Code
tone_check_eligible_revert_overall <- edit_check_save_data %>%
  filter(is_test_eligible == 'eligible', test_group == 'test (tone check shown)') %>% # limit to edits where edit check was shown
  group_by(is_tone_check_eligible) %>%
  summarise(
    n_edits = n_distinct(editing_session),
    n_reverts = n_distinct(editing_session[was_reverted == 1]) # look at reverted
  ) %>%
  mutate(revert_rate = paste0(round(n_reverts / n_edits * 100, 1), "%")) %>%
  select(-c(2, 3)) %>% # removing granular data columns
  gt() %>%
  tab_header(title = "Revert rate of edits by whether non-neutral language was detected at time of publishing") %>%
  opt_stylize(5) %>%
  cols_label(
    is_tone_check_eligible = "Were tone issues detected at time of save?",
    # n_edits = "Number of published edits",
    # n_reverts = "Number of edits reverted",
    revert_rate = "Proportion of edits that were reverted"
  ) %>%
  tab_source_note(gt::md('Limited to edits shown tone check in the test group'))

display_html(as_raw_html(tone_check_eligible_revert_overall))
Revert rate of edits by whether non-neutral language was detected at time of publishing
| Were tone issues detected at time of save? | Proportion of edits that were reverted |
| --- | --- |
| non-neutral language detected | 27.4% |
| tone check addressed | 12.1% |
Limited to edits shown tone check in the test group
By Platform
Code
tone_check_eligible_revert_byplatform <- edit_check_save_data %>%
  filter(is_test_eligible == 'eligible', test_group == 'test (tone check shown)') %>% # limit to edits where edit check was shown
  group_by(platform, is_tone_check_eligible) %>%
  summarise(
    n_edits = n_distinct(editing_session),
    n_reverts = n_distinct(editing_session[was_reverted == 1]) # look at reverted
  ) %>%
  mutate(revert_rate = paste0(round(n_reverts / n_edits * 100, 1), "%")) %>%
  select(-c(3, 4)) %>% # removing granular data columns
  gt() %>%
  tab_header(title = "Revert rate of edits by whether non-neutral language was detected at time of publishing") %>%
  opt_stylize(5) %>%
  cols_label(
    platform = "Platform",
    is_tone_check_eligible = "Were tone issues detected at time of save?",
    # n_edits = "Number of published edits",
    # n_reverts = "Number of edits reverted",
    revert_rate = "Proportion of edits that were reverted"
  ) %>%
  tab_source_note(gt::md('Limited to edits shown tone check in the test group'))

display_html(as_raw_html(tone_check_eligible_revert_byplatform))
Revert rate of edits by whether non-neutral language was detected at time of publishing
| Platform | Were tone issues detected at time of save? | Proportion of edits that were reverted |
| --- | --- | --- |
| desktop | non-neutral language detected | 23.2% |
| desktop | tone check addressed | 8.1% |
| phone | non-neutral language detected | 36.1% |
| phone | tone check addressed | 29.4% |
Limited to edits shown tone check in the test group
By User Experience
Code
tone_check_eligible_revert_byuserexp <- edit_check_save_data %>%
  filter(is_test_eligible == 'eligible', test_group == 'test (tone check shown)') %>% # limit to edits where edit check was shown
  group_by(experience_level_group, is_tone_check_eligible) %>%
  summarise(
    n_edits = n_distinct(editing_session),
    n_reverts = n_distinct(editing_session[was_reverted == 1]) # look at reverted
  ) %>%
  mutate(revert_rate = paste0(round(n_reverts / n_edits * 100, 1), "%")) %>%
  select(-c(3, 4)) %>% # removing granular data columns
  gt() %>%
  tab_header(title = "Revert rate of edits shown or eligible to be shown tone check by whether tone issues were detected at time of publishing") %>%
  opt_stylize(5) %>%
  cols_label(
    experience_level_group = "User Experience",
    is_tone_check_eligible = "Were tone issues detected at time of save?",
    # n_edits = "Number of published edits",
    # n_reverts = "Number of edits reverted",
    revert_rate = "Proportion of edits that were reverted"
  ) %>%
  tab_source_note(gt::md('Limited to edits shown tone check in the test group'))

display_html(as_raw_html(tone_check_eligible_revert_byuserexp))
Revert rate of edits shown or eligible to be shown tone check by whether tone issues were detected at time of publishing
| User Experience | Were tone issues detected at time of save? | Proportion of edits that were reverted |
| --- | --- | --- |
| Unregistered | non-neutral language detected | 34.4% |
| Unregistered | tone check addressed | 15% |
| Newcomer | non-neutral language detected | 41.9% |
| Newcomer | tone check addressed | 20% |
| Junior Contributor | non-neutral language detected | 18.1% |
| Junior Contributor | tone check addressed | 7.8% |
Limited to edits shown tone check in the test group
By Partner Wikipedia
Code
tone_check_eligible_revert_bywiki <- edit_check_save_data %>%
  filter(is_test_eligible == 'eligible', test_group == 'test (tone check shown)') %>% # limit to edits where edit check was shown
  group_by(wiki, is_tone_check_eligible) %>%
  summarise(
    n_edits = n_distinct(editing_session),
    n_reverts = n_distinct(editing_session[was_reverted == 1]) # look at reverted
  ) %>%
  mutate(revert_rate = paste0(round(n_reverts / n_edits * 100, 1), "%")) %>%
  select(-c(3, 4)) %>% # removing granular data columns
  gt() %>%
  tab_header(title = "Revert rate of edits shown or eligible to be shown tone check by whether tone issues were detected at time of publishing") %>%
  opt_stylize(5) %>%
  cols_label(
    wiki = "Wikipedia",
    is_tone_check_eligible = "Were tone issues detected at time of save?",
    # n_edits = "Number of published edits",
    # n_reverts = "Number of edits reverted",
    revert_rate = "Proportion of edits that were reverted"
  ) %>%
  tab_source_note(gt::md('Limited to edits shown tone check in the test group'))

display_html(as_raw_html(tone_check_eligible_revert_bywiki))
Revert rate of edits shown or eligible to be shown tone check by whether tone issues were detected at time of publishing
| Wikipedia | Were tone issues detected at time of save? | Proportion of edits that were reverted |
| --- | --- | --- |
| frwiki | non-neutral language detected | 25.7% |
| frwiki | tone check addressed | 14% |
| jawiki | non-neutral language detected | 33.3% |
| jawiki | tone check addressed | 10% |
| ptwiki | non-neutral language detected | 40% |
| ptwiki | tone check addressed | 7.1% |
Limited to edits shown tone check in the test group
Key Insights
Edits where text was revised to address the tone check shown were about 2x less likely to be reverted (12.1% compared to 27.4%).
On mobile, edits where tone checks were addressed were 13% less likely to be reverted; on desktop, edits where tone checks were addressed were almost 3x less likely to be reverted.
Decreases were observed across all user experience groups and partner Wikipedias; however, more data is needed to confirm trends for these breakdowns, as there are still few published edits per wiki and per user experience group.
Proportion of people blocked after publishing an edit where Tone Check was shown
Question: Is tone check causing any disruption?
Methodology: We gathered all edits where edit check was shown from the mediawiki_revision_change_tag table and joined them with mediawiki_private_cu_changes to gather user name info. We then reviewed both global and local blocks made within 6 hours of the tone check event, as identified in the logging table.
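The 6-hour window described in the methodology can be sketched as a join between edits and blocks followed by a time-difference filter. This is a simplified base R illustration, not the query used for the report: the `edits` and `blocks` data frames, their column names, and all timestamps are hypothetical, and the toy data has one edit and at most one block per user.

```r
# Toy inputs: timestamps are made up; column names are illustrative assumptions.
edits <- data.frame(
  user = c("A", "B", "C"),
  edit_ts = as.POSIXct(c("2025-09-10 10:00:00", "2025-09-10 12:00:00",
                         "2025-09-11 08:00:00"), tz = "UTC")
)
blocks <- data.frame(
  user = c("A", "B"),
  block_ts = as.POSIXct(c("2025-09-10 13:30:00", "2025-09-11 09:00:00"), tz = "UTC")
)

# Left join so that users who were never blocked are kept.
joined <- merge(edits, blocks, by = "user", all.x = TRUE)

# Hours between the edit and the block (NA when the user was never blocked).
hours_to_block <- as.numeric(difftime(joined$block_ts, joined$edit_ts, units = "hours"))

# Flag blocks issued within 6 hours after the tone check edit.
joined$blocked_within_6h <- !is.na(hours_to_block) &
  hours_to_block >= 0 & hours_to_block <= 6

prop_blocked <- round(mean(joined$blocked_within_6h) * 100, 1)
```

With real data, a user can have several edits or blocks, so the flag would need to be collapsed to one row per user before computing the proportion.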
Code
# load data for assessing blocks
edit_check_blocks <- read.csv(
  file = 'Queries/data/edit_check_eligible_users_blocked.csv',
  header = TRUE,
  sep = ",",
  stringsAsFactors = FALSE
)
Code
# rename experiment field to clarify
edit_check_blocks <- edit_check_blocks %>%
  mutate(test_group = factor(
    bucket,
    levels = c('2025-09-editcheck-tone-control', '2025-09-editcheck-tone-test'),
    labels = c("control (no tone check)", "test (tone check available)")
  ))
Code
edit_check_local_blocks_overall <- edit_check_blocks %>%
  # filter(user_id == 0) %>%
  group_by(test_group) %>%
  summarise(
    blocked_users = n_distinct(ip[is_local_blocked == 'True' | is_global_blocked == 'True']), # look at blocks
    all_users = n_distinct(ip)
  ) %>%
  mutate(prop_blocks = paste0(round(blocked_users / all_users * 100, 1), "%")) %>%
  select(-c(2, 3)) %>% # removing granular data columns
  gt() %>%
  tab_header(title = "Proportion of users blocked by experiment group") %>%
  opt_stylize(5) %>%
  cols_label(
    test_group = "Test Group",
    prop_blocks = "Proportion of users blocked"
  ) %>%
  tab_source_note(gt::md('Limited to users blocked within 6 hours after publishing an edit where tone check was shown'))

display_html(as_raw_html(edit_check_local_blocks_overall))
Proportion of users blocked by experiment group
| Test Group | Proportion of users blocked |
| --- | --- |
| test (tone check available) | 0.9% |
Limited to users blocked within 6 hours after publishing an edit where tone check was shown
Key Insights
0.9% of all users were blocked after publishing an edit where at least one tone check was shown compared to 0% in the control group. This difference is not statistically significant.
No global blocks were issued to any users that published an edit where at least one tone check was shown.
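The report does not state which test was used to assess significance. For rare events such as blocks, one common option is Fisher's exact test, sketched below with base R's `fisher.test()`. The counts are invented purely for illustration (the per-group user counts are not published here), chosen only so that the test group rate is close to 0.9%.

```r
# Hypothetical counts: the report does not publish per-group user counts,
# so these numbers are invented only to illustrate the test.
block_counts <- matrix(
  c(2, 218,   # test group: blocked, not blocked
    0, 230),  # control group: blocked, not blocked
  nrow = 2, byrow = TRUE,
  dimnames = list(group = c("test", "control"),
                  outcome = c("blocked", "not_blocked"))
)

# Fisher's exact test is suited to 2x2 tables with very small cell counts.
result <- fisher.test(block_counts)
p_value <- result$p.value
is_significant <- p_value < 0.05
```

With so few blocked users, a small absolute difference between groups is unlikely to reach significance, which is consistent with the conclusion stated above.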
Proportion of edits that are published before the model is able to return an evaluation
Question: Is the model unable to evaluate the tone of an edit quickly enough before it is published?
Methodology: In T388716, we added instrumentation (feature: editCheck-tone, action: save-before-check-finalized) to indicate that an edit was published before the model returned an evaluation. These edits would not have the editcheck-tone tag applied to indicate whether the published edit includes promotional language.
For this analysis, we reviewed the proportion of all published edits in each test group where this event was logged to determine how frequently this is occurring.
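The per-group rate and the masking of small counts before publication can be sketched as follows. This is a base R illustration with invented data; the `saves` data frame is hypothetical, with column names borrowed from the fields used in this report, and the 50-edit threshold mirrors the "<50" convention in the published tables.

```r
# Toy data: one row per published editing session; all values are invented.
saves <- data.frame(
  test_group = c(rep("control", 200), rep("test", 200)),
  saves_before_finalized = c(rep(1, 60), rep(0, 140), rep(1, 2), rep(0, 198)),
  stringsAsFactors = FALSE
)

# Count sessions and early saves per experiment group.
groups <- sort(unique(saves$test_group))
n_edits <- sapply(groups, function(g) sum(saves$test_group == g), USE.NAMES = FALSE)
n_prior_saves <- sapply(
  groups,
  function(g) sum(saves$saves_before_finalized[saves$test_group == g]),
  USE.NAMES = FALSE
)

summary_tbl <- data.frame(
  test_group = groups,
  # Compute the rate from the exact counts before any masking.
  saves_before_rate = paste0(round(n_prior_saves / n_edits * 100, 1), "%"),
  # Mask counts below 50 for publication.
  n_prior_saves = ifelse(n_prior_saves < 50, "<50", as.character(n_prior_saves)),
  stringsAsFactors = FALSE
)
print(summary_tbl)
```

Computing the rate before masking matters: once a count has been replaced by "<50" it is a string and can no longer be used in arithmetic.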
Overall by Experiment Group
Code
saves_before_rate_overall <- edit_check_save_data %>%
  group_by(test_group) %>%
  summarise(
    n_edits = n_distinct(editing_session),
    n_prior_saves = n_distinct(editing_session[saves_before_finalized == 1])
  ) %>%
  mutate(saves_before_rate = paste0(round(n_prior_saves / n_edits * 100, 1), "%")) %>%
  mutate(n_prior_saves = ifelse(n_prior_saves < 50, "<50", n_prior_saves)) %>% # sanitizing per data publication guidelines
  select(-2) %>%
  gt() %>%
  tab_header(title = "Edits published before the model returns an evaluation") %>%
  opt_stylize(5) %>%
  cols_label(
    test_group = "Test Group",
    # n_edits = "Number of published edits",
    n_prior_saves = "Number of edits published before check finalized",
    saves_before_rate = "Proportion of edits published before check finalized"
  )

display_html(as_raw_html(saves_before_rate_overall))
Edits published before the model returns an evaluation
| Test Group | Number of edits published before check finalized | Proportion of edits published before check finalized |
| --- | --- | --- |
| control (no tone check shown) | 249 | 1.2% |
| test (tone check shown) | <50 | 0.1% |
By platform
Code
saves_before_rate_byplatform <- edit_check_save_data %>%
  group_by(platform, test_group) %>%
  summarise(
    n_edits = n_distinct(editing_session),
    n_prior_saves = n_distinct(editing_session[saves_before_finalized == 1])
  ) %>%
  mutate(saves_before_rate = paste0(round(n_prior_saves / n_edits * 100, 1), "%")) %>%
  mutate(n_prior_saves = ifelse(n_prior_saves < 50, "<50", n_prior_saves)) %>% # sanitizing per data publication guidelines
  select(-3) %>%
  gt() %>%
  tab_header(title = "Edits published before the model returns an evaluation by platform") %>%
  opt_stylize(5) %>%
  cols_label(
    test_group = "Test Group",
    platform = "Platform",
    # n_edits = "Number of published edits",
    n_prior_saves = "Number of edits published before check finalized",
    saves_before_rate = "Proportion of edits published before check finalized"
  )

display_html(as_raw_html(saves_before_rate_byplatform))
Edits published before the model returns an evaluation by platform
| Platform | Test Group | Number of edits published before check finalized | Proportion of edits published before check finalized |
| --- | --- | --- | --- |
| desktop | control (no tone check shown) | 166 | 1.3% |
| desktop | test (tone check shown) | <50 | 0.1% |
| phone | control (no tone check shown) | 83 | 1% |
| phone | test (tone check shown) | <50 | 0% |
By partner Wikipedia
Code
saves_before_rate_bywiki <- edit_check_save_data %>%
  group_by(wiki, test_group) %>%
  summarise(
    n_edits = n_distinct(editing_session),
    n_prior_saves = n_distinct(editing_session[saves_before_finalized == 1])
  ) %>%
  mutate(saves_before_rate = paste0(round(n_prior_saves / n_edits * 100, 1), "%")) %>%
  mutate(n_prior_saves = ifelse(n_prior_saves < 50, "<50", n_prior_saves)) %>% # sanitizing per data publication guidelines
  select(-3) %>%
  gt() %>%
  tab_header(title = "Edits published before the model returns an evaluation by Wikipedia") %>%
  opt_stylize(5) %>%
  cols_label(
    test_group = "Test Group",
    wiki = "Wikipedia",
    # n_edits = "Number of published edits",
    n_prior_saves = "Number of edits published before check finalized",
    saves_before_rate = "Proportion of edits published before check finalized"
  )

display_html(as_raw_html(saves_before_rate_bywiki))
Edits published before the model returns an evaluation by Wikipedia
| Wikipedia | Test Group | Number of edits published before check finalized | Proportion of edits published before check finalized |
| --- | --- | --- | --- |
| frwiki | control (no tone check shown) | 71 | 0.6% |
| frwiki | test (tone check shown) | <50 | 0.1% |
| jawiki | control (no tone check shown) | 156 | 2.1% |
| jawiki | test (tone check shown) | <50 | 0.1% |
| ptwiki | control (no tone check shown) | <50 | 0.8% |
| ptwiki | test (tone check shown) | <50 | 0% |
Key Insights
About 0.6% of all published edits (264 edits) in the AB test were saved before the model returned an evaluation.
The majority of these edits occurred in the control group and on desktop.
This occurs very infrequently in the test group. Only 0.1% of all published edits in the test group (all on desktop) were saved before the model returned an evaluation. This is expected based on comments documented in T388716#10911327.
We’ve observed a slightly higher rate of this occurring at Japanese Wikipedia compared to the other two partner Wikipedias.