Megan Neisler
November 22, 2019
As part of efforts to improve the mobile contribution experience, the Web team deployed the Advanced Mobile Contributions (AMC) mode. This is an opt-in feature set that adds more contributor capabilities to the mobile web experience. Please see the (project page) for more details about the background, changes and goals of the project.
The feature was first deployed as an opt-in setting to identified target wikis, including the Arabic, Indonesian, Spanish, Italian, Japanese, Persian, and Thai Wikipedias, chosen for their relatively large populations of existing mobile editors. After testing and feedback, the team deployed and promoted the feature set to all Wikimedia projects.
Timeline
This report shows the status of the key performance indicators (KPIs) identified in the Annual Plan following the deployment of AMC to all Wikimedia projects. Results from the first progress report, showing the status of the KPIs as of the end of FY18-19 (June 2019), are available in the FY 2018/2019 AMC Metrics Status Report. We reviewed metrics overall, on English Wikipedia, and on all of the target Wikipedia projects.
In the annual plan, the Readers Web team defined the following KPIs:
Mobile web edit rate.
Retention rate for opt-in advanced mobile mode amongst medium-volume (100+ edits) and high-volume (500+ edits) editors
Moderation actions on mobile web
Other Metrics
For more links to implementation tasks and technical details, see this overview task T210660
library(IRdisplay)
display_html(
'<script>
code_show=true;
function code_toggle() {
if (code_show){
$(\'div.input\').hide();
} else {
$(\'div.input\').show();
}
code_show = !code_show
}
$( document ).ready(code_toggle);
</script>
<form action="javascript:code_toggle()">
<input type="submit" value="Click here to toggle on/off the raw code.">
</form>'
)
vertical_lines <- as.numeric(as.Date(c("2019-03-20", "2019-06-17", "2019-08-07")))
The mobile web edit rate is based on edits on all Wikipedias from June 1, 2018 through September 30, 2019, as recorded in the mediawiki_history dataset.
We reviewed both the rate of overall mobile web edits and the rate of mobile web edits coming from AMC mode, measured using an AMC edits tag (implemented in T212959 on January 16, 2016).
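The tag-based split between AMC and other mobile web edits can be illustrated with a small base R sketch. The tag names 'mobile web edit' and 'advanced mobile edit' are the ones used in the query in this report; the sample revisions and the helper `has_tag` are hypothetical and only serve the illustration.

```r
# Hypothetical revisions, each carrying a vector of change tags (not real data).
revision_tags <- list(
  c("mobile edit", "mobile web edit"),
  c("mobile edit", "mobile web edit", "advanced mobile edit"),
  c("visualeditor"),
  c("mobile edit", "mobile app edit")
)

# Hypothetical helper: TRUE if `tag` is among a revision's tags.
has_tag <- function(tags, tag) tag %in% tags

mobile_web_edit <- vapply(revision_tags, has_tag, logical(1), tag = "mobile web edit")
amc_tagged <- vapply(revision_tags, has_tag, logical(1), tag = "advanced mobile edit")

# An AMC edit carries both tags; every other mobile web edit carries only the first.
amc_edit <- mobile_web_edit & amc_tagged
other_mobile_web_edit <- mobile_web_edit & !amc_tagged

edit_flags <- data.frame(mobile_web_edit, amc_edit, other_mobile_web_edit)
print(edit_flags)
```

Under this definition the two categories are mutually exclusive and together account for every mobile web edit, which is why they can be stacked in a single bar per period.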
At the time of this analysis, we did not review any changes on non-Wikipedia projects since AMC was not deployed on those projects until October 10, 2019.
shhh <- function(expr) suppressPackageStartupMessages(suppressWarnings(suppressMessages(expr)))
shhh({
library(magrittr); library(zeallot); library(glue); library(tidyverse); library(zoo); library(lubridate)
library(scales)
})
#Collect all mobile web edits and mobile web edits tagged as AMC from all target wikis where AMC was deployed
#grouped by wiki and user edit count
# In terminal
# spark2R --master yarn --executor-memory 2G --executor-cores 1 --driver-memory 4G
query <- "SELECT
date_format(event_timestamp, 'yyyy-MM-dd') as date,
wiki,
user_edit_count,
sum(cast(other_mobile_web_edit as int)) as other_mobile_web_edits,
sum(cast(amc_edit as int)) as amc_edits,
sum(cast(mobile_web_edit as int)) as mobile_web_edits
FROM (
SELECT
wiki_db as wiki,
event_timestamp,
(array_contains(revision_tags, 'mobile web edit') and not
array_contains(revision_tags, 'advanced mobile edit')) as other_mobile_web_edit,
(array_contains(revision_tags, 'advanced mobile edit') and
array_contains(revision_tags, 'mobile web edit')) as amc_edit,
array_contains(revision_tags, 'mobile web edit') as mobile_web_edit,
CASE
WHEN event_user_revision_count is NULL THEN 'undefined'
WHEN event_user_revision_count < 100 THEN 'under 100'
WHEN event_user_revision_count >=100 AND event_user_revision_count < 500 THEN '100-499'
ELSE '500+'
END AS user_edit_count
FROM wmf.mediawiki_history mwh
INNER JOIN canonical_data.wikis cd
ON wiki_db = database_code
WHERE
mwh.event_entity = 'revision' and
mwh.event_type = 'create' and
cd.database_group = 'wikipedia' and
mwh.event_timestamp IS NOT NULL and
mwh.event_timestamp between '2018-06-01' and '2019-09-30' and
mwh.snapshot = '2019-10'
) edits
GROUP BY wiki, date_format(event_timestamp, 'yyyy-MM-dd'), user_edit_count"
results <- collect(sql(query))
save(results, file="Readers-Web-AMC-metrics/Data/mobile_web_edit_counts.RData")
load("Data/mobile_web_edit_counts.RData")
mobile_web_edit_counts <- results
mobile_web_edit_counts$date <- as.Date(mobile_web_edit_counts$date, format = "%Y-%m-%d")
mobile_web_edit_counts_clean <- mobile_web_edit_counts %>%
gather(edit_type, edit_count, 4:6) %>%
arrange(desc(date))
##Overall monthly web edit counts and yoy change
mobile_web_edit_monthly_overall_yoy <- mobile_web_edit_counts_clean %>%
filter(edit_type == 'mobile_web_edits') %>% # filter to all overall mobile web edits (both AMC and non-AMC)
mutate(date = floor_date(date, "month")) %>%
group_by(date) %>%
summarise(total_mobile_edits = sum(edit_count, na.rm = TRUE)) %>%
arrange(date) %>%
mutate(yearOverYear= (total_mobile_edits/lag(total_mobile_edits,12) -1)*100)
tail(mobile_web_edit_monthly_overall_yoy)
##Plot time series of mobile edits rate.
p <- mobile_web_edit_monthly_overall_yoy %>%
ggplot(aes(x=date, y = total_mobile_edits)) + # the summarised data has no edit_type column, so no color mapping
geom_line(color = 'blue', size = 1.5 ) +
geom_vline(xintercept = vertical_lines,
linetype = "dashed", color = "black") +
geom_text(aes(x=as.Date('2019-03-20'), y=1.15E6, label="AMC deployed on Arabic, Indonesian, and Spanish Wikipedias"), size=3.7, vjust = -1.2, angle = 90, color = "black") +
geom_text(aes(x=as.Date('2019-06-17'), y=1.15E6, label="AMC deployed on Italian, Japanese, Persian, and Thai Wikipedias"), size=3.7, vjust = -1.2, angle = 90, color = "black") +
geom_text(aes(x=as.Date('2019-08-07'), y=1.15E6, label="AMC deployed on all Wikipedias"), size=3.7, vjust = -1.2, angle = 90, color = "black") +
scale_y_continuous("Number of mobile web edits per month", labels = polloi::compress) +
scale_x_date("Date", labels = date_format("%b %Y"), date_breaks = "1 month") +
labs(title = "Mobile web edits on all Wikipedia projects") +
ggthemes::theme_tufte(base_size = 12, base_family = "Gill Sans") +
theme(axis.text.x=element_text(angle = 45, hjust = 1),
plot.title = element_text(hjust = 0.5),
panel.grid = element_line("gray70"))
p
ggsave("Figures/mobile_web_edits_overall_monthly.png", p, width = 18, height = 9, units = "in", dpi = 150)
There was a steady increase in total mobile web edits over the past 15 months, a trend that began before the deployment of AMC on target wikis. This is likely due in part to a sustained increase in overall active editors. In September 2019, there was a 22.7% year over year increase in overall mobile web edits. We reviewed the number of these edits made while the user was in AMC mode to determine the impact of AMC on this increase.
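As a worked illustration of the year over year figure, the base R sketch below compares each month with the same month one year earlier. The monthly totals are made-up numbers, not values from this report, and the column name `year_over_year` is chosen for the example.

```r
# Hypothetical monthly totals for 13 consecutive months (not real figures).
monthly <- data.frame(
  date = seq(as.Date("2018-09-01"), by = "month", length.out = 13),
  total_mobile_edits = c(1000, 1020, 1040, 1010, 1080, 1050, 1100,
                         1120, 1150, 1170, 1190, 1210, 1250)
)

# Value from 12 rows earlier; undefined (NA) for the first 12 months.
previous_year <- c(rep(NA, 12), head(monthly$total_mobile_edits, -12))

# Percentage change relative to the same month of the previous year.
monthly$year_over_year <- (monthly$total_mobile_edits / previous_year - 1) * 100
print(tail(monthly, 2))
```

Shifting by 12 rows only lines up with the same calendar month if the series has exactly one row per month with no gaps, so it is worth checking that assumption before reading the result.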
Since August deployment Date through September 2019
# across all Wikipedias for logged in users since August deployment Date
amc_edits_prop_overall <- mobile_web_edit_counts %>%
filter(date >= "2019-08-07", #deployment date across all wikis
user_edit_count != 'undefined') %>% #limit only to logged in users
summarise(mobile_web_edits = sum(mobile_web_edits, na.rm = TRUE),
amc_edits = sum(amc_edits, na.rm = TRUE)) %>%
mutate(amc_prop = amc_edits/mobile_web_edits *100)
amc_edits_prop_overall
Per Month Since Deployment
# across all Wikipedias for logged in users per month since deployment
amc_edits_prop_overall_bymonth <- mobile_web_edit_counts %>%
filter(user_edit_count != 'undefined',
date >= '2019-03-01')%>% #since first March deployment
mutate(date = floor_date(date, "month")) %>%
group_by(date) %>%
summarise(mobile_web_edits = sum(mobile_web_edits, na.rm = TRUE),
amc_edits = sum(amc_edits, na.rm = TRUE)) %>%
mutate(prop = amc_edits/mobile_web_edits *100)
amc_edits_prop_overall_bymonth
#Plot weekly mobile web edits by edit mode (AMC vs. other mobile web edits)
p <- mobile_web_edit_counts_clean %>%
filter(date >= '2019-03-17', #date of first deployment to target wikis
date <= '2019-09-28', #remove last week due to incomplete data
edit_type != 'mobile_web_edits',
user_edit_count != 'undefined') %>%
mutate(date = floor_date(date, "week")) %>%
group_by(date, edit_type) %>%
summarise(total_mobile_edits = sum(edit_count, na.rm = TRUE)) %>%
ggplot(aes(x= date, y= total_mobile_edits, fill = edit_type)) +
geom_col() +
geom_vline(xintercept = vertical_lines,
linetype = "dashed", color = "black") +
geom_text(aes(x=as.Date('2019-03-20'), y=80E3, label="AMC deployed on Arabic, Indonesian, and Spanish Wikipedias"), size=3.6, vjust = -1.2, angle = 90, color = "black") +
geom_text(aes(x=as.Date('2019-06-17'), y=80E3, label="AMC deployed on Italian, Japanese, Persian, and Thai Wikipedias"), size=3.6, vjust = -1.2, angle = 90, color = "black") +
geom_text(aes(x=as.Date('2019-08-07'), y=80E3, label="AMC deployed on all Wikipedias"), size=3.6, vjust = -1.2, angle = 90, color = "black") +
scale_y_continuous("Number of edits per week", labels = polloi::compress) +
scale_x_date("Date", labels = date_format("%b %d %Y"), date_breaks = "2 weeks") +
labs(title = "Mobile web edits on all Wikipedia projects by edit mode") +
ggthemes::theme_tufte(base_size = 14, base_family = "Gill Sans") +
theme(axis.text.x=element_text(angle = 45, hjust = 1),
panel.grid = element_line("gray70"),
legend.position="bottom",
legend.title=element_blank(),
legend.text=element_text(size=14))
p
ggsave("Figures/mobile_web_edits_overall_amc_prop.png", p, width = 18, height = 9, units = "in", dpi = 150)
The proportion of AMC-tagged mobile web edits has increased following each wiki deployment. Since the deployment of AMC on all Wikipedias on August 7, 2019, 17.3% of all mobile web edits by logged-in users were made while in AMC mode. In September 2019, 26% of all mobile web edits were made while in AMC mode.
## Plot of overall mobile web edit rate by user edit count
p <- mobile_web_edit_counts_clean %>%
filter(edit_type == 'mobile_web_edits', #look at all mobile web edits
user_edit_count != 'undefined', #remove undefined user edit counts
) %>%
mutate(date = floor_date(date, "month")) %>%
group_by(date, user_edit_count) %>%
summarise(total_mobile_edits = sum(edit_count, na.rm = TRUE))%>%
ggplot(aes(x=date, y = total_mobile_edits, color = user_edit_count, linetype = user_edit_count)) +
geom_line(size = 1.5)+
geom_text(aes(x=as.Date('2019-03-20'), y=200E3, label="AMC deployed on Arabic, Indonesian, and Spanish Wikipedias"), size=3.5, vjust = -1.2, angle = 90, color = "black") +
geom_text(aes(x=as.Date('2019-06-17'), y=200E3, label="AMC deployed on Italian, Japanese, Persian, and Thai Wikipedias"), size=3.5, vjust = -1.2, angle = 90, color = "black") +
geom_text(aes(x=as.Date('2019-08-07'), y=200E3, label="AMC deployed on all Wikipedias"), size=3.5, vjust = -1.2, angle = 90, color = "black") +
scale_y_continuous("Number of edits per month", labels = polloi::compress) +
scale_x_date("Date", labels = date_format("%b %Y"), date_breaks = "2 months", limits = c()) +
labs(title = "Mobile web edits by user experience on all Wikipedia projects") +
geom_vline(xintercept = vertical_lines,
linetype = "dashed", color = "black") +
ggthemes::theme_tufte(base_size = 12, base_family = "Gill Sans") +
theme(axis.text.x=element_text(angle = 45, hjust = 1),
plot.title = element_text(hjust = 0.5),
panel.grid = element_line("gray70"),
legend.position="bottom")
ggsave("Figures/mobile_web_edits_byeditcount.png",p, width = 18, height = 9, units = "in", dpi = 150)
p
##Calculate overall YOY increase for 100+ and 500+ editors across all wikis
mobile_web_edit_under100 <- mobile_web_edit_counts_clean %>%
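filter(user_edit_count == 'under 100',
edit_type == 'mobile_web_edits') %>% # count each edit once; the other edit types are subsets of this one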
mutate(date = floor_date(date, "month")) %>%
group_by(date) %>%
summarise(total_mobile_edits = sum(edit_count, na.rm = TRUE)) %>%
arrange(date) %>%
mutate(yearOverYear= (total_mobile_edits/lag(total_mobile_edits,12) -1)*100)
mobile_web_edit_100 <- mobile_web_edit_counts_clean %>%
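filter(user_edit_count == '100-499',
edit_type == 'mobile_web_edits') %>% # count each edit once; the other edit types are subsets of this one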
mutate(date = floor_date(date, "month")) %>%
group_by(date) %>%
summarise(total_mobile_edits = sum(edit_count, na.rm = TRUE)) %>%
arrange(date) %>%
mutate(yearOverYear= (total_mobile_edits/lag(total_mobile_edits,12) -1)*100)
mobile_web_edit_500 <- mobile_web_edit_counts_clean %>%
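filter(user_edit_count == '500+',
edit_type == 'mobile_web_edits') %>% # count each edit once; the other edit types are subsets of this one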
mutate(date = floor_date(date, "month")) %>%
group_by(date) %>%
summarise(total_mobile_edits = sum(edit_count, na.rm = TRUE)) %>%
arrange(date) %>%
mutate(yearOverYear= (total_mobile_edits/lag(total_mobile_edits,12) -1)*100)
Overall year over year change in mobile web edits by editor experience
# yoy table
edit_count <- c('under 100', '100-499', '500+')
mobile_web_editcount_yoy <- rbind(mobile_web_edit_under100[16,], mobile_web_edit_100[16,], mobile_web_edit_500[16,])
mobile_web_editcount_yoy$edit_count= edit_count
mobile_web_editcount_yoy
In September 2019, there was a year over year increase in mobile web edits for all user edit count groups but the highest increase (44%) was seen for logged-in editors with over 500 cumulative edits on Wikipedia projects. This is an increase from the 37% year over year increase in June 2019 for the same editor group.
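The edit count groups used throughout this report follow the CASE expression in the query: a missing revision count is 'undefined', then 'under 100', '100-499', and '500+'. A minimal base R sketch of the same bucketing is shown here; the function name `edit_count_group` and the sample values are hypothetical.

```r
# Hypothetical helper mirroring the CASE expression in the query.
edit_count_group <- function(revision_count) {
  group <- as.character(cut(
    revision_count,
    breaks = c(-Inf, 100, 500, Inf),
    labels = c("under 100", "100-499", "500+"),
    right = FALSE # intervals are [low, high), so exactly 100 falls in "100-499"
  ))
  # A missing revision count maps to 'undefined', as in the query.
  group[is.na(revision_count)] <- "undefined"
  group
}

groups <- edit_count_group(c(NA, 0, 99, 100, 499, 500, 12000))
print(groups)
```

The boundary handling matters for the KPI definitions: an editor with exactly 100 edits counts as medium-volume and one with exactly 500 counts as high-volume.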
Since August deployment Date through September 2019
amc_edits_prop_overall_byeditcount <- mobile_web_edit_counts %>%
filter(date >= "2019-08-07", #deployment date across all wikis
user_edit_count != 'undefined') %>%
group_by(user_edit_count) %>%
summarise(mobile_web_edits = sum(mobile_web_edits, na.rm = TRUE),
amc_edits = sum(amc_edits, na.rm = TRUE)) %>%
mutate(prop = amc_edits/mobile_web_edits *100)
amc_edits_prop_overall_byeditcount
In September 2019
amc_edits_prop_overall_byeditcount_Sept <- mobile_web_edit_counts %>%
filter(date >= "2019-09-01",
date <= '2019-09-30', #restrict to September 2019
user_edit_count != 'undefined') %>%
group_by(user_edit_count) %>%
summarise(mobile_web_edits = sum(mobile_web_edits, na.rm = TRUE),
amc_edits = sum(amc_edits, na.rm = TRUE)) %>%
mutate(prop = amc_edits/mobile_web_edits *100)
amc_edits_prop_overall_byeditcount_Sept
We are also seeing the highest proportion of AMC mobile web edits completed by high volume editors and the lowest proportion by low volume editors. From August 7, 2019 (deployment date on all wikis) to the end of September 2019, 39.43% of all logged-in mobile web edits by high volume editors (500+) were made while in AMC.
##Plot of mobile web edits for enwiki.
p <- mobile_web_edit_counts_clean %>%
filter(edit_type == 'mobile_web_edits',
wiki == 'enwiki')%>%
mutate(date = floor_date(date, "month")) %>%
group_by(date, wiki)%>%
summarise(monthly_edits = sum(edit_count)) %>%
ggplot(aes(x=date, y = monthly_edits)) +
geom_line(size = 1.5, color = "blue") +
geom_vline(xintercept = as.numeric(as.Date(c("2019-08-07"))),
linetype = "dashed", color = "black") +
geom_text(aes(x=as.Date('2019-08-08'), y=4.6E5, label="AMC deployed on all Wikipedias"), size=3.7, vjust = -1.2, angle = 90, color = "black") +
scale_y_continuous("Number of edits per month", labels = polloi::compress) +
scale_x_date("Date", labels = date_format("%b %Y"), date_breaks = "2 months", limits = c()) +
labs(title = "Mobile web edits on English Wikipedia") +
ggthemes::theme_tufte(base_size = 10, base_family = "Gill Sans") +
theme(axis.text.x=element_text(angle = 45, hjust = 1),
plot.title = element_text(hjust = 0.5),
panel.grid = element_line("gray70"),
legend.position= "bottom",
legend.text=element_text(size = 12))
p
ggsave("Figures/mobile_web_edits_enwiki.png", p, width = 18, height = 9, units = "in", dpi = 150)
Since deployment through September 2019
#Overall Proportion of mobile web edits tagged with AMC on English Wikipedia since deployment
amc_edits_prop_enwiki <- mobile_web_edit_counts %>%
filter(date >= "2019-08-07", #deployment date across all wikis
wiki == 'enwiki',
user_edit_count != 'undefined') %>%
summarise(mobile_web_edits = sum(mobile_web_edits),
amc_edits = sum(amc_edits)) %>%
mutate(prop = amc_edits/mobile_web_edits *100)
amc_edits_prop_enwiki
Per month since deployment
amc_edits_prop_enwiki_permonth <- mobile_web_edit_counts %>%
filter(date >= "2019-08-01", #start of the month in which AMC was deployed across all wikis
wiki == 'enwiki',
user_edit_count != 'undefined') %>%
mutate(date = floor_date(date, "month")) %>%
group_by(date) %>%
summarise(mobile_web_edits = sum(mobile_web_edits, na.rm = TRUE),
amc_edits = sum(amc_edits, na.rm = TRUE)) %>%
mutate(prop = amc_edits/mobile_web_edits *100)
amc_edits_prop_enwiki_permonth
p <- mobile_web_edit_counts_clean %>%
filter(date >= '2019-08-07',
date <= '2019-09-28', #remove last week due to incomplete data
edit_type != 'mobile_web_edits',
wiki == 'enwiki',
user_edit_count != 'undefined') %>%
mutate(date = floor_date(date, "week")) %>%
filter(date != '2019-08-04') %>% #remove incomplete week
group_by(date, edit_type) %>%
summarise(total_mobile_edits = sum(edit_count, na.rm = TRUE)) %>%
ggplot(aes(x= date, y= total_mobile_edits, fill = edit_type)) +
geom_col() +
geom_vline(xintercept = as.Date('2019-08-07'),
linetype = "dashed", color = "black") +
geom_text(aes(x=as.Date('2019-08-07'), y=6E3, label="AMC deployed on all Wikipedias"), size=4, vjust = -1.2, angle = 90, color = "black") +
scale_y_continuous("Number of edits per week", labels = polloi::compress) +
scale_x_date("Date", labels = date_format("%b %d %Y"), date_breaks = "2 weeks") +
labs(title = "Mobile web edits on English Wikipedia by edit mode") +
ggthemes::theme_tufte(base_size = 14, base_family = "Gill Sans") +
theme(axis.text.x=element_text(angle = 45, hjust = 1),
panel.grid = element_line("gray70"),
legend.position="bottom",
legend.title=element_blank(),
legend.text=element_text(size=14))
p
ggsave("Figures/mobile_web_edits_enwiki_amc_prop.png", p, width = 18, height = 9, units = "in", dpi = 150)
AMC edits account for about 16.2% of all mobile web edits made on English Wikipedia by logged-in users since deployment on August 7, 2019. The proportion of mobile web edits made while in AMC mode increased from about 4.7% in the first month of deployment to 25.6% of mobile web edits from logged-in users in September.
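The weekly charts in this report bin edits by the Sunday that starts each week and drop the partial week containing the deployment date. The base R sketch below illustrates why that first bin is incomplete. It assumes Sunday-start weeks; the helper `week_start` and the flat count of 10 edits per day are hypothetical.

```r
# Hypothetical helper: the Sunday on or before each date ("%w" is 0 for Sunday).
week_start <- function(date) date - as.integer(format(date, "%w"))

deployment <- as.Date("2019-08-07") # a Wednesday
days <- seq(deployment, as.Date("2019-08-24"), by = "day")

# Hypothetical data: a flat 10 edits per day after deployment.
daily <- data.frame(date = days, edits = 10)
daily$week <- format(week_start(daily$date)) # ISO date strings sort chronologically

weekly <- aggregate(edits ~ week, data = daily, FUN = sum)
print(weekly)

# The week starting 2019-08-04 holds only 4 post-deployment days, so it is dropped.
complete <- weekly[weekly$week != format(week_start(deployment)), ]
print(complete)
```

Without that filter, the first bar would look like a dip in activity when it only reflects fewer days of data.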
# Plot of overall mobile web edit rate by user edit count
p <- mobile_web_edit_counts_clean %>%
filter(edit_type == 'mobile_web_edits',
wiki == 'enwiki',
user_edit_count != 'undefined') %>% ### remove undefined user edit counts
mutate(date = floor_date(date, "month")) %>%
group_by(date, user_edit_count) %>%
summarise(total_mobile_edits = sum(edit_count, na.rm = TRUE))%>%
ggplot(aes(x=date, y = total_mobile_edits, color = user_edit_count, linetype = user_edit_count)) +
geom_line(size = 1.5)+
scale_y_continuous("Number of edits per month", labels = polloi::compress) +
scale_x_date("Date", labels = date_format("%b %Y"), date_breaks = "2 months", limits = c()) +
labs(title = "Mobile web edits by user edit count on English Wikipedia") +
geom_vline(xintercept = as.numeric(as.Date(c("2019-08-08"))),
linetype = "dashed", color = "black") +
geom_text(aes(x=as.Date('2019-08-08'), y=7E4, label="AMC deployed on all Wikipedias"), size=3.7, vjust = -1.2, angle = 90, color = "black") +
ggthemes::theme_tufte(base_size = 12, base_family = "Gill Sans") +
theme(axis.text.x=element_text(angle = 45, hjust = 1),
plot.title = element_text(hjust = 0.5),
panel.grid = element_line("gray70"),
legend.position="bottom")
ggsave("Figures/mobile_web_edits_enwiki_byeditcount.png", p , width = 18, height = 9, units = "in", dpi = 150)
p
Since deployment through September 2019
amc_edits_prop_enwiki_byeditcount <- mobile_web_edit_counts %>%
filter(date >= "2019-08-07", #deployment date across all wikis
wiki == 'enwiki',
user_edit_count != 'undefined') %>%
group_by(user_edit_count) %>%
summarise(mobile_web_edits = sum(mobile_web_edits),
amc_edits = sum(amc_edits)) %>%
mutate(prop = amc_edits/mobile_web_edits *100)
amc_edits_prop_enwiki_byeditcount
Similar to the trends seen across all Wikipedia projects, the highest proportion of AMC edits are made by high volume editors on English Wikipedia. From August 7, 2019 (deployment date on all wikis) to the end of September 2019, 24.9% of all logged-in mobile web edits by high volume editors (500+) were made while in AMC.
##Plot of mobile web edits by target wiki.
p <- mobile_web_edit_counts_clean %>%
filter(edit_type == 'mobile_web_edits',
wiki %in% c('eswiki', 'arwiki', 'idwiki', 'itwiki', 'jawiki', 'fawiki', 'thwiki'))%>%
mutate(date = floor_date(date, "month")) %>%
group_by(date, wiki)%>%
summarise(monthly_edits = sum(edit_count)) %>%
ggplot(aes(x=date, y = monthly_edits, color = wiki)) +
geom_line(size = 1.5) +
geom_vline(xintercept = vertical_lines,
linetype = "dashed", color = "black") +
geom_text(aes(x=as.Date('2019-03-20'), y=6E4, label="AMC deployed on Arabic, Indonesian, and Spanish Wikipedias"), size=3.7, vjust = -1.2, angle = 90, color = "black") +
geom_text(aes(x=as.Date('2019-06-17'), y=6E4, label="AMC deployed on Italian, Japanese, Persian, and Thai Wikipedias"), size=3.7, vjust = -1.2, angle = 90, color = "black") +
geom_text(aes(x=as.Date('2019-08-07'), y=6E4, label="AMC deployed on all Wikipedias"), size=3.7, vjust = -1.2, angle = 90, color = "black") +
scale_y_continuous("Number of edits per month", labels = polloi::compress) +
scale_x_date("Date", labels = date_format("%b %Y"), date_breaks = "2 months", limits = c()) +
labs(title = "Monthly mobile web edits on target Wikipedia projects ") +
ggthemes::theme_tufte(base_size = 10, base_family = "Gill Sans") +
theme(axis.text.x=element_text(angle = 45, hjust = 1),
plot.title = element_text(hjust = 0.5),
panel.grid = element_line("gray70"),
legend.position= "bottom",
legend.text=element_text(size = 12))
p
ggsave("Figures/mobile_web_edits_bytargetwiki.png", p, width = 18, height = 9, units = "in", dpi = 150)
##Calculate YOY change for target wikis
#Arwiki
mobile_web_edit_arwiki <- mobile_web_edit_counts_clean %>%
filter(wiki == 'arwiki',
edit_type == 'mobile_web_edits') %>%
mutate(date = floor_date(date, "month")) %>%
group_by(date) %>%
summarise(total_mobile_edits = sum(edit_count)) %>%
arrange(date) %>%
mutate(yearOverYear= (total_mobile_edits/lag(total_mobile_edits,12) -1)*100)
#EsWiki
mobile_web_edit_eswiki <- mobile_web_edit_counts_clean %>%
filter(wiki == 'eswiki',
edit_type == 'mobile_web_edits') %>%
mutate(date = floor_date(date, "month")) %>%
group_by(date) %>%
summarise(total_mobile_edits = sum(edit_count)) %>%
arrange(date) %>%
mutate(yearOverYear= (total_mobile_edits/lag(total_mobile_edits,12) -1)*100)
#idwiki
mobile_web_edit_idwiki <- mobile_web_edit_counts_clean %>%
filter(wiki == 'idwiki',
edit_type == 'mobile_web_edits') %>%
mutate(date = floor_date(date, "month")) %>%
group_by(date) %>%
summarise(total_mobile_edits = sum(edit_count)) %>%
arrange(date) %>%
mutate(yearOverYear= (total_mobile_edits/lag(total_mobile_edits,12) -1)*100)
#itwiki
mobile_web_edit_itwiki <- mobile_web_edit_counts_clean %>%
filter(wiki == 'itwiki',
edit_type == 'mobile_web_edits') %>%
mutate(date = floor_date(date, "month")) %>%
group_by(date) %>%
summarise(total_mobile_edits = sum(edit_count)) %>%
arrange(date) %>%
mutate(yearOverYear= (total_mobile_edits/lag(total_mobile_edits,12) -1)*100)
#jawiki
mobile_web_edit_jawiki <- mobile_web_edit_counts_clean %>%
filter(wiki == 'jawiki',
edit_type == 'mobile_web_edits') %>%
mutate(date = floor_date(date, "month")) %>%
group_by(date) %>%
summarise(total_mobile_edits = sum(edit_count)) %>%
arrange(date) %>%
mutate(yearOverYear= (total_mobile_edits/lag(total_mobile_edits,12) -1)*100)
#fawiki
mobile_web_edit_fawiki <- mobile_web_edit_counts_clean %>%
filter(wiki == 'fawiki',
edit_type == 'mobile_web_edits') %>%
mutate(date = floor_date(date, "month")) %>%
group_by(date) %>%
summarise(total_mobile_edits = sum(edit_count)) %>%
arrange(date) %>%
mutate(yearOverYear= (total_mobile_edits/lag(total_mobile_edits,12) -1)*100)
#thwiki
mobile_web_edit_thwiki <- mobile_web_edit_counts_clean %>%
filter(wiki == 'thwiki',
edit_type == 'mobile_web_edits') %>%
mutate(date = floor_date(date, "month")) %>%
group_by(date) %>%
summarise(total_mobile_edits = sum(edit_count)) %>%
arrange(date) %>%
mutate(yearOverYear= (total_mobile_edits/lag(total_mobile_edits,12) -1)*100)
# Create YoY Table
wiki_list <- c('arwiki', 'eswiki', 'idwiki', 'itwiki', 'jawiki', 'fawiki', 'thwiki')
# Row 16 is September 2019, assuming every month from June 2018 onward is present for each wiki
mobile_web_edit_yoy <- rbind(mobile_web_edit_arwiki[16,], mobile_web_edit_eswiki[16,],
mobile_web_edit_idwiki[16,], mobile_web_edit_itwiki[16,],
mobile_web_edit_jawiki[16,], mobile_web_edit_fawiki[16,], mobile_web_edit_thwiki[16,])
mobile_web_edit_yoy$wiki <- wiki_list # label each row with its wiki
mobile_web_edit_yoy
Similar to overall trends, there was a steady increase in total mobile web edits over the past 15 months for all of the target Wikipedia projects.
The table below shows a comparison of year over year rates seen in June 2019 and September 2019 for all of the target wikis. In September 2019, there was a year over year increase for all target wikis, ranging from 1.3% on Arabic Wikipedia to 70.5% on Indonesian Wikipedia.
As noted, these increases began before the deployment of AMC on target wikis and cannot be attributed to the deployment of AMC alone. These changes are partly due to a sustained increase in overall active editors seen on these wikis.
Year over year changes in mobile web edit rates on target wikis
Wiki | June 2019 | September 2019 |
---|---|---|
Arabic Wiki | -7.4% | 1.3% |
Spanish Wiki | 16.7% | 1.4% |
Indonesian Wiki | 22.1% | 70.5% |
Italian Wiki | 27.1% | 33.6% |
Japanese Wiki | 37.9% | 48.4% |
Persian Wiki | 57.9% | 55.4% |
Thai Wiki | 23.3% | 29.7% |
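A table like the one above can also be produced by looking up each wiki's value for a given month and for the same month one year earlier, rather than relying on row positions. This is a sketch of one possible approach in base R, not the method used for this report; the function `yoy_for_month`, the wiki names, and all numbers are hypothetical.

```r
# Hypothetical monthly totals for two wikis over 13 months (not real data).
months <- seq(as.Date("2018-09-01"), by = "month", length.out = 13)
monthly <- rbind(
  data.frame(wiki = "wiki_a", date = months, total_mobile_edits = seq(100, 220, by = 10)),
  data.frame(wiki = "wiki_b", date = months, total_mobile_edits = seq(400, 340, by = -5))
)

# Hypothetical helper: year over year change per wiki for one month,
# matched on calendar date instead of row index.
yoy_for_month <- function(df, month) {
  previous <- seq(month, by = "-1 year", length.out = 2)[2]
  current_rows <- df[df$date == month, c("wiki", "total_mobile_edits")]
  previous_rows <- df[df$date == previous, c("wiki", "total_mobile_edits")]
  names(previous_rows)[2] <- "previous_year_edits"
  merged <- merge(current_rows, previous_rows, by = "wiki")
  merged$year_over_year <-
    (merged$total_mobile_edits / merged$previous_year_edits - 1) * 100
  merged
}

yoy_table <- yoy_for_month(monthly, as.Date("2019-09-01"))
print(yoy_table)
```

Matching on dates means a wiki with a missing month drops out of the result instead of silently reporting a different month's value.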
Arabic, Spanish and Indonesian Target Wikis (Since March deployment through end of September 2019)
amc_edits_prop_target <- mobile_web_edit_counts %>%
filter(date >= "2019-03-20", #deployment date
wiki %in% c('arwiki', 'idwiki', 'eswiki'),
user_edit_count != 'undefined') %>%
group_by(wiki) %>%
summarise(mobile_web_edits = sum(mobile_web_edits),
amc_edits = sum(amc_edits)) %>%
mutate(prop = amc_edits/mobile_web_edits *100)
amc_edits_prop_target
p <- mobile_web_edit_counts_clean %>%
filter(date >= '2019-03-30',
date <= '2019-09-28', #remove last week due to incomplete data
edit_type != 'mobile_web_edits',
wiki %in% c('arwiki', 'idwiki', 'eswiki'),
user_edit_count != 'undefined') %>%
mutate(date = floor_date(date, "week")) %>%
group_by(date, wiki, edit_type) %>%
summarise(total_mobile_edits = sum(edit_count, na.rm = TRUE)) %>%
ggplot(aes(x= date, y= total_mobile_edits, fill = edit_type)) +
geom_col() +
facet_wrap(~ wiki, nrow = 3, scale = "free_y") +
scale_y_continuous("Number of edits per week", labels = polloi::compress) +
scale_x_date("Date", labels = date_format("%b %d %Y"), date_breaks = "2 weeks") +
labs(title = "Mobile web edits on target wikipedias",
subtitle = "Where AMC was deployed on March 20, 2019") +
ggthemes::theme_tufte(base_size = 14, base_family = "Gill Sans") +
theme(axis.text.x=element_text(angle = 45, hjust = 1),
panel.grid = element_line("gray70"),
legend.position="bottom",
legend.title=element_blank(),
legend.text=element_text(size=14))
p
ggsave("Figures/mobile_web_edits_March-target_amc_prop.png", p, width = 18, height = 9, units = "in", dpi = 150)
Italian, Japanese, Persian and Thai Target Wikis (Since June deployment through end of September 2019)
amc_edits_prop_target_July <- mobile_web_edit_counts %>%
filter(date >= "2019-06-17", #deployment date
wiki %in% c('itwiki', 'jawiki', 'fawiki', 'thwiki' ),
user_edit_count != 'undefined') %>%
group_by(wiki) %>%
summarise(mobile_web_edits = sum(mobile_web_edits),
amc_edits = sum(amc_edits)) %>%
mutate(prop = amc_edits/mobile_web_edits *100)
amc_edits_prop_target_July
p <- mobile_web_edit_counts_clean %>%
filter(date >= '2019-06-17',
date <= '2019-09-28', #remove last week due to incomplete data
edit_type != 'mobile_web_edits',
wiki %in% c('itwiki', 'jawiki', 'fawiki', 'thwiki'),
user_edit_count != 'undefined') %>%
mutate(date = floor_date(date, "week")) %>%
group_by(date, wiki, edit_type) %>%
summarise(total_mobile_edits = sum(edit_count, na.rm = TRUE)) %>%
ggplot(aes(x= date, y= total_mobile_edits, fill = edit_type)) +
geom_col() +
facet_wrap(~ wiki, nrow = 4, scale = "free_y") +
scale_y_continuous("Number of edits per week", labels = polloi::compress) +
scale_x_date("Date", labels = date_format("%b %d %Y"), date_breaks = "2 weeks") +
labs(title = "Mobile web edits on target wikipedias",
subtitle = "Where AMC was deployed on June 17, 2019") +
ggthemes::theme_tufte(base_size = 14, base_family = "Gill Sans") +
theme(axis.text.x=element_text(angle = 45, hjust = 1),
panel.grid = element_line("gray70"),
legend.position="bottom",
legend.title=element_blank(),
legend.text=element_text(size=14))
p
ggsave("Figures/mobile_web_edits_June_target_amc_prop.png", p, width = 18, height = 9, units = "in", dpi = 150)
On target wikis, the proportion of all mobile web edits made by logged-in users while in AMC mode has also increased since deployment, but it does not account for as high a proportion of edits as seen on English Wikipedia.
Arabic (13.7%) and Persian (15.7%) Wikipedia have had the highest proportions of AMC-tagged mobile web edits since deployment, while the proportion of AMC-tagged mobile web edits on the other target wikis ranges from 8.8% to 10%.
We calculated the retention rate by reviewing the number of users who opted in to AMC mode in their preferences and stayed opted in during the retention period (from the deployment dates through October 30, 2019).
This was measured using the opt-in/opt-out button implemented in T211197 and recorded in the PrefUpdate table.
##Collect retention rates on target wikis with breakdown by user edit counts
# In terminal
# spark2R --master yarn --executor-memory 2G --executor-cores 1 --driver-memory 4G
query <-
"with amc_optins as (
SELECT CONCAT(year,'-',LPAD(month,2,'0'),'-',LPAD(day,2,'0')) AS date,
wiki,
event.isdefault as amc_selection,
event.userid as userid
FROM event_sanitized.prefupdate
WHERE event.property = 'mf_amc_optin'
AND year = 2019 AND ((month = 3 and day >= 20) OR (month >= 4 and month <= 10))
),
edits as (
SELECT
event_user_id as userid,
wiki_db,
ARRAY_CONTAINS(event_user_groups_historical, 'bot') AS user_is_bot,
CASE
WHEN max(event_user_revision_count) is NULL THEN 'undefined'
WHEN max(event_user_revision_count) < 100 THEN 'under 100'
WHEN max(event_user_revision_count) >=100 AND max(event_user_revision_count) < 500 THEN '100-499'
ELSE '500+'
END AS user_edit_count
FROM wmf.mediawiki_history
WHERE snapshot = '2019-10'
Group by event_user_id, wiki_db, ARRAY_CONTAINS(event_user_groups_historical, 'bot')
)
SELECT date, wiki, amc_selection, user_edit_count, user_is_bot, COUNT(*) as n_opt
FROM amc_optins
LEFT JOIN edits
ON amc_optins.userid = edits.userid and
amc_optins.wiki = edits.wiki_db
GROUP BY date, wiki, amc_selection, user_edit_count, user_is_bot"
results <- collect(sql(query))
save(results, file="Readers-Web-AMC-metrics/Data/amc_retention_rates.RData")
load("Data/amc_retention_rates.RData")
amc_retention_rates <- results
amc_retention_rates$date <- as.Date(amc_retention_rates$date, format = "%Y-%m-%d")
#Convert amc_selection to a factor and clarify the TRUE and FALSE labels.
amc_retention_rates$amc_selection %<>% factor(c(TRUE, FALSE), c("amc_opt_out", "amc_opt_in"))
# Overall retention rate across all Wikipedias.
amc_retention_overall_percent <- amc_retention_rates %>%
filter(user_edit_count != 'undefined',
user_is_bot != 'TRUE', #remove any bots and unregistered users
date <= '2019-10-30') %>%
spread(amc_selection, n_opt) %>%
summarise(amc_opt_out = sum(amc_opt_out, na.rm = TRUE),
amc_opt_in = sum(amc_opt_in, na.rm = TRUE),
prop_opt_in_percent =(amc_opt_in)/(amc_opt_out+amc_opt_in)*100)
head(amc_retention_overall_percent)
There has been an 87.7% overall retention rate of the opt-in AMC mode across all Wiki projects, surpassing the target of 60%.
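The retention figure above is the share of opt-in selections among all opt-in and opt-out selections. A minimal base R sketch of that ratio follows. The counts are made up, and the column names simply mirror those used in this report's data.

```r
# Hypothetical counts of AMC preference selections (not real data).
selections <- data.frame(
  amc_selection = c("amc_opt_in", "amc_opt_out", "amc_opt_in", "amc_opt_out"),
  user_edit_count = c("under 100", "under 100", "500+", "500+"),
  n_opt = c(900, 100, 80, 20)
)

# Total selections of each kind across all edit count groups.
totals <- tapply(selections$n_opt, selections$amc_selection, sum)

# Retention: opt-ins as a percentage of all recorded selections.
retention <- totals[["amc_opt_in"]] / sum(totals) * 100
print(round(retention, 1))
```

Note that this treats each recorded selection as one unit; a user who toggles the setting several times would contribute several rows unless the data is first reduced to each user's latest selection.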
p <- amc_retention_rates %>%
filter(user_edit_count != 'undefined',
user_is_bot != 'TRUE', #remove any bots and unregistered users
date <= '2019-10-30') %>%
mutate(date = floor_date(date, "week")) %>%
group_by(date, amc_selection) %>%
summarise(n_opt = sum(n_opt)) %>%
ggplot(aes(x=date, y = n_opt, color = amc_selection)) +
geom_line(size = 1)+
scale_y_continuous("AMC selection count per week", labels = polloi::compress) +
scale_x_date("Date", labels = date_format("%b %Y"), date_breaks = "1 month") +
labs(title = "Retention rate of opt-in AMC on all Wikimedia Projects") +
geom_vline(xintercept = c(vertical_lines, as.Date('2019-07-15'),as.Date('2019-10-10')),
linetype = "dashed", color = "black") +
geom_text(aes(x=as.Date('2019-03-20'), y=2E3, label="AMC deployed on Arabic, Indonesian, and Spanish Wikipedias"), size=3.6, vjust = -1.2, angle = 90, color = "black") +
geom_text(aes(x=as.Date('2019-06-17'), y=2E3, label="AMC deployed on Italian, Japanese, Persian, and Thai Wikipedias"), size=3.6, vjust = -1.2, angle = 90, color = "black") +
geom_text(aes(x=as.Date('2019-07-15'), y=2E3, label="AMC central notice banner deployed"), size=3.6, vjust = -1.2, angle = 90, color = "black") +
geom_text(aes(x=as.Date('2019-08-07'), y=2E3, label="AMC deployed on all Wikipedias"), size=3.6, vjust = -1.2, angle = 90, color = "black") +
geom_text(aes(x=as.Date('2019-10-10'), y=2E3, label="AMC deployed on all Wikimedia Projects"), size=3.6, vjust = -1.2, angle = 90, color = "black") +
ggthemes::theme_tufte(base_size = 12, base_family = "Gill Sans") +
theme(axis.text.x=element_text(angle = 45, hjust = 1),
plot.title = element_text(hjust = 0.5),
panel.grid = element_line("gray70"),
legend.position="bottom")
ggsave("Figures/amc_retention_overall.png", p, width = 18, height = 9, units = "in", dpi = 150)
p
The time series chart above shows how the number of AMC opt-ins and opt-outs changed following each deployment and campaign. There were notable increases in the number of AMC opt-ins following the central notice banner deployed on July 15, 2019, the deployment of AMC on all Wikipedias on August 7, 2019, and the deployment across all Wikimedia projects on October 10, 2019.
There is also a sudden spike between August 28 and August 31, 2019. Further investigation is needed to determine whether this is due to a change in user behavior or a data artifact.
# Overall retention rate for 100+ and 500+ editors
amc_retention_prop_overall_byeditor <- amc_retention_rates %>%
filter(user_edit_count != 'undefined',
user_is_bot != 'TRUE', #remove any bots and unregistered users
date <= '2019-10-30') %>%
group_by(user_edit_count, amc_selection) %>%
summarise(n_opt = sum(n_opt)) %>%
spread(amc_selection, n_opt) %>%
group_by(user_edit_count) %>%
mutate(prop_opt_in_perct = amc_opt_in/(amc_opt_out+amc_opt_in)*100)
amc_retention_prop_overall_byeditor
##Overall retention rate proportion editor type
p <- amc_retention_rates %>%
filter(
user_edit_count != 'undefined',
user_is_bot != 'TRUE', #remove any bots and unregistered users
date <= '2019-10-30') %>%
group_by(user_edit_count, amc_selection) %>%
summarise(n_opt = sum(n_opt))%>%
ungroup()%>%
ggplot(aes(x=factor(1), y= n_opt, fill = amc_selection)) +
geom_col(position="fill") +
scale_y_continuous(labels = scales::percent_format()) +
facet_wrap(~ user_edit_count, scale = "free_y") +
labs(title = "Retention rate of opt-in AMC by user edit count",
fill = "AMC Selection",
x = NULL,
y = "Proportion of AMC option selection") +
ggthemes::theme_tufte(base_size = 12, base_family = "Gill Sans") +
theme(axis.text.x=element_blank(),
plot.title = element_text(hjust = 0.5),
panel.grid = element_line("gray70"),
legend.position="bottom")
ggsave("Figures/amc_retention_overall_byeditcount.png", p, width = 18, height = 9, units = "in", dpi = 150)
p
The retention rate of AMC was higher for medium- to high-volume editors than for low-volume editors. There was a 93.6% overall retention rate amongst users with 100+ edits and an 86.4% rate amongst users with 500+ edits.
amc_retention_overall_percent_enwiki <- amc_retention_rates %>%
filter( wiki == 'enwiki',
user_edit_count != 'undefined',
user_is_bot != 'TRUE', #remove any bots and unregistered users
date <= '2019-10-30') %>%
spread(amc_selection, n_opt) %>%
summarise(amc_opt_out = sum(amc_opt_out, na.rm = TRUE),
amc_opt_in = sum(amc_opt_in, na.rm = TRUE),
prop_opt_in_percent =(amc_opt_in)/(amc_opt_out+amc_opt_in)*100)
amc_retention_overall_percent_enwiki
# Enwiki retention rate by user experience
amc_retention_prop_enwiki_byeditor <- amc_retention_rates %>%
filter(wiki == 'enwiki',
user_edit_count != 'undefined',
user_is_bot != 'TRUE', #remove any bots and unregistered users
date <= '2019-10-30') %>%
group_by(user_edit_count, amc_selection) %>%
summarise(n_opt = sum(n_opt)) %>%
spread(amc_selection, n_opt) %>%
group_by(user_edit_count) %>%
mutate(prop_opt_in_perct = amc_opt_in/(amc_opt_out+amc_opt_in)*100)
amc_retention_prop_enwiki_byeditor
On English Wikipedia, 89% of all registered users who opted in to AMC mode between its deployment on August 7, 2019 and October 30, 2019 stayed opted in.
Similar to trends seen across all Wikimedia projects, the highest retention rate (94.9%) is seen for users with cumulative edit counts between 100 and 499. Both high-volume (500+ cumulative edits) and low-volume (under 100 cumulative edits) user groups had similar retention rates.
# Overall retention rate
amc_retention_targetwiki_percent <- amc_retention_rates %>%
filter(wiki %in% c('eswiki', 'arwiki', 'idwiki', 'itwiki', 'jawiki', 'fawiki', 'thwiki'),
user_edit_count != 'undefined',
user_is_bot != 'TRUE', #remove any bots and unregistered users
date <= '2019-10-30') %>%
spread(amc_selection, n_opt)%>%
group_by(wiki) %>%
summarise(amc_opt_out = sum(amc_opt_out, na.rm = TRUE),
amc_opt_in = sum(amc_opt_in, na.rm = TRUE),
prop_opt_in_percent =(amc_opt_in)/(amc_opt_out+amc_opt_in)*100)
amc_retention_targetwiki_percent
##Plot overall proportion of AMC retention rates on each target wiki
p <- amc_retention_rates %>%
filter(wiki %in% c('eswiki', 'arwiki', 'idwiki', 'itwiki', 'jawiki', 'fawiki', 'thwiki'),
user_edit_count != 'undefined',
user_is_bot != 'TRUE', #remove any bots and unregistered users
date <= '2019-10-30') %>%
group_by(wiki, amc_selection) %>%
summarise(n_opt = sum(n_opt)) %>%
ggplot(aes(x=factor(1), y= n_opt, fill = amc_selection)) +
geom_col(position="fill") +
scale_y_continuous(labels = scales::percent_format()) +
facet_wrap(~wiki, scale = "free_y") +
labs(title = "Retention rate of opt-in AMC on all target Wikipedias",
fill = "AMC Selection",
x= NULL,
y = "Proportion of AMC option selection") +
ggthemes::theme_tufte(base_size = 12, base_family = "Gill Sans") +
theme(axis.text.x=element_blank(),
plot.title = element_text(hjust = 0.5),
panel.grid = element_line("gray70"),
legend.position="bottom")
ggsave("Figures/amc_retention_target_wikis.png", p, width = 18, height = 9, units = "in", dpi = 150)
p
On target wikis, opt-in retention rates were slightly below the retention rate for English Wikipedia but were still above the target of 60%. Retention rates ranged from 78.8% on Thai Wikipedia to 84.4% on Spanish Wikipedia.
# Retention rates for Target wikis where AMC was deployed in March
p <- amc_retention_rates %>%
filter(wiki %in% c('eswiki', 'arwiki', 'idwiki'),
user_edit_count != 'undefined',
user_is_bot != 'TRUE', #remove any bots and unregistered users
date >= '2019-03-20',
date <= '2019-10-30') %>%
mutate(date = floor_date(date, "week")) %>%
group_by(date, wiki, amc_selection) %>%
summarise(n_opt = sum(n_opt)) %>%
ggplot(aes(x=date, y = n_opt, color = amc_selection)) +
geom_line(size = 1)+
facet_wrap(~ wiki, nrow = 4, scale = "free_y") +
scale_y_continuous("AMC selection count per week", labels = polloi::compress) +
scale_x_date(labels = date_format("%b %Y"), date_breaks = "1 month") +
labs(title = "Retention rate of opt-in AMC on target Wikipedias",
subtitle = "Where AMC was deployed on March 20, 2019") +
ggthemes::theme_tufte(base_size = 12, base_family = "Gill Sans") +
theme(axis.text.x=element_text(angle = 45, hjust = 1),
axis.title.x=element_blank(),
plot.title = element_text(hjust = 0.5),
panel.grid = element_line("gray70"),
legend.position="bottom")
ggsave("Figures/amc_retention_targetwiki_March_weekly.png", p, width = 18, height = 9, units = "in", dpi = 150)
p
# Retention rates for Target wikis where AMC was deployed in June
p <- amc_retention_rates %>%
filter(wiki %in% c('itwiki', 'jawiki', 'fawiki', 'thwiki'),
user_edit_count != 'undefined',
user_is_bot != 'TRUE', #remove any bots and unregistered users
date >= '2019-06-17',
date <= '2019-10-30') %>%
mutate(date = floor_date(date, "week")) %>%
group_by(date, wiki, amc_selection) %>%
summarise(n_opt = sum(n_opt)) %>%
ggplot(aes(x=date, y = n_opt, color = amc_selection)) +
geom_line(size = 1)+
facet_wrap(~ wiki, nrow = 4, scale = "free_y") +
scale_y_continuous("AMC selection count per week", labels = polloi::compress) +
scale_x_date(labels = date_format("%b %Y"), date_breaks = "1 month") +
labs(title = "Retention rate of opt-in AMC on target Wikipedias",
subtitle = "Where AMC was deployed on June 17, 2019") +
ggthemes::theme_tufte(base_size = 12, base_family = "Gill Sans") +
theme(axis.text.x=element_text(angle = 45, hjust = 1),
axis.title.x=element_blank(),
plot.title = element_text(hjust = 0.5),
panel.grid = element_line("gray70"),
legend.position="bottom")
ggsave("Figures/amc_retention_targetwiki_June_weekly.png", p, width = 18, height = 9, units = "in", dpi = 150)
p
On the target wikis, retention rates remained fairly consistent from the deployment of AMC through the end of October 2019, with weekly retention rates mostly staying above 60%.
Thai Wikipedia's AMC retention rate dipped to around 58% during the weeks of 2019-07-21 and 2019-08-04 but increased to 78.8% by the end of October 2019.
We calculated changes in the overall rate of moderation actions on mobile web. Moderation actions are defined in T213461.
Since moderation actions recorded in the logging table were not tagged with both the mobile web edit and AMC tags until March 18, 2019 (https://phabricator.wikimedia.org/T215477#5008003), we could not calculate a year-over-year increase for these actions. Instead, we reviewed changes from March 2019 through September 2019. As a result, seasonal fluctuations may inflate some of the identified changes.
Data was retrieved from the revision_tags field available in mediawiki_history and from the logging table. All actions completed on mobile web were tagged as mobile web edit and, if applicable, advanced mobile edit.
# Overall rate for logging table actions on mobile web (not limited to AMC tags)
# March 2019-September 2019 (log_timestamp before 2019-10-01)
# In terminal
# spark2R --master yarn --executor-memory 2G --executor-cores 1 --driver-memory 4G
query <- "SELECT
SUBSTR(log_timestamp, 0, 8) AS date,
logging.wiki_db as wiki,
SUM(If(logging.log_type = 'block' and logging.log_action = 'block', 1, 0)) as block,
SUM(If(logging.log_type = 'block' and logging.log_action = 'unblock', 1, 0)) as unblock,
SUM(If(logging.log_type = 'delete' and logging.log_action = 'delete', 1, 0)) as delete,
SUM(If(logging.log_type = 'protect' and logging.log_action = 'protect', 1, 0)) as protect,
SUM(If(logging.log_type = 'move' and logging.log_action = 'move', 1, 0)) as move,
SUM(If(logging.log_type = 'thanks' and logging.log_action = 'thank', 1, 0)) as thank,
SUM(If(logging.log_type = 'review' and logging.log_action = 'approve', 1, 0)) as approve
FROM wmf_raw.mediawiki_logging as logging
INNER JOIN (
SELECT change_tag.ct_rev_id as rev_id,
change_tag_def.ctd_name as tag_name,
change_tag.ct_log_id as log_id
FROM wmf_raw.mediawiki_change_tag as change_tag
INNER JOIN wmf_raw.mediawiki_change_tag_def as change_tag_def ON
change_tag.ct_tag_id = change_tag_def.ctd_id
WHERE change_tag.snapshot = '2019-10' and
change_tag_def.snapshot = '2019-10'
) as ct
ON logging.log_id = ct.log_id
WHERE
logging.snapshot = '2019-10' and
logging.log_timestamp >= '20190301' and
logging.log_timestamp < '20191001' and
(ct.tag_name like '%mobile web edit%' or ct.tag_name like '%advanced mobile edit%')
GROUP BY SUBSTR(log_timestamp, 0, 8), logging.wiki_db"
results <- collect(sql(query))
save(results, file="Readers-Web-AMC-metrics/Data/moderation_counts_log.RData")
load("Data/moderation_counts_log.RData")
moderation_counts_log <- results
moderation_counts_log$date <- as.Date(moderation_counts_log$date, format = "%Y%m%d")
# Query to collect rollback and undo counts from revision tags.
# Done using change tags now available in mediawiki_history
# In terminal
# spark2R --master yarn --executor-memory 2G --executor-cores 1 --driver-memory 4G
query <- "select
date_format(event_timestamp, 'yyyy-MM-dd') as date,
wiki_db as wiki,
sum(cast(mw_rollback as int)) as rollback,
sum(cast(mw_undo as int)) as undo
from (
select
wiki_db,
event_timestamp,
array_contains(revision_tags, 'mw-rollback') as mw_rollback,
array_contains(revision_tags, 'mw-undo') as mw_undo
from wmf.mediawiki_history
where
event_timestamp IS NOT NULL and
(array_contains(revision_tags, 'mobile web edit') or
array_contains(revision_tags, 'advanced mobile edit')) and
event_timestamp >= '2019-03-01' and event_timestamp < '2019-10-01' and
snapshot = '2019-10'
) edits
group by wiki_db, date_format(event_timestamp, 'yyyy-MM-dd')"
results <- collect(sql(query))
save(results, file="Readers-Web-AMC-metrics/Data/moderation_counts_ct.RData")
load("Data/moderation_counts_ct.RData")
moderation_counts_ct <- results
moderation_counts_ct$date <- as.Date(moderation_counts_ct$date, format = "%Y-%m-%d")
##Join the two moderation count tables
moderation_counts_all <- inner_join(moderation_counts_log, moderation_counts_ct,
by = c("date", "wiki"))
p <- moderation_counts_all %>%
gather(action_type, action_count, block:undo) %>%
filter(date >= '2019-03-18') %>% #date where log actions were tagged
mutate(date = floor_date(date, "month")) %>%
group_by(date, action_type) %>%
summarise(action_count = sum(action_count)) %>%
group_by(action_type) %>%
summarise(monthly_avg_count = round(mean(action_count),0)) %>%
ggplot(aes(x=action_type, y= monthly_avg_count, fill = action_type)) +
geom_bar(stat='identity') +
geom_text(aes(label=monthly_avg_count), vjust=0) +
labs(title = "Average monthly moderation actions by type \n March 2019-September 2019") +
ylab("Average number of moderation actions per month")+
xlab("type") +
ggthemes::theme_tufte(base_size = 11, base_family = "Gill Sans") +
theme(axis.text.x=element_text(angle = 45, hjust = 1),
plot.title = element_text(hjust = 0.5),
panel.grid = element_line("gray70"),
legend.position = "none")
p
ggsave("Figures/moderation_counts_bytype_overall_avg.png",p, width = 18, height = 9, units = "in", dpi = 150)
#Plot overall moderation actions by action type
p <- moderation_counts_all %>%
gather(action_type, action_count, block:undo) %>%
filter(date >= '2019-03-18') %>%
mutate(date = floor_date(date, "month")) %>%
group_by(date, action_type) %>%
summarise(action_count = sum(action_count)) %>%
ggplot(aes(x=date, y = action_count, color = action_type)) +
geom_line(size = 1)+
scale_y_continuous("Number of moderation actions per month", labels = polloi::compress) +
scale_x_date("Date", labels = date_format("%b %Y"), date_breaks = "1 month") +
labs(title = "Overall rate of mobile web moderation actions by type") +
geom_vline(xintercept = vertical_lines,
linetype = "dashed", color = "black") +
geom_text(aes(x=as.Date('2019-03-20'), y=2E6, label="AMC deployed on Arabic, Indonesian, and Spanish Wikipedias"), size=3.5, vjust = -1.2, angle = 90, color = "black") +
geom_text(aes(x=as.Date('2019-06-17'), y=2E6, label="AMC deployed on Italian, Japanese, Persian, and Thai Wikipedias"), size=3.5, vjust = -1.2, angle = 90, color = "black") +
geom_text(aes(x=as.Date('2019-08-07'), y=2E6, label="AMC deployed on all Wikipedias"), size=3.5, vjust = -1.2, angle = 90, color = "black") +
ggthemes::theme_tufte(base_size = 11, base_family = "Gill Sans") +
theme(axis.text.x=element_text(angle = 45, hjust = 1),
plot.title = element_text(hjust = 0.5),
panel.grid = element_line("gray70"),
legend.position="right")
p
ggsave("Figures/moderation_counts_bytype_monthly_overall.png", p , width = 18, height = 9, units = "in", dpi = 150)
There is an overall increase in all types of moderation actions on mobile web following the AMC deployment. The thank action is the most commonly used moderation action on mobile web. However, the numbers of blocks and approves have both seen significant increases since AMC was deployed on all wikis.
moderation_counts_all_overall <- moderation_counts_all %>%
filter(date >= '2019-03-18') %>% #date where log actions were tagged
gather("action_type", "action_count", 3:11 ) %>%
group_by(date) %>%
summarise(total_count = sum(action_count)) %>%
arrange(date)
p <- moderation_counts_all_overall %>%
mutate(rolling_average = rollmean(as.numeric(total_count), 7, na.pad=TRUE, align="right")) %>%
ggplot(aes(x=date, y = rolling_average)) +
geom_line(size = 1, color = 'blue')+
scale_y_continuous("Number of moderation actions (7-day rolling average)", labels = polloi::compress) +
scale_x_date("Date", labels = date_format("%b %Y"), date_breaks = "1 month") +
labs(title = "Rate of mobile web moderation actions on all Wikimedia Projects") +
geom_vline(xintercept = vertical_lines,
linetype = "dashed", color = "black") +
geom_text(aes(x=as.Date('2019-03-20'), y=4E5, label="AMC deployed on Arabic, Indonesian, and Spanish Wikipedias"), size=3.6, vjust = 1.5, angle = 90, color = "black") +
geom_text(aes(x=as.Date('2019-06-17'), y=4E5, label="AMC deployed on Italian, Japanese, Persian, and Thai Wikipedias"), size=3.6, vjust = 1.5, angle = 90, color = "black") +
geom_text(aes(x=as.Date('2019-08-07'), y=4E5, label="AMC deployed on all Wikipedias"), size=3.6, vjust = 1.5, angle = 90, color = "black") +
ggthemes::theme_tufte(base_size = 11, base_family = "Gill Sans") +
theme(axis.text.x=element_text(angle = 45, hjust = 1),
plot.title = element_text(hjust = 0.5),
panel.grid = element_line("gray70"),
legend.position="right")
p
ggsave("Figures/moderation_counts_daily_overall.png", p, width = 18, height = 9, units = "in", dpi = 150)
#Calculate MoM (month over month) overall increase in moderation actions.
moderation_action_monthly_overall_mom <- moderation_counts_all_overall %>%
mutate(date = floor_date(date, "month"))%>%
filter(date != '2019-03-01') %>% #data incomplete for March
group_by(date) %>%
summarise(total_action_count = sum(total_count)) %>%
arrange(date) %>%
mutate(monthOvermonth= (total_action_count/lag(total_action_count,1) -1)*100)
moderation_action_monthly_overall_mom
There was a significant increase in the number of moderation actions across all Wikimedia projects following the deployment of AMC to all Wikipedias in August 2019. From Q4 (April - June 2019) to Q1 (July - September 2019), the total number of moderation actions on mobile web increased by 47%. The month-over-month increase from August to September 2019 was 31% across all wikis.
We were unable to compare year-over-year changes, so these increases may be inflated by seasonal fluctuations. However, the rate of increase immediately following the deployment date is greater than in previous months, and the percent change is higher than the change in overall moderation rates across all platforms. For example, in September 2019, there was an 8% month-over-month increase across all platforms. Once a desktop revision tag is created, it would be valuable to compare moderation actions on mobile web with the rates seen on desktop.
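The month-over-month figures quoted here follow the usual percent-change formula, (current / previous - 1) * 100. The sketch below reproduces that calculation in base R on made-up monthly totals; the helper name `pct_change` and the numbers are hypothetical, chosen only to make the arithmetic easy to follow.

```r
# Percent change of each value relative to the one before it.
# The first element has no predecessor, so its change is NA.
pct_change <- function(x) {
  previous <- c(NA, head(x, -1))
  (x / previous - 1) * 100
}

# Hypothetical monthly totals, for illustration only.
monthly <- data.frame(
  month = as.Date(c("2019-04-01", "2019-05-01", "2019-06-01")),
  total = c(1000, 1100, 1430)
)

# Rows must be sorted by month before computing the change.
monthly <- monthly[order(monthly$month), ]
monthly$mom <- pct_change(monthly$total)

print(monthly)
```

Because the first month has no baseline, it yields `NA`, which is why a month with incomplete data should be dropped before the calculation rather than used as the baseline for the following month.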
#Time series of action type on Wikis
p <- moderation_counts_all %>%
filter(wiki == 'enwiki',
date >= '2019-03-18') %>% #date where log actions were tagged
gather(action_type, action_count, block:undo) %>%
arrange(desc(date)) %>%
mutate(date = floor_date(date, "month")) %>%
group_by(date, action_type) %>%
summarise(action_count = sum(action_count)) %>%
ggplot(aes(x=date, y = action_count, color = action_type)) +
geom_line(size = 1)+
scale_y_continuous("Number of moderation actions per month", labels = polloi::compress) +
scale_x_date("Date", labels = date_format("%b %Y"), date_breaks = "1 month") +
labs(title = "Rate of mobile web moderation actions \n on English Wikipedia by type") +
geom_vline(xintercept = as.Date('2019-08-07'),
linetype = "dashed", color = "black") +
geom_text(aes(x=as.Date('2019-08-07'), y=300E3, label="AMC deployed on all Wikipedias"), size=3.6, vjust = -0.5, angle = 90, color = "black") +
ggthemes::theme_tufte(base_size = 12, base_family = "Gill Sans") +
theme(axis.text.x=element_text(angle = 45, hjust = 1),
plot.title = element_text(hjust = 0.5),
panel.grid = element_line("gray70"),
legend.position="right")
p
ggsave("Figures/moderation_counts_bytype_enwiki_monthly.png", p, width = 18, height = 9, units = "in", dpi = 150)
#Overall trend on English Wikipedia
moderation_counts_enwiki_monthly <- moderation_counts_all %>%
filter(wiki == 'enwiki',
date > '2019-03-20') %>% #after log actions were tagged
gather(action_type, action_count, block:undo) %>%
group_by(date) %>%
summarise(action_count = sum(action_count))
p <- moderation_counts_enwiki_monthly %>%
#mutate(action_count_avg = rollmean(action_count, 7, na.pad=TRUE, align="right")) %>%
ggplot(aes(x=date, y = action_count)) +
geom_line(size = 1.5, color = 'blue')+
scale_y_continuous("Number of moderation actions per day", labels = polloi::compress) +
scale_x_date("Date", labels = date_format("%b %Y"), date_breaks = "1 month") +
labs(title = "Rate of mobile web moderation actions on English Wikipedia") +
geom_vline(xintercept = as.Date('2019-08-07'),
linetype = "dashed", color = "black") +
geom_text(aes(x=as.Date('2019-08-07'), y=100E3, label="AMC deployed on all Wikipedias"), size=3.6, vjust = -0.5, angle = 90, color = "black") +
ggthemes::theme_tufte(base_size = 12, base_family = "Gill Sans") +
theme(axis.text.x=element_text(angle = 45, hjust = 1),
plot.title = element_text(hjust = 0.5),
panel.grid = element_line("gray70"),
legend.position="right")
p
ggsave("Figures/moderation_counts_enwiki_daily.png", p, width = 18, height = 9, units = "in", dpi = 150)
#Calculate MoM (month over month) change in moderation actions for enwiki.
moderation_action_monthly_enwiki_mom <- moderation_counts_all %>%
filter(wiki == 'enwiki',
date >= '2019-04-01') %>% #remove first month as we did not have complete data
gather(action_type, action_count, block:undo) %>%
mutate(date = floor_date(date, "month")) %>%
group_by(date) %>%
summarise(total_action_count = sum(action_count)) %>%
arrange(date) %>%
mutate(monthOvermonth = (total_action_count - lag(total_action_count))/lag(total_action_count) *100)
moderation_action_monthly_enwiki_mom
Moderation actions on English Wikipedia have steadily increased, with a significant increase following AMC deployment across all wikis on August 7, 2019. Comparing the month before deployment (July 2019) with the month after (September 2019), the number of moderation actions on mobile web increased by 46.9%. There was a 37.6% month-over-month increase from August to September 2019.
#Action type on Wikis
p <- moderation_counts_all %>%
filter(wiki %in% c('eswiki', 'arwiki', 'idwiki', 'itwiki', 'jawiki', 'fawiki', 'thwiki'),
date >= '2019-03-18') %>% #date where log actions were tagged
gather(action_type, action_count, block:undo) %>%
arrange(desc(date)) %>%
mutate(date = floor_date(date, "month")) %>%
group_by(date, action_type) %>%
summarise(action_count = sum(action_count)) %>%
ggplot(aes(x=date, y = action_count, color = action_type)) +
geom_line(size = 1)+
scale_y_continuous("Number of moderation actions per month", labels = polloi::compress) +
scale_x_date("Date", labels = date_format("%b %Y"), date_breaks = "1 month") +
labs(title = "Rate of mobile web moderation actions count \n on target wikis by type") +
geom_vline(xintercept = vertical_lines,
linetype = "dashed", color = "black") +
geom_text(aes(x=as.Date('2019-03-20'), y=200E3, label="AMC deployed on Arabic, Indonesian, and Spanish Wikipedias"), size=3.5, vjust = -0.5, angle = 90, color = "black") +
geom_text(aes(x=as.Date('2019-08-07'), y=200E3, label="AMC deployed on all Wikipedias"), size=3.5, vjust = -0.5, angle = 90, color = "black") +
ggthemes::theme_tufte(base_size = 12, base_family = "Gill Sans") +
theme(axis.text.x=element_text(angle = 45, hjust = 1),
plot.title = element_text(hjust = 0.5),
panel.grid = element_line("gray70"),
legend.position="right")
p
ggsave("Figures/moderation_counts_bytype_targetwikis_monthly.png", p, width = 18, height = 9, units = "in", dpi = 150)
On target wikis, the thank action is also the most commonly used moderation action on mobile web. There has been a general increase in the use of moderation actions on almost all the target wikis, with a few declines on the smaller ones.
moderation_counts_targetwiki <- moderation_counts_all %>%
filter(wiki %in% c('eswiki', 'arwiki', 'idwiki', 'itwiki', 'jawiki', 'fawiki', 'thwiki'),
date >= '2019-03-18') %>% #date where log actions were tagged
gather(action_type, action_count, block:undo) %>%
group_by(date, wiki)%>%
summarise(action_count = sum(action_count))
head(moderation_counts_targetwiki)
#Plot time series of moderation actions on target wikis
p <- moderation_counts_targetwiki %>%
#mutate(action_count_avg = rollmean(action_count, 7, na.pad=TRUE, align="right")) %>%
ggplot(aes(x=date, y = action_count)) +
geom_line(size = 1.5, color = 'blue')+
facet_wrap(~ wiki, ncol = 3,nrow = 6, scale = "free_y") +
scale_y_continuous("Number of moderation actions per day", labels = polloi::compress) +
scale_x_date("Date", labels = date_format("%b %Y"), date_breaks = "1 month") +
labs(title = "Rate of mobile web moderation actions on all target wikis") +
ggthemes::theme_tufte(base_size = 12, base_family = "Gill Sans") +
theme(axis.text.x=element_text(angle = 45, hjust = 1),
plot.title = element_text(hjust = 0.5),
panel.grid = element_line("gray70"),
legend.position="right")
p
ggsave("Figures/moderation_counts_daily_targetwikis.png", p, width = 18, height = 9, units = "in", dpi = 150)
#Calculate MoM (month over month) change in moderation actions for all target wikis.
moderation_action_monthly_target_mom <- moderation_counts_all %>%
filter(wiki %in% c('eswiki', 'arwiki', 'idwiki', 'itwiki', 'jawiki', 'fawiki', 'thwiki')) %>%
gather(action_type, action_count, block:undo) %>%
mutate(date = floor_date(date, "month")) %>%
group_by(date) %>%
summarise(total_action_count = sum(action_count)) %>%
arrange(date) %>%
mutate(monthOvermonth = (total_action_count - lag(total_action_count))/lag(total_action_count) *100)
moderation_action_monthly_target_mom
Wiki | Change from Q4 (April-June 2019) to Q1 (July-September 2019) |
---|---|
Spanish Wikipedia | 89.7% |
Arabic Wikipedia | 22.9% |
Indonesian Wikipedia | -72.8% |
Italian Wikipedia | 10.8% |
Japanese Wikipedia | -8.7% |
Persian Wikipedia | 36.0% |
Thai Wikipedia | -19.6% |
On the target wikis, there has been much more fluctuation in the number of moderation actions each month since AMC deployment. Comparing Q4 (April-June 2019) to Q1 (July-September 2019), the number of moderation actions increased for Spanish, Arabic, Persian, and Italian Wikipedias, while it decreased for Indonesian, Japanese, and Thai Wikipedias.
These fluctuations in the number of moderation actions may be due to seasonal changes and also due to the smaller size of some of these wikis.
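A quarter-over-quarter change of this kind can be derived from monthly totals by summing each fiscal quarter and taking the percent change between the two sums. The following is a rough sketch under that assumption, using hypothetical totals for a single wiki; it is not the code that produced the table above.

```r
# Hypothetical monthly moderation totals for one wiki (illustration only).
monthly <- data.frame(
  month = as.Date(c("2019-04-01", "2019-05-01", "2019-06-01",
                    "2019-07-01", "2019-08-01", "2019-09-01")),
  total = c(100, 120, 80, 150, 160, 140)
)

# Fiscal quarters as used in this report:
# Q4 covers April-June and Q1 covers July-September.
month_num <- as.integer(format(monthly$month, "%m"))
monthly$quarter <- ifelse(month_num %in% 4:6, "Q4",
                   ifelse(month_num %in% 7:9, "Q1", NA))

# Sum the monthly totals within each quarter.
quarter_totals <- tapply(monthly$total, monthly$quarter, sum)
q4_total <- unname(quarter_totals["Q4"])
q1_total <- unname(quarter_totals["Q1"])

# Percent change from Q4 to Q1.
q4_to_q1_change <- (q1_total / q4_total - 1) * 100
print(q4_to_q1_change)
```

On small wikis the quarterly sums can be low, so a handful of additional or missing actions may produce a large percentage swing in either direction.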
We calculated the rate of access (i.e., pageviews) to special pages from AMC mode and from mobile web overall, using the X-Analytics tag (a general-purpose header for measurement purposes) for AMC that was added in T212961.
The X-Analytics tag for AMC is recorded in the mf-m key. It is recorded as 'b%2Camc' if the user is opted into both beta and AMC mode and as 'amc' for users opted into just AMC. Please see the full list of X-Analytics key definitions.
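The value 'b%2Camc' is a percent-encoded form of 'b,amc' (%2C is a comma). As a small sketch, the check below decodes an mf-m value and tests whether 'amc' is among the listed modes. The helper name `is_amc` is hypothetical, and treating a standalone 'b' as "beta only" is an assumption inferred from the description above.

```r
# Returns TRUE when a single mf-m value includes the 'amc' mode.
# 'mf_m' is one raw value from the X-Analytics map, or NA if absent.
is_amc <- function(mf_m) {
  if (is.na(mf_m)) {
    return(FALSE)
  }
  # Decode percent-encoding ('b%2Camc' becomes 'b,amc'), then split on commas.
  modes <- strsplit(utils::URLdecode(mf_m), ",", fixed = TRUE)[[1]]
  "amc" %in% modes
}

# Example values: AMC only, beta + AMC, beta only (assumed), and missing.
mf_m_values <- c("amc", "b%2Camc", "b", NA)
amc_flags <- vapply(mf_m_values, is_amc, logical(1), USE.NAMES = FALSE)
print(amc_flags)
```

The Hive queries in this section instead match the two known raw values directly, which avoids decoding but would miss any other ordering or combination of modes.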
The analysis was based on a 1/64 sample of the webrequest data from August 2019 through October 30, 2019.
Note: During analysis, I found that, starting around July 22 to July 23, only the Search and Recent Changes special pages were being recorded as pageviews in the pageview_hourly table. This appears to be a bug and leads to an inaccurate drop in views on special pages. Further investigation is needed. In the meantime, I reviewed all requests to special pages from the webrequest data (not limited to pageviews) to complete the analysis.
#special page requests from logged in users on mobile web including X-analytics tag for AMC views
#from wmf.webrequest
query <-
"SELECT
CONCAT(year, '-', LPAD(month, 2, '0'), '-', LPAD(day, 2, '0')) AS date,
CONCAT(normalized_host.project, '.',normalized_host.project_family) AS wiki,
SUM(IF ((x_analytics_map['mf-m'] = 'b%2Camc' OR
x_analytics_map['mf-m'] = 'amc'), 1, 0)) as amc_request,
COUNT(*) AS all_mobile_web_requests
FROM wmf.webrequest TABLESAMPLE(BUCKET 1 OUT OF 64 ON hostname, sequence)
WHERE year = 2019 AND month >= 08
-- look at special pages
AND namespace_id = -1
AND normalized_host.project_family = 'wikipedia'
AND x_analytics_map['loggedIn'] IS NOT NULL
AND agent_type = 'user'
AND access_method = 'mobile web'
AND webrequest_source = 'text'
GROUP BY CONCAT(year, '-', LPAD(month, 2, '0'), '-', LPAD(day, 2, '0')), CONCAT(normalized_host.project, '.',normalized_host.project_family)"
special_pages_requests <- wmf::query_hive(query)
special_pages_requests$date <- as.Date(special_pages_requests$date, format = "%Y-%m-%d")
special_pages_requests_clean <- special_pages_requests %>%
mutate(other_mobile_web_request = all_mobile_web_requests - amc_request) %>%
gather(request_type, request_count, 3:5) %>%
arrange(desc(date))
special_pages_requests_clean$request_type %<>% factor(levels= c("amc_request","all_mobile_web_requests", "other_mobile_web_request"))
# Which special pages are being viewed from amc mode on all Wikipedia projects?
query <-
"SELECT
CONCAT(year, '-', LPAD(month, 2, '0'), '-', LPAD(day, 2, '0')) AS date,
CONCAT(normalized_host.project, '.',normalized_host.project_family) AS wiki,
x_analytics_map['special'] AS special_page_name,
COUNT(*) AS requests
FROM wmf.webrequest TABLESAMPLE(BUCKET 1 OUT OF 64 ON hostname, sequence)
WHERE year = 2019 AND month >= 08
AND (x_analytics_map['mf-m'] = 'b%2Camc' OR
x_analytics_map['mf-m'] = 'amc')
AND namespace_id = -1
AND normalized_host.project_family = 'wikipedia'
AND x_analytics_map['loggedIn'] IS NOT NULL
AND agent_type = 'user'
AND access_method = 'mobile web'
AND webrequest_source = 'text'
GROUP BY CONCAT(year, '-', LPAD(month, 2, '0'), '-', LPAD(day, 2, '0')), x_analytics_map['special'],
CONCAT(normalized_host.project, '.',normalized_host.project_family)"
special_page_requests_bypage <- wmf::query_hive(query)
special_page_requests_bypage$date <- as.Date(special_page_requests_bypage$date, format = "%Y-%m-%d")
##all mobile web requests to special pages
special_pages_requests_overall <- special_pages_requests_clean %>%
mutate(date = floor_date(date, "week")) %>%
filter(request_type == "all_mobile_web_requests",
date != '2019-08-25',
date != '2019-11-24' )%>% #remove weeks with incomplete data
group_by(date) %>%
summarize(request_count = sum(request_count)) %>%
arrange(desc(date))
special_pages_requests_bytype <- special_pages_requests_clean %>%
mutate(date = floor_date(date, "week")) %>%
filter(request_type != "all_mobile_web_requests",
date != '2019-08-25',
date != '2019-11-24') %>%
group_by(date, request_type) %>%
summarise(request_count = sum(request_count))
head(special_pages_requests_bytype)
#Plot mobile web requests to special pages overall
p <- ggplot() +
geom_bar(data =special_pages_requests_bytype, aes(x= date, y= request_count, fill = request_type),stat= "identity", position = "stack") +
geom_smooth(data = special_pages_requests_overall, aes(x=date, y=request_count), se = FALSE) +
scale_y_continuous("Number of requests per week", labels = polloi::compress) +
scale_x_date("Date", labels = date_format("%b %d %Y"), date_breaks = "1 week") +
labs(title = "Mobile web requests to special pages on all Wikipedia projects",
subtitle = "Based on a sample of data collected from webrequest data") +
ggthemes::theme_tufte(base_size = 14, base_family = "Gill Sans") +
theme(axis.text.x=element_text(angle = 45, hjust = 1),
panel.grid = element_line("gray70"),
legend.position="bottom",
legend.title=element_blank(),
legend.text=element_text(size=14))
p
ggsave("Figures/mobile_web_special_page_requests_overall.png", p, width = 18, height = 9, units = "in", dpi = 150)
#Calculate monthly proportion of AMC requests
special_pages_requests_monthly_prop <- special_pages_requests %>%
mutate(date = floor_date(date, "month")) %>%
group_by(date) %>%
summarise(amc_requests_total = sum(amc_request),
all_mobile_web_requests_total = sum(all_mobile_web_requests),
amc_prop = amc_requests_total/all_mobile_web_requests_total *100)
special_pages_requests_monthly_prop
#Calculate month over month changes
special_pages_requests_mom <- special_pages_requests_clean %>%
mutate(date = floor_date(date, "month")) %>%
filter(request_type == "all_mobile_web_requests",
date != '2019-11-01',
date != '2019-08-01') %>% #Sept to Oct is the only pair of complete months for comparing MoM changes
group_by(date) %>%
summarise(request_count = sum(request_count)) %>%
arrange(date) %>%
mutate(monthOvermonth= (request_count/lag(request_count,1) -1)*100)
tail(special_pages_requests_mom)
Requests to special pages have remained relatively flat (there was only a 3.3% month-over-month change between September 2019 and October 2019), and requests from mobile web in AMC mode represent only about 2% of all mobile web requests to these pages. However, the proportion of mobile web requests to special pages coming from AMC mode has increased since AMC was deployed on all Wikipedias on August 7, 2019. The proportion of AMC requests increased by 15.9% from September to October 2019.
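The month-over-month figures above are percent changes between consecutive months. The following is a minimal base R sketch of that arithmetic, not the analysis code itself: the counts are hypothetical and chosen only for illustration, and the helper `pct_change` is an assumed name. It also contrasts a relative change in a small proportion with the change in percentage points, assuming the reported change in AMC proportion is a relative one.

```r
# Hypothetical monthly totals (illustrative only, not figures from this report)
monthly <- data.frame(
  month = as.Date(c("2019-09-01", "2019-10-01")),
  amc_requests = c(1700, 2000),
  all_mobile_web_requests = c(100000, 101500)
)

# AMC share of all mobile web requests, in percent
monthly$amc_prop <- monthly$amc_requests / monthly$all_mobile_web_requests * 100

# Percent change versus the previous row (NA for the first row),
# equivalent to (x / dplyr::lag(x, 1) - 1) * 100
pct_change <- function(x) (x / c(NA, head(x, -1)) - 1) * 100

monthly$requests_mom <- pct_change(monthly$all_mobile_web_requests)  # change in volume
monthly$amc_prop_mom <- pct_change(monthly$amc_prop)                 # relative change in share
monthly$amc_prop_pp_change <- c(NA, diff(monthly$amc_prop))          # change in percentage points

print(monthly)
```

With these made-up numbers, a share that moves from 1.7% to just under 2.0% is a relative increase of roughly 16% but a change of less than 0.3 percentage points, so the two measures should not be read interchangeably.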
Top 10 Special pages viewed in AMC Mode
special_page_requests_bypage_top10 <- special_page_requests_bypage %>%
group_by(special_page_name) %>%
summarise(requests = sum(requests)) %>%
mutate(percent_specialpage_requests = requests/sum(requests)*100) %>%
top_n(10) %>%
arrange(desc(percent_specialpage_requests))
special_page_requests_bypage_top10
Mobile Diff, Watchlist, Contributions, and Search are the most-viewed special pages among users in AMC mode across all Wikipedias.
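The shares in the top 10 tables are computed against requests to all special pages before the list is cut to the top entries, so the listed percentages need not sum to 100. A small base R sketch of that ordering of steps, using hypothetical page names and counts (none of these numbers come from the report):

```r
# Hypothetical request counts per special page (illustrative only)
page_requests <- data.frame(
  special_page_name = c("MobileDiff", "Watchlist", "Contributions", "Search", "RecentChanges"),
  requests = c(400, 280, 100, 120, 100),
  stringsAsFactors = FALSE
)

# Share of each page relative to ALL special page requests
page_requests$percent_specialpage_requests <-
  page_requests$requests / sum(page_requests$requests) * 100

# Keep the three largest shares (the report keeps ten)
ranked <- page_requests[order(-page_requests$percent_specialpage_requests), ]
top3 <- head(ranked, 3)

print(top3)
print(sum(top3$percent_specialpage_requests))  # below 100: remaining pages are excluded
```

Computing the share after filtering to the top pages would instead inflate every percentage, which is why the order of `mutate` and `top_n` in the cells matters.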
special_pages_requests_enwiki = special_pages_requests_clean %>%
mutate(date = floor_date(date, "week")) %>%
filter(wiki == 'en.wikipedia',
request_type == "all_mobile_web_requests",
date != '2019-08-25',
date != '2019-11-24' )%>% #remove weeks with incomplete data
group_by(date) %>%
summarize(request_count = sum(request_count))
special_pages_requests_bytype_enwiki <- special_pages_requests_clean %>%
mutate(date = floor_date(date, "week")) %>%
filter(wiki == 'en.wikipedia',
request_type != "all_mobile_web_requests",
date != '2019-08-25',
date != '2019-11-24' ) %>% #remove weeks with incomplete data
group_by(date, request_type) %>%
summarise(request_count = sum(request_count))
p <- ggplot() +
geom_bar(data =special_pages_requests_bytype_enwiki, aes(x= date, y= request_count, fill = request_type),stat= "identity", position = "stack") +
geom_smooth(data = special_pages_requests_enwiki, aes(x=date, y=request_count), se = FALSE) +
scale_y_continuous("Number of requests per week", labels = polloi::compress) +
scale_x_date("Date", labels = date_format("%b %d %Y"), date_breaks = "1 week") +
labs(title = "Mobile web requests to special pages on English Wikipedia",
subtitle = "Based on a sample of data collected from webrequest data") +
ggthemes::theme_tufte(base_size = 14, base_family = "Gill Sans") +
theme(axis.text.x=element_text(angle = 45, hjust = 1),
panel.grid = element_line("gray70"),
legend.position="bottom",
legend.title=element_blank(),
legend.text=element_text(size=14))
p
ggsave("Figures/mobile_web_special_page_requests_enwiki.png", p, width = 18, height = 9, units = "in", dpi = 150)
#Calculate monthly proportion of AMC requests on English Wikipedia
special_pages_requests_monthly_prop_enwiki <- special_pages_requests %>%
filter(wiki == 'en.wikipedia') %>%
mutate(date = floor_date(date, "month")) %>%
group_by(date) %>%
summarise(amc_requests_total = sum(amc_request),
all_mobile_web_requests_total = sum(all_mobile_web_requests),
amc_prop = amc_requests_total/all_mobile_web_requests_total *100)
special_pages_requests_monthly_prop_enwiki
#Calculate month over month changes in special requests on English Wiki
special_pages_requests_enwiki_mom <- special_pages_requests_clean %>%
filter(wiki == 'en.wikipedia',
request_type == "all_mobile_web_requests",
date >= '2019-09-01',
date <= '2019-10-31') %>%
mutate(date = floor_date(date, "month")) %>%
group_by(date) %>%
summarise(request_count = sum(request_count)) %>%
arrange(date) %>%
mutate(monthOvermonth= (request_count/lag(request_count,1) -1) *100)
tail(special_pages_requests_enwiki_mom)
As in the overall trend, the rate of access to special pages from mobile web on English Wikipedia has remained fairly flat; however, the proportion of these requests coming from AMC mode has increased.
On English Wikipedia, requests from AMC mode to special pages represent 1.7% of all logged-in mobile web views in October 2019. This is slightly lower compared to the rate across all Wikipedias (2.0%) but is a 15.3% month over month increase from September 2019.
Top 10 Special pages viewed in AMC Mode on English Wikipedia
special_page_requests_bypage_top10_enwiki <- special_page_requests_bypage %>%
filter(wiki == 'en.wikipedia') %>%
group_by(special_page_name) %>%
summarise(requests = sum(requests)) %>%
mutate(percent_specialpage_requests = requests/sum(requests)*100) %>%
top_n(10) %>%
arrange(desc(percent_specialpage_requests))
special_page_requests_bypage_top10_enwiki
The Mobile Diff, Watchlist, Contributions, and Search are also the most viewed special pages from users in AMC mode on English Wikipedia. There's a slightly lower proportion of mobile web requests to the Contributions page from AMC on English Wikipedia compared to all Wikipedias (8.7% on English Wikipedia vs 10.2% overall).
special_pages_requests_targetwiki_March = special_pages_requests_clean %>%
mutate(date = floor_date(date, "week")) %>%
filter(wiki %in% c('ar.wikipedia', 'es.wikipedia', 'id.wikipedia'),
request_type == "all_mobile_web_requests",
date != '2019-08-25',
date != '2019-11-24' ) %>% #remove weeks with incomplete data
group_by(date, wiki) %>%
summarize(request_count = sum(request_count))
special_pages_requests_bytype_targetwiki_March <- special_pages_requests_clean %>%
mutate(date = floor_date(date, "week")) %>%
filter(wiki %in% c('ar.wikipedia', 'es.wikipedia', 'id.wikipedia'),
request_type != "all_mobile_web_requests",
date != '2019-08-25',
date != '2019-11-24' ) %>% #remove weeks with incomplete data
group_by(date, wiki, request_type) %>%
summarise(request_count = sum(request_count))
p <- ggplot() +
geom_bar(data =special_pages_requests_bytype_targetwiki_March, aes(x= date, y= request_count, fill = request_type),stat= "identity", position = "stack") +
geom_smooth(data = special_pages_requests_targetwiki_March, aes(x=date, y=request_count), se = FALSE) +
facet_wrap(~ wiki, nrow = 7, scales = "free_y") +
scale_y_continuous("Number of requests per week", labels = polloi::compress) +
scale_x_date("Date", labels = date_format("%b %d %Y"), date_breaks = "1 week") +
labs(title = "Mobile web requests to special pages on target wikis",
subtitle = "Based on a sample of data collected from webrequest data") +
ggthemes::theme_tufte(base_size = 14, base_family = "Gill Sans") +
theme(axis.text.x=element_text(angle = 45, hjust = 1),
panel.grid = element_line("gray70"),
legend.position="bottom",
legend.title=element_blank(),
legend.text=element_text(size=14))
p
ggsave("Figures/mobile_web_special_page_requests_March_target.png", p, width = 18, height = 9, units = "in", dpi = 150)
special_pages_requests_targetwiki_June = special_pages_requests_clean %>%
mutate(date = floor_date(date, "week")) %>%
filter(wiki %in% c('it.wikipedia', 'ja.wikipedia',
'fa.wikipedia', 'th.wikipedia'),
request_type == "all_mobile_web_requests",
date != '2019-08-25',
date != '2019-11-24' ) %>% #remove weeks with incomplete data
group_by(date, wiki) %>%
summarize(request_count = sum(request_count))
special_pages_requests_bytype_targetwiki_June <- special_pages_requests_clean %>%
mutate(date = floor_date(date, "week")) %>%
filter(wiki %in% c('it.wikipedia', 'ja.wikipedia',
'fa.wikipedia', 'th.wikipedia'),
request_type != "all_mobile_web_requests",
date != '2019-08-25',
date != '2019-11-24' ) %>% #remove weeks with incomplete data
group_by(date, wiki, request_type) %>%
summarise(request_count = sum(request_count))
p <- ggplot() +
geom_bar(data =special_pages_requests_bytype_targetwiki_June, aes(x= date, y= request_count, fill = request_type),stat= "identity", position = "stack") +
geom_smooth(data = special_pages_requests_targetwiki_June, aes(x=date, y=request_count), se = FALSE) +
facet_wrap(~ wiki, nrow = 7, scales = "free_y") +
scale_y_continuous("Number of requests per week", labels = polloi::compress) +
scale_x_date("Date", labels = date_format("%b %d %Y"), date_breaks = "1 week") +
labs(title = "Mobile web requests to special pages on target wikis",
subtitle = "Based on a 1/64 sample of data collected from webrequest data") +
ggthemes::theme_tufte(base_size = 14, base_family = "Gill Sans") +
theme(axis.text.x=element_text(angle = 45, hjust = 1),
panel.grid = element_line("gray70"),
legend.position="bottom",
legend.title=element_blank(),
legend.text=element_text(size=14))
p
ggsave("Figures/mobile_web_special_page_requests_June_target.png", p, width = 18, height = 9, units = "in", dpi = 150)
#Calculate month over month changes in special page requests on Italian Wikipedia
special_pages_requests_itwiki_mom <- special_pages_requests_clean %>%
filter(wiki == 'it.wikipedia',
request_type == "all_mobile_web_requests",
date >= '2019-09-01',
date <= '2019-10-31') %>%
mutate(date = floor_date(date, "month")) %>%
group_by(date) %>%
summarise(request_count = sum(request_count)) %>%
arrange(date) %>%
mutate(monthOvermonth= (request_count/lag(request_count,1) -1)*100)
From September 2019 to October 2019, there was an increase in the proportion of special page requests from users in AMC mode on Arabic, Spanish, Indonesian, and Italian Wikipedias. Conversely, there was a decline for Japanese, Persian, and Thai Wikipedias.
See the table below for a breakdown of the percent change in requests to special pages from AMC mode from September to October 2019 for each target wiki.
Summary of access requests to Special Pages on Target Wikis - October 2019
Wiki | Proportion of AMC requests | Sept 2019 to Oct 2019 change in AMC proportion | Sept 2019 to Oct 2019 change in requests |
---|---|---|---|
Arabic Wiki | 3.7% | 21.3% | 2.8% |
Spanish Wiki | 2.9% | 32.8% | 7.0% |
Indonesian Wiki | 1.6% | 11.1% | 13.5% |
Italian Wiki | 2.8% | 92.0% | -1.2% |
Japanese Wiki | 1.9% | -15.5% | 0.0% |
Persian Wiki | 2.3% | -3.6% | 2.2% |
Thai Wiki | 1.9% | -7.8% | -0.2% |
Top 10 Special pages viewed in AMC Mode on Target Wikipedias
special_page_requests_bypage_top10_targetwiki <- special_page_requests_bypage %>%
filter(wiki %in% c('ar.wikipedia', 'es.wikipedia', 'id.wikipedia', 'it.wikipedia', 'ja.wikipedia',
'fa.wikipedia', 'th.wikipedia')) %>%
group_by(special_page_name) %>%
summarise(requests = sum(requests)) %>%
mutate(percent_specialpage_requests = requests/sum(requests)*100) %>%
top_n(10) %>%
arrange(desc(percent_specialpage_requests))
special_page_requests_bypage_top10_targetwiki
Compared to English Wikipedia, there is a higher proportion of requests to the Contributions page from target wikis (11.5% vs. 8.7%) and a lower proportion of requests to the Watchlist page (20.8% vs. 28.1%). Mobile Diff remains the top-viewed special page across all reviewed Wikipedias.
The same methodology used to obtain the rate of access to special pages was used to obtain the rate of access to talk pages, except that we reviewed all requests to article talk pages, where the page namespace ID is equal to 1.
The proportion of AMC requests to talk pages was based on a 1/64 sample of data collected from the webrequest data from August 2019 through the most recent data available in November 2019. Data for the overall change in talk page views was based on pageview_hourly data from June 2018 through October 30, 2019, to review overall year-over-year changes in mobile web rates.
For the rate of edits to article talk pages, we reviewed all mobile web and AMC-tagged edits from the mediawiki_history table. Data reviewed was from August 2019 through October 2019.
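Because the webrequest counts come from a 1/64 sample, raw counts understate true volumes, while proportions such as the AMC share are unaffected by the sampling factor. A minimal base R sketch of that reasoning, under the assumption that the bucket sample behaves like a uniform 1/64 sample; the counts are hypothetical:

```r
# Assumed sampling rate of the TABLESAMPLE clause (1 bucket out of 64)
sample_rate <- 1 / 64

# Hypothetical sampled daily counts (illustrative only)
sampled <- data.frame(
  amc_request = c(420, 455),
  all_mobile_web_requests = c(1000, 1050)
)

# Rough estimate of the unsampled request volume
estimated_total <- sampled$all_mobile_web_requests / sample_rate

# AMC share computed on sampled counts ...
prop_sample <- sampled$amc_request / sampled$all_mobile_web_requests * 100

# ... and on scaled-up counts: the scaling factor cancels out
prop_scaled <- (sampled$amc_request / sample_rate) /
  (sampled$all_mobile_web_requests / sample_rate) * 100

print(estimated_total)
print(all.equal(prop_sample, prop_scaled))
```

This is why the charts can plot sampled counts directly when the goal is to compare AMC and non-AMC requests, although the absolute values on the y-axis should be read as sampled counts rather than totals.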
#talk page views from logged in users from wmf.pageview_hourly
#dating back to June 2018 through October 2019.
query <- "
SELECT
CONCAT(year, '-', LPAD(month, 2, '0'), '-', LPAD(day, 2, '0')) AS date,
project,
access_method,
SUM(view_count) AS views
FROM wmf.pageview_hourly
WHERE ((year = 2018 AND month >= 06) or (year = 2019 and month <=10))
AND project LIKE '%.wikipedia'
AND namespace_id = 1
AND agent_type = 'user'
GROUP BY CONCAT(year, '-', LPAD(month, 2, '0'), '-', LPAD(day, 2, '0')), project, access_method"
talk_page_views <- wmf::query_hive(query)
talk_page_views$date <- as.Date(talk_page_views$date, format = "%Y-%m-%d")
#Total talk page requests from logged in users by request type - AMC and non-AMC requests
#Sample from the webrequest data.
query <-
"SELECT
CONCAT(year, '-', LPAD(month, 2, '0'), '-', LPAD(day, 2, '0')) AS date,
CONCAT(normalized_host.project, '.',normalized_host.project_family) AS wiki,
SUM(IF ((x_analytics_map['mf-m'] = 'b%2Camc' OR
x_analytics_map['mf-m'] = 'amc'), 1, 0)) as amc_request,
COUNT(*) AS all_mobile_web_requests
FROM wmf.webrequest TABLESAMPLE(BUCKET 1 OUT OF 64 ON hostname, sequence)
WHERE year = 2019 AND month >= 08
AND namespace_id = 1
-- Only review logged in users
AND normalized_host.project_family = 'wikipedia'
AND x_analytics_map['loggedIn'] IS NOT NULL
AND agent_type = 'user'
AND access_method = 'mobile web'
AND webrequest_source = 'text'
AND is_pageview
GROUP BY CONCAT(year, '-', LPAD(month, 2, '0'), '-', LPAD(day, 2, '0')), CONCAT(normalized_host.project, '.',normalized_host.project_family)"
talk_page_requests <- wmf::query_hive(query)
talk_page_requests$date <- as.Date(talk_page_requests$date, format = "%Y-%m-%d")
talk_page_requests_clean <- talk_page_requests %>%
mutate(other_mobile_web_request = all_mobile_web_requests - amc_request) %>%
gather(request_type, request_count, 3:5) %>%
arrange(desc(date))
talk_page_requests_clean$request_type %<>% factor(levels= c("amc_request","all_mobile_web_requests", "other_mobile_web_request"))
## All article talk page edits on mobile web coming from AMC mode
query <- "select
date_format(event_timestamp, 'yyyy-MM-dd') as date,
wiki,
sum(cast(other_mobile_web_edit as int)) as other_mobile_web_edits,
sum(cast(amc_edit as int)) as amc_edits,
sum(cast(all_mobile_web_edit as int)) as all_mobile_web_edits
from (
select
wiki_db as wiki,
event_timestamp,
(array_contains(revision_tags, 'mobile web edit') and not
array_contains(revision_tags, 'advanced mobile edit')) as other_mobile_web_edit,
(array_contains(revision_tags, 'advanced mobile edit') and
array_contains(revision_tags, 'mobile web edit')) as amc_edit,
array_contains(revision_tags, 'mobile web edit') as all_mobile_web_edit
from wmf.mediawiki_history mwh
INNER JOIN canonical_data.wikis cd
ON wiki_db = database_code
where
mwh.event_entity = 'revision' and
mwh.event_type = 'create' and
cd.database_group = 'wikipedia' and
mwh.event_timestamp IS NOT NULL and
mwh.event_user_id IS NOT NULL and
--looking only at historical page namespaces
page_namespace_historical == 1 and
mwh.event_timestamp between '2018-06-01' and '2019-10-31' and
mwh.snapshot = '2019-10'
) edits
group by wiki, date_format(event_timestamp, 'yyyy-MM-dd')"
talk_page_edits <- wmf::query_hive(query)
talk_page_edits$date <- as.Date(talk_page_edits$date, format = "%Y-%m-%d")
talk_page_edits_clean <- talk_page_edits %>%
filter(date < '2019-10-30') %>% #remove incomplete Nov data
gather(edit_type, edit_count, 3:5) %>%
arrange(desc(date))
talk_page_views_all <- talk_page_views %>%
filter(access_method == 'mobile web') %>%
mutate(date = floor_date(date, 'month')) %>%
group_by(date) %>%
summarise(views = sum(views))
# Plot YoY Changes in overall mobile web views
talk_page_views_all_yoy_plot <- talk_page_views_all %>%
mutate(year = case_when(date >= '2018-06-01' & date < '2019-06-01' ~ '2018/2019',
date >= '2019-06-01' & date < '2020-06-01' ~ '2019/2020'),
MonthN =as.factor(format(as.Date(date),"%m")),
Month = months(as.Date(date), abbreviate=TRUE))
talk_page_views_all_yoy_plot$MonthN = factor(talk_page_views_all_yoy_plot$MonthN, levels=c("06", "07", "08", "09", "10","11", "12", "01", "02", "03",
"04", "05" ))
talk_page_views_all_yoy_plot$year = factor(talk_page_views_all_yoy_plot$year, levels = c('2018/2019', '2019/2020'))
p <- ggplot(talk_page_views_all_yoy_plot, aes(x=MonthN, y = views, group = year, color = year, linetype = year)) +
geom_line(size = 0.8) +
scale_y_continuous("Number of views per month", labels = polloi::compress) +
scale_x_discrete(breaks = talk_page_views_all_yoy_plot$MonthN, labels = talk_page_views_all_yoy_plot$Month )+
geom_vline(xintercept = c(10, 1, 3), linetype = "dashed", color = "black") +
geom_text(aes(x=10, y=4E6, label="AMC deployed on Arabic, Indonesian, and Spanish Wikipedias"), size=3.7, vjust = 1.5, angle = 90, color = "black") +
geom_text(aes(x=1, y=4E6, label="AMC deployed on Italian, Japanese, Persian, and Thai Wikipedias"), size=3.7, vjust = 1.5, angle = 90, color = "black") +
geom_text(aes(x=3, y=4E6, label="AMC deployed on all Wikipedias"), size=3.7, vjust = 1.5, angle = 90, color = "black") +
labs(title = "Mobile web talk page views on all Wikipedia Projects") +
xlab("Month") +
scale_color_brewer(palette = 'Set1', breaks=c('2018/2019', '2019/2020'), direction= -1) +
scale_linetype_manual(breaks=c('2018/2019', '2019/2020'), values=c(2,1)) +
ggthemes::theme_tufte(base_size = 14, base_family = "Gill Sans") +
theme(plot.title = element_text(hjust = 0.5),
legend.title = element_blank(),
legend.position = "bottom",
panel.grid = element_line("gray70"),
legend.key.width=unit(0.5,"cm"))
p
ggsave("Figures/mobile_web_talk_page_views_overall.png", p, width = 18, height = 9, units = "in", dpi = 150)
# Calculate year over year changes
talk_page_views_all_yoy <- talk_page_views_all%>%
mutate(date = floor_date(date, 'month')) %>%
group_by(date)%>%
summarise(views = sum(views)) %>%
arrange(date) %>%
mutate(yoy_percent = (views/lag(views,12) -1) *100) %>%
arrange(desc(date))
head(talk_page_views_all_yoy)
There was a decrease in the number of views to article talk pages on all Wikipedia projects from January to May 2019; however, there was a sharp increase in the number of views starting in June 2019, with a 10.6% year-over-year increase in October 2019. Other variables may be contributing to these increases, but a review of the proportion of talk page requests made while in AMC mode will help indicate how much AMC contributes to this increase.
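The year-over-year figure compares each month with the same month twelve rows earlier in the monthly series. A short base R sketch of that calculation, using a hypothetical series constructed so that the final month lands on a 10.6% increase (the view counts are invented for illustration):

```r
# Monthly series from June 2018 through October 2019 (17 months)
month <- seq(as.Date("2018-06-01"), by = "month", length.out = 17)

# Hypothetical monthly view counts (illustrative only)
views <- c(rep(1000, 12), 1000, 1020, 1050, 1080, 1106)

# Value from the same month one year earlier; NA for the first 12 months.
# Equivalent to dplyr::lag(views, 12) on a gap-free, date-sorted series.
views_prev_year <- c(rep(NA, 12), head(views, -12))

yoy_percent <- (views / views_prev_year - 1) * 100

print(tail(data.frame(month, views, yoy_percent), 5))
```

The lag-based approach only gives the intended comparison when the series is sorted by date and has no missing months, which is why the cells call `arrange(date)` before `lag(views, 12)`.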
talk_page_requests_all = talk_page_requests_clean %>%
mutate(date = floor_date(date, "week")) %>%
filter(request_type == "all_mobile_web_requests",
date != '2019-08-18',
date != '2019-11-24') %>% #remove incomplete weeks
group_by(date) %>%
summarize(request_count = sum(request_count))
talk_page_requests_byrequest <- talk_page_requests_clean %>%
mutate(date = floor_date(date, "week")) %>%
filter(request_type != "all_mobile_web_requests",
date != '2019-08-18',
date != '2019-11-24') %>% #remove incomplete weeks
group_by(date, request_type) %>%
summarise(request_count = sum(request_count))
p <- ggplot() +
geom_bar(data =talk_page_requests_byrequest, aes(x= date, y= request_count, fill = request_type),stat= "identity", position = "stack") +
geom_smooth(data = talk_page_requests_all, aes(x=date, y=request_count), se = FALSE) +
scale_y_continuous("Number of requests per week", labels = polloi::compress) +
scale_x_date("Date", labels = date_format("%b %d %Y"), date_breaks = "1 week") +
labs(title = "Mobile web requests to talk pages on all Wikipedia Projects",
subtitle = "Based on a sample of data collected from webrequest data") +
ggthemes::theme_tufte(base_size = 14, base_family = "Gill Sans") +
theme(axis.text.x=element_text(angle = 45, hjust = 1),
panel.grid = element_line("gray70"),
legend.position="bottom",
legend.title=element_blank(),
legend.text=element_text(size=14))
p
ggsave("Figures/mobile_web_talk_page_requests_amc_prop.png", p, width = 18, height = 9, units = "in", dpi = 150)
#Calculate monthly proportion of AMC requests
talk_page_requests_monthly_prop <- talk_page_requests %>%
mutate(date = floor_date(date, "month")) %>%
group_by(date) %>%
summarise(amc_requests_total = sum(amc_request),
all_mobile_web_requests_total = sum(all_mobile_web_requests),
amc_prop = amc_requests_total/all_mobile_web_requests_total *100)
talk_page_requests_monthly_prop
Almost half of all views to talk pages are made by users in AMC mode, which is significantly higher than the percentage of views to special pages.
There was a 10.6% year over year increase in the number of views to article talk pages in October 2019. About 42% of these views came from users in AMC mode.
talk_page_edits_all <- talk_page_edits_clean %>%
filter(edit_type == 'all_mobile_web_edits') %>%
mutate(date = floor_date(date, "week")) %>%
filter(date != '2018-05-27',
date != '2019-10-27') %>% #remove incomplete data weeks
group_by(date) %>%
summarise(edit_count = sum(edit_count))
#Overall Mobile Web Edits by edit type
talk_page_edits_bytype_all <- talk_page_edits_clean %>%
filter(edit_type != 'all_mobile_web_edits') %>%
mutate(date = floor_date(date, "week")) %>%
filter(date != '2018-05-27',
date != '2019-10-27') %>% #remove incomplete data weeks
group_by(date, edit_type) %>%
summarise(edit_count = sum(edit_count))
p <- ggplot() +
geom_col(data = talk_page_edits_bytype_all, aes(x = date, y = edit_count, fill = edit_type), position = "stack") +
geom_smooth(data = talk_page_edits_all, aes(x=date, y=edit_count), se = FALSE) +
scale_y_continuous("Number of edits per week", labels = polloi::compress) +
scale_x_date("Date", labels = date_format("%b %Y"), date_breaks = "1 month") +
geom_vline(xintercept = vertical_lines,
linetype = "dashed", color = "black") +
geom_text(aes(x=as.Date('2019-03-20'), y=1.4E3, label="AMC deployed on Arabic, Indonesian, and Spanish Wikipedias"), size=3.6, vjust = 1.5, angle = 90, color = "black") +
geom_text(aes(x=as.Date('2019-06-17'), y=1.4E3, label="AMC deployed on Italian, Japanese, Persian, and Thai Wikipedias"), size=3.6, vjust = 1.5, angle = 90, color = "black") +
geom_text(aes(x=as.Date('2019-08-07'), y=1.4E3, label="AMC deployed on all Wikipedias"), size=3.6, vjust = 1.5, angle = 90, color = "black") +
labs(title = "Mobile web edits to article talk pages on all Wikipedias") +
ggthemes::theme_tufte(base_size = 14, base_family = "Gill Sans") +
theme(axis.text.x=element_text(angle = 45, hjust = 1),
panel.grid = element_line("gray70"),
legend.position="bottom",
legend.title=element_blank(),
legend.text=element_text(size=14))
p
ggsave("Figures/mobile_web_talk_page_edits_amc_prop.png", p, width = 18, height = 9, units = "in", dpi = 150)
#Calculate monthly proportion of amc talk page edits
talk_page_edits_monthly_prop <- talk_page_edits %>%
mutate(date = floor_date(date, "month")) %>%
group_by(date) %>%
summarise(amc_edits_total = sum(amc_edits),
all_mobile_web_edits_total = sum(all_mobile_web_edits),
amc_prop = amc_edits_total/all_mobile_web_edits_total *100)
tail(talk_page_edits_monthly_prop)
#Calculate year over year changes in overall rate
talk_pages_edits_yoy <- talk_page_edits_clean %>%
filter(edit_type == "all_mobile_web_edits") %>% #Look at all mobile web edits
mutate(date = floor_date(date, "month")) %>%
group_by(date) %>%
summarise(edit_count = sum(edit_count)) %>%
arrange(date) %>%
mutate(yearOveryear = (edit_count/lag(edit_count,12) -1)*100)
tail(talk_pages_edits_yoy)
The number of talk page edits on mobile web has been increasing for the past year, so these increases cannot be attributed solely to AMC mode; however, higher year-over-year increases are seen starting in August 2019, around the date AMC was deployed to all Wikipedias.
From September 2019 to October 2019, the proportion of AMC edits increased by 19.5%. AMC edits accounted for 42.1% of all talk page edits made by logged-in users on mobile web in October 2019.
head(talk_page_edits_clean)
talk_page_views_all_enwiki <- talk_page_views %>%
filter(access_method == 'mobile web',
project == 'en.wikipedia') %>%
mutate(date = floor_date(date, 'month')) %>%
group_by(date) %>%
summarise(views = sum(views))
# Plot YoY Changes on English Wikipedia
talk_page_views_enwiki_yoy_plot <- talk_page_views_all_enwiki %>%
mutate(year = case_when(date >= '2018-06-01' & date < '2019-06-01' ~ '2018/2019',
date >= '2019-06-01' & date < '2020-06-01' ~ '2019/2020'),
MonthN =as.factor(format(as.Date(date),"%m")),
Month = months(as.Date(date), abbreviate=TRUE))
talk_page_views_enwiki_yoy_plot$MonthN = factor(talk_page_views_enwiki_yoy_plot$MonthN, levels=c("06", "07", "08", "09", "10","11", "12", "01", "02", "03",
"04", "05" ))
talk_page_views_enwiki_yoy_plot$year = factor(talk_page_views_enwiki_yoy_plot$year, levels = c('2018/2019', '2019/2020'))
p <- ggplot(talk_page_views_enwiki_yoy_plot, aes(x=MonthN, y = views, group = year, color = year, linetype = year)) +
geom_line(size = 0.8) +
scale_y_continuous("Number of views per month", labels = polloi::compress) +
scale_x_discrete(breaks = talk_page_views_enwiki_yoy_plot$MonthN, labels = talk_page_views_enwiki_yoy_plot$Month )+
geom_vline(xintercept = c(10, 1, 3), linetype = "dashed", color = "black") +
geom_text(aes(x=10, y=3E6, label="AMC deployed on Arabic, Indonesian, and Spanish Wikipedias"), size=3.6, vjust = 1.5, angle = 90, color = "black") +
geom_text(aes(x=1, y=3E6, label="AMC deployed on Italian, Japanese, Persian, and Thai Wikipedias"), size=3.6, vjust = 1.5, angle = 90, color = "black") +
geom_text(aes(x=3, y=3E6, label="AMC deployed on all Wikipedias"), size=3.6, vjust = 1.5, angle = 90, color = "black") +
labs(title = "Mobile web talk page views on English Wikipedia") +
xlab("Month") +
scale_color_brewer(palette = 'Set1', breaks=c('2018/2019', '2019/2020'), direction= -1) +
scale_linetype_manual(breaks=c('2018/2019', '2019/2020'), values=c(2,1)) +
ggthemes::theme_tufte(base_size = 14, base_family = "Gill Sans") +
theme(plot.title = element_text(hjust = 0.5),
legend.title = element_blank(),
legend.position = "bottom",
panel.grid = element_line("gray70"),
legend.key.width=unit(0.5,"cm"))
p
ggsave("Figures/mobile_web_talk_page_views_enwiki.png", p, width = 18, height = 9, units = "in", dpi = 150)
# Calculate year over year changes on English Wikipedia
talk_page_views_enwiki_yoy <- talk_page_views_all_enwiki %>%
mutate(date = floor_date(date, 'month')) %>%
group_by(date)%>%
summarise(views = sum(views)) %>%
arrange(date) %>%
mutate(yoy_percent = (views/lag(views,12) -1) *100) %>%
arrange(desc(date))
head(talk_page_views_enwiki_yoy)
Similar to trends seen across all Wikipedia projects, there was a decrease in the number of views to article talk pages on English Wikipedia from January to May 2019; however, there was a sharp increase in the number of views starting in June 2019. Other variables may be contributing to these increases and decreases, but a review of the proportion of talk page requests made while in AMC mode will help indicate how much AMC contributes to these views.
talk_page_requests_enwiki = talk_page_requests_clean %>%
mutate(date = floor_date(date, "week")) %>%
filter(request_type == "all_mobile_web_requests",
wiki == 'en.wikipedia',
date != '2019-08-18',
date != '2019-11-24') %>% #remove weeks with incomplete data
group_by(date) %>%
summarize(request_count = sum(request_count))
talk_page_requests_enwiki_byrequest <- talk_page_requests_clean %>%
mutate(date = floor_date(date, "week")) %>%
filter(request_type != "all_mobile_web_requests",
wiki == 'en.wikipedia',
date != '2019-08-18',
date != '2019-11-24') %>% #remove weeks with incomplete data
group_by(date, request_type) %>%
summarise(request_count = sum(request_count))
p <- ggplot() +
geom_bar(data =talk_page_requests_enwiki_byrequest, aes(x= date, y= request_count, fill = request_type),stat= "identity", position = "stack") +
geom_smooth(data = talk_page_requests_enwiki , aes(x=date, y=request_count), se = FALSE) +
scale_y_continuous("Number of requests per week", labels = polloi::compress) +
scale_x_date("Date", labels = date_format("%b %d %Y"), date_breaks = "1 week") +
labs(title = "Mobile web requests to talk pages on English Wikipedia",
subtitle = "Based on a sample of data collected from webrequest data") +
ggthemes::theme_tufte(base_size = 14, base_family = "Gill Sans") +
theme(axis.text.x=element_text(angle = 45, hjust = 1),
panel.grid = element_line("gray70"),
legend.position="bottom",
legend.title=element_blank(),
legend.text=element_text(size=14))
p
ggsave("Figures/mobile_web_talk_page_request_amc_prop_enwiki.png", p, width = 18, height = 9, units = "in", dpi = 150)
#Calculate monthly proportion of AMC requests on English Wiki
talk_page_requests_monthly_prop_enwiki <- talk_page_requests %>%
filter(wiki == 'en.wikipedia') %>%
mutate(date = floor_date(date, "month")) %>%
group_by(date) %>%
summarise(amc_requests_total = sum(amc_request),
all_mobile_web_requests_total = sum(all_mobile_web_requests),
amc_prop = amc_requests_total/all_mobile_web_requests_total *100)
talk_page_requests_monthly_prop_enwiki
#Calculate month over month changes on English Wiki
talk_pages_requests_mom <- talk_page_requests_clean %>%
filter(wiki == 'en.wikipedia',
request_type == "all_mobile_web_requests",
date >= '2019-09-01',
date < '2019-11-01') %>% #Look at all mobile web requests
mutate(date = floor_date(date, "month")) %>%
group_by(date) %>%
summarise(request_count = sum(request_count)) %>%
arrange(date) %>%
mutate(monthOvermonth= (request_count/lag(request_count,1) -1) *100)
tail(talk_pages_requests_mom)
From September 2019 to October 2019, the number of pageviews to talk pages on mobile web on English Wikipedia increased by 19.3%. About 41% of these views came from users in AMC mode.
talk_page_edits_enwiki <- talk_page_edits_clean %>%
filter(edit_type == 'all_mobile_web_edits',
wiki == 'enwiki') %>%
mutate(date = floor_date(date, "week")) %>%
filter(date != '2018-05-27',
date != '2019-10-27') %>% #remove incomplete data weeks
group_by(date) %>%
summarise(edit_count = sum(edit_count))
#Enwiki Mobile Web Edits by edit type
talk_page_edits_bytype_enwiki <- talk_page_edits_clean %>%
filter(edit_type != 'all_mobile_web_edits',
wiki == 'enwiki') %>%
mutate(date = floor_date(date, "week")) %>%
filter(date != '2018-05-27',
date != '2019-10-27') %>% #remove incomplete data weeks
group_by(date, edit_type) %>%
summarise(edit_count = sum(edit_count))
p <- ggplot() +
geom_col(data = talk_page_edits_bytype_enwiki, aes(x = date, y = edit_count, fill = edit_type), position = "stack") +
geom_smooth(data = talk_page_edits_enwiki, aes(x=date, y=edit_count), se = FALSE) +
scale_y_continuous("Number of edits per week", labels = polloi::compress) +
scale_x_date("Date", labels = date_format("%b %Y"), date_breaks = "1 month") +
geom_vline(xintercept = as.Date('2019-08-07'),
linetype = "dashed", color = "black") +
geom_text(aes(x=as.Date('2019-08-07'), y=800, label="AMC deployed on all Wikipedias"), size=3.8, vjust = -1, angle = 90, color = "black") +
labs(title = "Mobile web edits to article talk pages \n on English Wikipedia") +
ggthemes::theme_tufte(base_size = 14, base_family = "Gill Sans") +
theme(axis.text.x=element_text(angle = 45, hjust = 1),
panel.grid = element_line("gray70"),
legend.position="bottom",
legend.title=element_blank(),
legend.text=element_text(size=14))
p
ggsave("Figures/mobile_web_talk_page_edits_enwiki.png", p, width = 18, height = 9, units = "in", dpi = 150)
#Calculate monthly proportion of talk page edits on English Wikipedia
talk_page_edits_monthly_prop_enwiki <- talk_page_edits %>%
filter(wiki == 'enwiki') %>%
mutate(date = floor_date(date, "month")) %>%
group_by(date) %>%
summarise(amc_edits_total = sum(amc_edits),
all_mobile_web_edits_total = sum(all_mobile_web_edits),
amc_prop = amc_edits_total/all_mobile_web_edits_total *100)
tail(talk_page_edits_monthly_prop_enwiki)
#Calculate year over year changes on English Wikipedia
talk_pages_edits_yoy_enwiki <- talk_page_edits_clean %>%
filter(wiki == 'enwiki',
edit_type == "all_mobile_web_edits") %>% #Look at all mobile web edits
mutate(date = floor_date(date, "month")) %>%
group_by(date) %>%
summarise(edit_count = sum(edit_count)) %>%
arrange(date) %>%
mutate(yearOveryear = (edit_count/lag(edit_count,12) -1) *100)
tail(talk_pages_edits_yoy_enwiki)
From September 2019 to October 2019, the proportion of talk page edits made in AMC mode increased by 32.1% on English Wikipedia (compared to 19.5% across all Wikipedias). AMC edits accounted for 44.6% of all talk page edits made by logged-in users on mobile web in October 2019.
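The monthly table above reports `amc_prop` but not its month-over-month change. The following is a minimal sketch of how that change could be derived, assuming the reported figure is the relative change in the proportion rather than a difference in percentage points. The data frame `amc_prop_example` and its values are hypothetical; the real inputs would be the monthly totals computed from `talk_page_edits`.

```r
library(dplyr)

# Hypothetical monthly totals, for illustration only
amc_prop_example <- data.frame(
  date = as.Date(c("2019-08-01", "2019-09-01", "2019-10-01")),
  amc_edits_total = c(100, 150, 220),
  all_mobile_web_edits_total = c(500, 520, 560)
)

amc_prop_mom <- amc_prop_example %>%
  arrange(date) %>%
  mutate(
    # share of edits made in AMC mode, in percent
    amc_prop = amc_edits_total / all_mobile_web_edits_total * 100,
    # relative change in that share versus the previous month, in percent
    amc_prop_mom = (amc_prop / lag(amc_prop, 1) - 1) * 100,
    # absolute change in that share, in percentage points
    amc_prop_pp = amc_prop - lag(amc_prop, 1)
  )

amc_prop_mom
```

Reporting both columns side by side makes it clear whether a figure is a relative change or a percentage point difference, which matters most when the underlying proportions are small.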
talk_page_views_all_targetwiki_March <- talk_page_views %>%
filter(access_method == 'mobile web',
project %in% c('ar.wikipedia', 'es.wikipedia', 'id.wikipedia')) %>%
mutate(date = floor_date(date, 'month')) %>%
group_by(date, project) %>%
summarise(views = sum(views))
# Plot YoY Changes on Target Wikipedia
talk_page_views_targetwiki_yoy_plot_March <- talk_page_views_all_targetwiki_March %>%
group_by(project) %>%
mutate(year = case_when(date >= '2018-06-01' & date < '2019-06-01' ~ '2018/2019',
date >= '2019-06-01' & date < '2020-06-01' ~ '2019/2020'),
MonthN =as.factor(format(as.Date(date),"%m")),
Month = months(as.Date(date), abbreviate=TRUE))
talk_page_views_targetwiki_yoy_plot_March$MonthN = factor(talk_page_views_targetwiki_yoy_plot_March$MonthN, levels=c("06", "07", "08", "09", "10","11", "12", "01", "02", "03",
"04", "05" ))
talk_page_views_targetwiki_yoy_plot_March$year = factor(talk_page_views_targetwiki_yoy_plot_March$year, levels = c('2018/2019', '2019/2020'))
p <- ggplot(talk_page_views_targetwiki_yoy_plot_March, aes(x=MonthN, y = views, group = year, color = year, linetype = year)) +
geom_line(size = 0.8) +
facet_wrap(~ project, nrow = 7, scale = "free_y") +
scale_y_continuous("Number of views per month", labels = polloi::compress) +
scale_x_discrete(breaks = talk_page_views_targetwiki_yoy_plot_March$MonthN, labels = talk_page_views_targetwiki_yoy_plot_March$Month )+
labs(title = "Mobile web talk page views on target Wikipedia projects \n Where AMC was deployed on March 20, 2019") +
xlab("Month") +
scale_color_brewer(palette = 'Set1', breaks=c('2018/2019', '2019/2020'), direction= -1) +
scale_linetype_manual(breaks=c('2018/2019', '2019/2020'), values=c(2,1)) +
ggthemes::theme_tufte(base_size = 14, base_family = "Gill Sans") +
theme(plot.title = element_text(hjust = 0.5),
legend.title = element_blank(),
legend.position = "bottom",
panel.grid = element_line("gray70"),
legend.key.width=unit(0.5,"cm"))
p
ggsave("Figures/mobile_web_talk_page_views_target_March.png", p, width = 18, height = 9, units = "in", dpi = 150)
talk_page_views_all_targetwiki_June <- talk_page_views %>%
filter(access_method == 'mobile web',
project %in% c('it.wikipedia', 'ja.wikipedia',
'fa.wikipedia', 'th.wikipedia')) %>%
mutate(date = floor_date(date, 'month')) %>%
group_by(date, project) %>%
summarise(views = sum(views))
# Plot YoY Changes on Target Wikipedia
talk_page_views_targetwiki_yoy_plot_June <- talk_page_views_all_targetwiki_June %>%
group_by(project) %>%
mutate(year = case_when(date >= '2018-06-01' & date < '2019-06-01' ~ '2018/2019',
date >= '2019-06-01' & date < '2020-06-01' ~ '2019/2020'),
MonthN =as.factor(format(as.Date(date),"%m")),
Month = months(as.Date(date), abbreviate=TRUE))
talk_page_views_targetwiki_yoy_plot_June$MonthN = factor(talk_page_views_targetwiki_yoy_plot_June$MonthN, levels=c("06", "07", "08", "09", "10","11", "12", "01", "02", "03",
"04", "05" ))
talk_page_views_targetwiki_yoy_plot_June$year = factor(talk_page_views_targetwiki_yoy_plot_June$year, levels = c('2018/2019', '2019/2020'))
p <- ggplot(talk_page_views_targetwiki_yoy_plot_June, aes(x=MonthN, y = views, group = year, color = year, linetype = year)) +
geom_line(size = 0.8) +
facet_wrap(~ project, nrow = 7, scale = "free_y") +
scale_y_continuous("Number of views per month", labels = polloi::compress) +
scale_x_discrete(breaks = talk_page_views_targetwiki_yoy_plot_June$MonthN, labels = talk_page_views_targetwiki_yoy_plot_June$Month )+
labs(title = "Mobile web talk page views \n on target Wikipedia projects \n Where AMC was deployed on June 17, 2019") +
xlab("Month") +
scale_color_brewer(palette = 'Set1', breaks=c('2018/2019', '2019/2020'), direction= -1) +
scale_linetype_manual(breaks=c('2018/2019', '2019/2020'), values=c(2,1)) +
ggthemes::theme_tufte(base_size = 14, base_family = "Gill Sans") +
theme(plot.title = element_text(hjust = 0.5),
legend.title = element_blank(),
legend.position = "bottom",
panel.grid = element_line("gray70"),
legend.key.width=unit(0.5,"cm"))
p
ggsave("Figures/mobile_web_talk_page_views_target_June.png", p, width = 18, height = 9, units = "in", dpi = 150)
From June to October 2019, talk page views also increased year over year on most of the target Wikipedia projects, with Thai Wikipedia as an exception. These increases started prior to AMC deployment, but the rate of increase is higher following deployment. About 36.3% of all mobile web requests to article talk pages were made in AMC mode in October 2019.
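The plots above compare the two years visually. A per-project year over year figure can be computed with the same `lag(…, 12)` approach used for English Wikipedia edits. The sketch below assumes one row per project per month with no missing months. The data frame `views_example`, the project names, and the view counts are hypothetical stand-ins for the monthly target wiki view tables.

```r
library(dplyr)

# Hypothetical monthly view counts for two projects (June 2018 to October 2019)
months_seq <- seq(as.Date("2018-06-01"), as.Date("2019-10-01"), by = "month")
views_example <- data.frame(
  date = rep(months_seq, times = 2),
  project = rep(c("xx.wikipedia", "yy.wikipedia"), each = length(months_seq)),
  views = c(seq(1000, by = 50, length.out = length(months_seq)),
            seq(400, by = -5, length.out = length(months_seq)))
)

views_yoy <- views_example %>%
  group_by(project) %>%
  arrange(date, .by_group = TRUE) %>%
  # compare each month with the same month one year earlier, within each project
  mutate(yearOveryear = (views / lag(views, 12) - 1) * 100) %>%
  ungroup() %>%
  # keep only the months that have a prior-year value to compare against
  filter(date >= as.Date("2019-06-01"))

views_yoy
```

Grouping by project before calling `lag()` keeps the comparison from crossing project boundaries, which would otherwise happen silently once the rows are sorted by date.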
talk_page_edits_targetwiki_March <- talk_page_edits_clean %>%
filter(edit_type == 'all_mobile_web_edits',
wiki %in% c('arwiki', 'eswiki', 'idwiki')) %>%
mutate(date = floor_date(date, "week")) %>%
filter(date != '2018-05-27',
date != '2019-10-27') %>% #remove incomplete data weeks
group_by(date, wiki) %>%
summarise(edit_count = sum(edit_count))
# March deployment target wiki Mobile Web Edits by edit type
talk_page_edits_bytype_targetwiki_March <- talk_page_edits_clean %>%
filter(edit_type != 'all_mobile_web_edits',
wiki %in% c('arwiki', 'eswiki', 'idwiki')) %>%
mutate(date = floor_date(date, "week")) %>%
filter(date != '2018-05-27',
date != '2019-10-27') %>% #remove incomplete data weeks
group_by(date, wiki, edit_type) %>%
summarise(edit_count = sum(edit_count))
p <- ggplot() +
geom_col(data = talk_page_edits_bytype_targetwiki_March, aes(x = date, y = edit_count, fill = edit_type), position = "stack") + # geom_col already uses stat = "identity"
geom_smooth(data = talk_page_edits_targetwiki_March, aes(x=date, y=edit_count), se = FALSE) +
facet_wrap(~ wiki, nrow = 7, scale = "free_y") +
scale_y_continuous("Number of edits per week", labels = polloi::compress) +
scale_x_date("Date", labels = date_format("%b %Y"), date_breaks = "1 month") +
labs(title = "Mobile web edits to article talk pages on target Wikipedias \n Where AMC was deployed on March 20, 2019") +
ggthemes::theme_tufte(base_size = 14, base_family = "Gill Sans") +
theme(axis.text.x=element_text(angle = 45, hjust = 1),
panel.grid = element_line("gray70"),
legend.position="bottom",
legend.title=element_blank(),
legend.text=element_text(size=14))
p
ggsave("Figures/mobile_web_talk_page_edits_target_March.png", p, width = 18, height = 9, units = "in", dpi = 150)
talk_page_edits_targetwiki_July <- talk_page_edits_clean %>%
filter(edit_type == 'all_mobile_web_edits',
wiki %in% c('itwiki', 'jawiki',
'fawiki', 'thwiki')) %>%
mutate(date = floor_date(date, "week")) %>%
filter(date != '2018-05-27',
date != '2019-10-27') %>% #remove incomplete data weeks
group_by(date, wiki) %>%
summarise(edit_count = sum(edit_count))
# June 17 deployment target wikis: mobile web edits by edit type (the _July suffix in the object names is kept as-is)
talk_page_edits_bytype_targetwiki_July <- talk_page_edits_clean %>%
filter(edit_type != 'all_mobile_web_edits',
wiki %in% c( 'itwiki', 'jawiki',
'fawiki', 'thwiki')) %>%
mutate(date = floor_date(date, "week")) %>%
filter(date != '2018-05-27',
date != '2019-10-27') %>% #remove incomplete data weeks
group_by(date, wiki, edit_type) %>%
summarise(edit_count = sum(edit_count))
p <- ggplot() +
geom_col(data = talk_page_edits_bytype_targetwiki_July, aes(x = date, y = edit_count, fill = edit_type), position = "stack") + # geom_col already uses stat = "identity"
geom_smooth(data = talk_page_edits_targetwiki_July, aes(x=date, y=edit_count), se = FALSE) +
facet_wrap(~ wiki, nrow = 7, scale = "free_y") +
scale_y_continuous("Number of edits per week", labels = polloi::compress) +
scale_x_date("Date", labels = date_format("%b %Y"), date_breaks = "1 month") +
labs(title = "Mobile web edits to article talk pages on target Wikipedias \n Where AMC was deployed on June 17, 2019") +
ggthemes::theme_tufte(base_size = 14, base_family = "Gill Sans") +
theme(axis.text.x=element_text(angle = 45, hjust = 1),
panel.grid = element_line("gray70"),
legend.position="bottom",
legend.title=element_blank(),
legend.text=element_text(size=14))
p
ggsave("Figures/mobile_web_talk_page_edits_target_June.png", p, width = 18, height = 9, units = "in", dpi = 150) # edits figure; the previous file name overwrote the talk page views figure
Summary of Mobile Web Talk Page Edits on Target Wikis - October 2019
Wiki | Proportion of AMC Edits | Month over Month Change in AMC Proportion | Year over Year Change |
---|---|---|---|
Spanish Wiki | 28.27% | 57.0% | 41.8% |
Arabic Wiki | 39.9% | 75.26% | 16.9% |
Indonesian Wiki | 13.2% | -39.1% | 58.5% |
Italian Wiki | 33.0% | 146.27% | 31.5% |
Japanese Wiki | 32.8% | -6.21% | 71.2% |
Persian Wiki | 12.4% | -47.23% | 45.2% |
Thai Wiki | 18.8% | 408.11% | 72.5% |
On target wikis, the rate of talk page edits has increased, but not as sharply as on English Wikipedia. In addition, these increases began prior to AMC deployment and are likely due to other variables.
There is also much greater fluctuation in the proportion of talk page edits made in AMC mode. On some of the smaller wikis, such as Thai Wikipedia, the proportion of AMC edits is strongly influenced by only a few users, which leads to larger swings from month to month. In October 2019, the highest proportion of AMC edits was on Arabic Wikipedia (39.9%), followed by Italian (33.0%), Japanese (32.8%), and Spanish (28.27%) Wikipedias.
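To illustrate why a few users can dominate the AMC proportion on a small wiki, the sketch below compares how far the proportion moves when a single AMC user adds the same number of edits on a small wiki and on a large one. All counts and the helper `amc_prop_after` are hypothetical and chosen only to show the arithmetic; they are not taken from the report's data.

```r
# Hypothetical monthly counts of logged-in mobile web talk page edits
small_wiki <- c(amc = 7, total = 37)
large_wiki <- c(amc = 4460, total = 10000)

# AMC proportion (in percent) after one additional AMC user makes `extra` edits
amc_prop_after <- function(counts, extra) {
  (counts[["amc"]] + extra) / (counts[["total"]] + extra) * 100
}

extra_edits <- 20

# Shift in the AMC proportion, in percentage points, caused by that one user
shift_small <- amc_prop_after(small_wiki, extra_edits) - amc_prop_after(small_wiki, 0)
shift_large <- amc_prop_after(large_wiki, extra_edits) - amc_prop_after(large_wiki, 0)

round(c(small_wiki = shift_small, large_wiki = shift_large), 2)
```

With these made-up counts, the same 20 edits move the small wiki's proportion by tens of percentage points and the large wiki's by a fraction of a point, which is consistent with the large month over month swings in the summary table for the smaller wikis.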