How many iOS app user are blocked from editing and registration?

https://phabricator.wikimedia.org/T214628

Summary

This analysis use the editing (MobileWikiAppEdit) and account creating (MobileWikiAppCreateAccount) event logging table to find out how many users got block error messages, thus we are only counting users who opt in to share their usage report with us -- the real number should be larger. We didn't see any error messages relating to block for Android on both tables, which may be a data bug on Android.

Editing Block

  • From Nov 2018 to now, on the iOS app, 71 unique IP addresses got IP block, 81 unique IP addresses got IP range block, at most 51 unique app install ID (83 unique IP) got ID block. For the rest of 137 blocked unique app install ID (275 unique IP), we cannot confirm their block type.
  • Among all 137 unknown block, 59 from the US, 19 from Japan. When breaking down by wiki, 84 from enwiki, 19 from jawiki, 8 from dewiki, 6 from nlwiki.
  • Among all 51 ID block, 19 from the US. When breaking down by wiki, 25 from enwiki, 4 from dewiki.
  • Among all 81 IP range block, 31 from the US, 23 from the UK. When breaking down by wiki, 59 from enwiki.
  • Among all 71 IP block, 17 from the US. When breaking down by wiki, 31 from enwiki, 10 from dewiki.

Caveat: Since we didn't send block ID/wiki user ID to the eventlogging table, we can only join the eventlogging data with Ipblocks table and Logging table by IP addresses to check whether an error message is an IP block. Therefore, the block type we're talking about here are defined as:

  • IP range block: Any users who see "Your IP address is in a range which has been blocked on all wikis. The block was made by User:XXX ..."
  • IP block: Users who see the message "You have been blocked from editing." and their IP addresses was in the Ipblocks table or the logging table (log type is block). Please note that this count is not very accurate but should be in the ballpark, since some tables in the queries are almost real-time, while others were only updated once a month.
  • ID block: Any users who see "Your IP address has been blocked automatically, because it was used by a blocked user.", or any logged in users who see the message "You have been blocked from editing.". This is an upper limit of the ID block counts, not an accurate number.
  • Unknown: Any other users who see the message "You have been blocked from editing."

Registration Block

  • From Nov 2018 to now, on the iOS app, 41 unique IP addresses (38 unique app install ID) got IP block when trying to register an account. Additionally, there are 347 unique IP addresses (332 unique app install ID) got error messages that contain the word "IP" but can't be confirmed as IP block -- we call them "Unconfirmed IP block" in this analysis.
  • Among the 347 unconfirmed IP block, 101 from the US, 43 from Japan, 32 from Germany, 32 from UK, 19 from India, 12 from Netherlands. When breaking down by wiki, 177 from enwiki, 41 from jawiki, 39 from trwiki, 19 from zhwiki, 17 from ruwiki, 15 from eswiki, 10 from dewiki.
  • Among the 41 IP block, 15 from the US. When breaking down by wiki, 25 from enwiki.

Caveat: Since the the error messages in MobileWikiAppCreateAccount table are translated, any message that contain the word "IP" will be taken as a IP block error message. Also, since we didn't send block ID/wiki user ID to the eventlogging table, we can only join the eventlogging data with Ipblocks table and Logging table by IP addresses to check whether an error message is an IP block.

In [1]:
%load_ext sql_magic

import findspark, os
os.environ['SPARK_HOME'] = '/usr/lib/spark2';
findspark.init()
import pyspark
import pyspark.sql
conf = pyspark.SparkConf()  # Use master yarn here if you are going to query large datasets.
sc = pyspark.SparkContext(conf=conf)
spark_hive = pyspark.sql.HiveContext(sc)

%config SQL.conn_name = 'spark_hive'

1. Editing block

Querying MobileWikiAppEdit table by parsing the localized error messages (all messages are in English).

Number of unique app install ID and unique IP address blocked from editing from Nov 2018 to now, by platform (Android vs iOS) and error messages

In [2]:
%%read_sql edit_block
select count(distinct coalesce(event.app_install_id, event.appInstallID)) as n_users,
count(distinct ip) as n_ip,
useragent.os_family, 
case when event.errortext = "You have been blocked from editing." then event.errortext
     when event.errortext = "Your IP address has been blocked automatically, because it was used by a blocked user." then event.errortext
     when event.errortext like "%Your IP address is in a range which has been blocked%" then "Your IP address is in a range which has been blocked."
     else "other" end as error_msg
from event.mobilewikiappedit
where (year=2019
or (year=2018 and month>10)
)
and event.action='error'
and event.errortext is not null
group by useragent.os_family, 
case when event.errortext = "You have been blocked from editing." then event.errortext
     when event.errortext = "Your IP address has been blocked automatically, because it was used by a blocked user." then event.errortext
     when event.errortext like "%Your IP address is in a range which has been blocked%" then "Your IP address is in a range which has been blocked."
     else "other" end
Query started at 06:07:04 PM UTC; Query executed in 0.59 m
Out[2]:
n_users n_ip os_family error_msg
0 1007 1218 Android other
1 227 425 iOS You have been blocked from editing.
2 4 4 iOS Your IP address has been blocked automatically...
3 214 270 iOS other
4 68 81 iOS Your IP address is in a range which has been b...

Android didn't see any error messages that contains the word "block"...

Number of unique app install ID and unique IP address blocked from editing on the iOS app from Nov 2018 to now, by block type

In [3]:
%%read_sql edit_block_by_type
with blocked_ip as (
    select distinct ip as blocked_ip
    from (
        select distinct log_title as ip
        from wmf_raw.mediawiki_logging
        where snapshot='2018-12'
        and log_type = "block"
        and log_action in ("block", "reblock")
        and log_namespace=2
        and substr(log_timestamp, 0, 4) > 2016
        union
        select distinct ipb_address as ip
        from wmf_raw.mediawiki_ipblocks
        where mediawiki_ipblocks.snapshot='2019-01'
    ) as tbl
)
select count(distinct coalesce(event.app_install_id, event.appInstallID)) as n_users,
count(distinct ip) as n_ip,
case when event.errortext like "%Your IP address is in a range which has been blocked%" then "IP range block"
     when event.errortext = "You have been blocked from editing." and blocked_ip is not null then "IP block"
     when (event.errortext = "You have been blocked from editing." and not event.anon) or event.errortext = "Your IP address has been blocked automatically, because it was used by a blocked user." then "ID block"
     else "Unknown" end as block_type
from event.mobilewikiappedit left join blocked_ip on mobilewikiappedit.ip=blocked_ip.blocked_ip
where (year=2019
or (year=2018 and month>10)
)
and useragent.os_family = "iOS"
and event.action='error'
and event.errortext like "%block%"
group by 
case when event.errortext like "%Your IP address is in a range which has been blocked%" then "IP range block"
     when event.errortext = "You have been blocked from editing." and blocked_ip is not null then "IP block"
     when (event.errortext = "You have been blocked from editing." and not event.anon) or event.errortext = "Your IP address has been blocked automatically, because it was used by a blocked user." then "ID block"
     else "Unknown" end
Query started at 06:07:40 PM UTC; Query executed in 8.95 m
Out[3]:
n_users n_ip block_type
0 137 275 Unknown
1 51 83 ID block
2 68 81 IP range block
3 55 71 IP block

Number of unique app install ID and unique IP address blocked from editing on the iOS app from Nov 2018 to now, by block type and countries

We only listed those that has 5+ IP addresses blocked.

In [4]:
%%read_sql edit_block_by_type_country
with blocked_ip as (
    select distinct ip as blocked_ip
    from (
        select distinct log_title as ip
        from wmf_raw.mediawiki_logging
        where snapshot='2018-12'
        and log_type = "block"
        and log_action in ("block", "reblock")
        and log_namespace=2
        and substr(log_timestamp, 0, 4) > 2016
        union
        select distinct ipb_address as ip
        from wmf_raw.mediawiki_ipblocks
        where mediawiki_ipblocks.snapshot='2019-01'
    ) as tbl
)
select count(distinct coalesce(event.app_install_id, event.appInstallID)) as n_users,
count(distinct ip) as n_ip,
geocoded_data['country'] as country,
case when event.errortext like "%Your IP address is in a range which has been blocked%" then "IP range block"
     when event.errortext = "You have been blocked from editing." and blocked_ip is not null then "IP block"
     when (event.errortext = "You have been blocked from editing." and not event.anon) or event.errortext = "Your IP address has been blocked automatically, because it was used by a blocked user." then "ID block"
     else "Unknown" end as block_type
from event.mobilewikiappedit left join blocked_ip on mobilewikiappedit.ip=blocked_ip.blocked_ip
where (year=2019
or (year=2018 and month>10)
)
and useragent.os_family = "iOS"
and event.action='error'
and event.errortext like "%block%"
group by geocoded_data['country'],
case when event.errortext like "%Your IP address is in a range which has been blocked%" then "IP range block"
     when event.errortext = "You have been blocked from editing." and blocked_ip is not null then "IP block"
     when (event.errortext = "You have been blocked from editing." and not event.anon) or event.errortext = "Your IP address has been blocked automatically, because it was used by a blocked user." then "ID block"
     else "Unknown" end
having n_ip > 5
order by n_ip desc
Query started at 06:16:37 PM UTC; Query executed in 9.04 m
Out[4]:
n_users n_ip country block_type
0 59 124 United States Unknown
1 19 52 Japan Unknown
2 19 36 United States ID block
3 25 31 United States IP range block
4 18 23 United Kingdom IP range block
5 0 18 Belgium Unknown
6 9 17 United States IP block
7 4 11 Philippines Unknown
8 7 10 Netherlands Unknown
9 7 8 United Kingdom Unknown
10 2 8 Switzerland ID block
11 6 7 Germany Unknown
12 6 6 Switzerland IP block
13 6 6 Japan IP range block

Number of unique app install ID and unique IP address blocked from editing on the iOS app from Nov 2018 to now, by block type and wiki

We only listed those that has 5+ IP addresses blocked. Please note that the value of wiki may not be the wiki of the article being edited -- we just fixed this bug in v6.2

In [5]:
%%read_sql edit_block_by_type_wiki
with blocked_ip as (
    select distinct ip as blocked_ip
    from (
        select distinct log_title as ip
        from wmf_raw.mediawiki_logging
        where snapshot='2018-12'
        and log_type = "block"
        and log_action in ("block", "reblock")
        and log_namespace=2
        and substr(log_timestamp, 0, 4) > 2016
        union
        select distinct ipb_address as ip
        from wmf_raw.mediawiki_ipblocks
        where mediawiki_ipblocks.snapshot='2019-01'
    ) as tbl
)
select count(distinct coalesce(event.app_install_id, event.appInstallID)) as n_users,
count(distinct ip) as n_ip,
wiki,
case when event.errortext like "%Your IP address is in a range which has been blocked%" then "IP range block"
     when event.errortext = "You have been blocked from editing." and blocked_ip is not null then "IP block"
     when (event.errortext = "You have been blocked from editing." and not event.anon) or event.errortext = "Your IP address has been blocked automatically, because it was used by a blocked user." then "ID block"
     else "Unknown" end as block_type
from event.mobilewikiappedit left join blocked_ip on mobilewikiappedit.ip=blocked_ip.blocked_ip
where (year=2019
or (year=2018 and month>10)
)
and useragent.os_family = "iOS"
and event.action='error'
and event.errortext like "%block%"
group by wiki,
case when event.errortext like "%Your IP address is in a range which has been blocked%" then "IP range block"
     when event.errortext = "You have been blocked from editing." and blocked_ip is not null then "IP block"
     when (event.errortext = "You have been blocked from editing." and not event.anon) or event.errortext = "Your IP address has been blocked automatically, because it was used by a blocked user." then "ID block"
     else "Unknown" end
having n_ip > 5
order by n_ip desc
Query started at 06:25:39 PM UTC; Query executed in 9.56 m
Out[5]:
n_users n_ip wiki block_type
0 84 161 enwiki Unknown
1 46 59 enwiki IP range block
2 19 53 jawiki Unknown
3 25 38 enwiki ID block
4 20 31 enwiki IP block
5 6 27 nlwiki Unknown
6 4 11 dewiki ID block
7 10 10 dewiki IP block
8 8 9 dewiki Unknown
9 8 8 zhwiki IP range block
10 7 7 frwiki IP block
11 1 6 angwiki ID block
12 6 6 zhwiki Unknown
13 4 6 itwiki ID block

2. Registration block

Querying MobileWikiAppCreateAccount table by parsing the localized error messages. Since the the error messages here are translated, the matching is fuzzy, i.e. any message that contain the word "IP" will be taken as a IP block message.

Number of unique app install ID and unique IP address blocked from registration from Nov 2018 to now, by platform (Android vs iOS) and error messages

In [6]:
%%read_sql registration_block
with blocked_ip as (
    select distinct ip as blocked_ip
    from (
        select distinct log_title as ip
        from wmf_raw.mediawiki_logging
        where snapshot='2018-12'
        and log_type = "block"
        and log_action in ("block", "reblock")
        and log_namespace=2
        and substr(log_timestamp, 0, 4) > 2016
        union
        select distinct ipb_address as ip
        from wmf_raw.mediawiki_ipblocks
        where mediawiki_ipblocks.snapshot='2019-01'
    ) as tbl
)
select count(distinct coalesce(event.app_install_id, event.appInstallID)) as n_users,
count(distinct ip) as n_ip,
useragent.os_family, 
case when event.errortext like "%IP%" and blocked_ip is null then "Unconfirmed IP block"
     when event.errortext like "%IP%" and blocked_ip is not null then "IP block"
     else "other" end as error_msg
from event.mobilewikiappcreateaccount left join blocked_ip on mobilewikiappcreateaccount.ip=blocked_ip.blocked_ip
where (year=2019
or (year=2018 and month>10)
)
and event.action='error'
and event.errortext is not null
group by useragent.os_family, 
case when event.errortext like "%IP%" and blocked_ip is null then "Unconfirmed IP block"
     when event.errortext like "%IP%" and blocked_ip is not null then "IP block"
     else "other" end
Query started at 06:35:13 PM UTC; Query executed in 9.43 m
Out[6]:
n_users n_ip os_family error_msg
0 332 347 iOS Unconfirmed IP block
1 0 894 Android other
2 38 41 iOS IP block
3 5679 6051 iOS other

Android's n_users is 0 because didn't send any app install ID to the eventlogging table...

Number of unique app install ID and unique IP address blocked from registration on the iOS app from Nov 2018 to now, by block type and countries

We only listed those that has 5+ IP addresses blocked.

In [7]:
%%read_sql registration_block_by_type_country
with blocked_ip as (
    select distinct ip as blocked_ip
    from (
        select distinct log_title as ip
        from wmf_raw.mediawiki_logging
        where snapshot='2018-12'
        and log_type = "block"
        and log_action in ("block", "reblock")
        and log_namespace=2
        and substr(log_timestamp, 0, 4) > 2016
        union
        select distinct ipb_address as ip
        from wmf_raw.mediawiki_ipblocks
        where mediawiki_ipblocks.snapshot='2019-01'
    ) as tbl
)
select count(distinct coalesce(event.app_install_id, event.appInstallID)) as n_users,
count(distinct ip) as n_ip,
geocoded_data['country'] as country,
IF(blocked_ip is not null, "IP block", "Unconfirmed IP block") as block_type
from event.mobilewikiappcreateaccount left join blocked_ip on mobilewikiappcreateaccount.ip=blocked_ip.blocked_ip
where (year=2019
or (year=2018 and month>10)
)
and useragent.os_family = "iOS"
and event.action='error'
and event.errortext like "%IP%"
group by geocoded_data['country'], 
IF(blocked_ip is not null, "IP block", "Unconfirmed IP block")
having n_ip > 5
order by n_ip desc
Query started at 06:44:38 PM UTC; Query executed in 9.42 m
Out[7]:
n_users n_ip country block_type
0 99 101 United States Unconfirmed IP block
1 42 43 Japan Unconfirmed IP block
2 33 32 Germany Unconfirmed IP block
3 29 32 United Kingdom Unconfirmed IP block
4 18 19 India Unconfirmed IP block
5 14 15 United States IP block
6 12 12 Netherlands Unconfirmed IP block
7 8 9 Hong Kong Unconfirmed IP block
8 9 9 Ukraine Unconfirmed IP block
9 7 8 Philippines Unconfirmed IP block
10 8 8 Turkey Unconfirmed IP block
11 7 7 China Unconfirmed IP block
12 6 6 United Kingdom IP block
13 6 6 Singapore Unconfirmed IP block

Number of unique app install ID and unique IP address blocked from registration on the iOS app from Nov 2018 to now, by block type and wiki

We only listed those that has 5+ IP addresses blocked.

In [8]:
%%read_sql registration_block_by_type_wiki
with blocked_ip as (
    select distinct ip as blocked_ip
    from (
        select distinct log_title as ip
        from wmf_raw.mediawiki_logging
        where snapshot='2018-12'
        and log_type = "block"
        and log_action in ("block", "reblock")
        and log_namespace=2
        and substr(log_timestamp, 0, 4) > 2016
        union
        select distinct ipb_address as ip
        from wmf_raw.mediawiki_ipblocks
        where mediawiki_ipblocks.snapshot='2019-01'
    ) as tbl
)
select count(distinct coalesce(event.app_install_id, event.appInstallID)) as n_users,
count(distinct ip) as n_ip,
wiki,
IF(blocked_ip is not null, "IP block", "Unconfirmed IP block") as block_type
from event.mobilewikiappcreateaccount left join blocked_ip on mobilewikiappcreateaccount.ip=blocked_ip.blocked_ip
where (year=2019
or (year=2018 and month>10)
)
and useragent.os_family = "iOS"
and event.action='error'
and event.errortext like "%IP%"
group by wiki, 
IF(blocked_ip is not null, "IP block", "Unconfirmed IP block")
having n_ip > 5
order by n_ip desc
Query started at 06:54:03 PM UTC; Query executed in 9.14 m
Out[8]:
n_users n_ip wiki block_type
0 165 177 enwiki Unconfirmed IP block
1 40 41 jawiki Unconfirmed IP block
2 36 39 trwiki Unconfirmed IP block
3 22 25 enwiki IP block
4 20 19 zhwiki Unconfirmed IP block
5 17 17 ruwiki Unconfirmed IP block
6 15 15 eswiki Unconfirmed IP block
7 11 10 dewiki Unconfirmed IP block
8 8 9 nlwiki Unconfirmed IP block
9 7 8 frwiki Unconfirmed IP block