Machine Translation Service Analysis Report 2024

Content Translation tool

Author

Krishna Chaitanya Velaga, Product Analytics

Published

October 22, 2024

Overview

Task: T370749

Content translation supports multiple machine translation services. When multiple options are available for a language, even if one is provided by default, users can use a different service. See default configuration and languages supported files for details on all available language pairs and defaults.

The purpose of the report is to understand the usage of MT services across various languages and, if needed, to inform changes to the default service provided for certain language pairs. It also aims to understand the translation quality of MT services, for example through the modification percentage of the MT content and the deletion rate of the articles. Previous iterations of the report were run in May 2022, October 2022, and November 2023.

Methodology

For each machine translation service, we compared the following:

  • Percent of translations published by each machine translation service:
    • Overall across all languages
    • Daily usage trends
    • Usage at each language pair (source - target)
    • Most frequently used service for each target language
  • Percent each machine translation service was modified by users
  • Percent of articles created with each machine translation service that were deleted
  • YoY comparison of service usage by language pair
  • Machine translation service usage by user edit count bucket

Data sources:

Period: All published translations from January to September 2024 were considered for analysis.

Summary

Overall usage

  • Google’s translation service, which has been the most used translation service across all language pairs, was used for 71% of all published translations.
  • MinT (Machine in Translation), an open-source machine translation service hosted by the Wikimedia Foundation, is the second most used translation service, accounting for 16% of all published translations.
  • Compared to last year, the year-over-year growth in usage of the MinT translation service is ~8%,[1] while at the same time Google’s service usage fell by ~9%.[2]
  • No machine translation was used (scratch) for ~7% of the translations published using the Content Translation tool.

YoY comparison

  • On average, ~960 articles are translated and published daily using the Content Translation tool.
  • Compared to 2023, the median number of daily translations:
    • using Google reduced from 669 to 626 (6% decrease).
    • using MinT increased from 48 to 118 (145% increase).
    • for other services, the change is not more than 1-2%.

Language pairs where an optional service was used more or close to the default

  • There are 44 language pairs where an optional service (or no service, i.e., scratch) was used more or close to the default.
  • Among the language pairs where Google is the default:
    • For 20 language pairs, MinT was used more than Google.
    • For 10 language pairs, no service (i.e., scratch) was used more than Google.
  • Among the language pairs where MinT is the default, for 10 language pairs, Google was used more.

Usage of service at each target language

  • MinT was used for 100% of translations into nine languages.
  • MinT was used for at least 90% of all translations into 18 languages (within the respective language).
  • MinT was used for the majority of translations (>50%) into 18 languages.
  • Google was used for 100% of translations into nine languages.
  • Google was used for at least 90% of all translations into 53 languages.
  • Google was used for the majority of translations (>50%) into 49 languages.
  • Aragonese is the only target language for which Apertium was used for 90% of the translations.
  • Chuvash is the only target language for which Yandex was used for 100% of the translations.
  • Basque (eu) is the only target language for which Elia was the most used service (85% of all translations).
  • Even in languages where LingoCloud is supported, usage has been quite low. For Chinese (zh), it was used for ~2% of 4000+ translations, and for less than 1% of 150+ translations to Wu Chinese (wuu).

Percentage of MT content modified by the user

  • The majority of translations across all MT services were modified by at least 10% at the time of publication.
  • Of the machine translation suggestions from MinT, 32% were modified by less than 10%, the highest of all services.
  • The percentage of translations with a human modification percentage between 10% and 50% is 54% for both MinT and Google.
  • The percentage of translations with a human modification percentage higher than 50% is lowest for MinT and Elia, at 13% and 11%, respectively.
  • Apertium has the highest percentage of translations where the human modification percentage was more than 50%.

Deletion rates by MT service

  • Articles translated using MinT were deleted the least: 2.23% of the 30,000+ articles.
  • Yandex and Google had the highest percentage of deleted articles, with more than 3%.
  • 2.7% of the articles translated using Apertium were deleted.

Setup

Code
import pandas as pd
import duckdb
import numpy as np
import warnings

import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick

import plotly.express as px
import plotly.graph_objects as go
import plotly.subplots as sp
from plotly.offline import download_plotlyjs, init_notebook_mode, iplot
# import kaleido
import great_tables as gt

from IPython.display import HTML, display, display_html
Code
init_notebook_mode(connected=True)

pd.options.display.max_columns = None
pd.options.display.max_rows = 100

bold = '\033[1m'
end = '\033[0m'
underline = '\033[4m'

# max width for plotly charts
iplot_width = 950

# always show options bar
iplot_config = {'displayModeBar': True}
Code
# connect to database
# conn = duckdb.connect('~/git/machine-translation-service-usage-analysis/data_gathering/secrets/mt_data.db')
conn = duckdb.connect('~/Desktop/GIT/wmf_git/machine-translation-service-usage-analysis/data_gathering/secrets/mt_data.db')
Code
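def query(query_string, df=False, conn=conn):
    """Run a SQL query; return a DataFrame if df is True, otherwise print the result."""
    result = conn.sql(query_string)
    if df:
        return result.df()
    # .show() prints the relation and returns None
    return result.show()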
Code
start_date = '2024-01-01'
end_date = '2024-09-30'
period_label = 'January to September 2024'

Overall usage by service

Code
mt_compare_overall = query(f"""
WITH base AS (
    SELECT 
        mt_service,
        COUNT(DISTINCT translation_id) AS n_translations
    FROM 
        mt_logs
    WHERE
        translation_start_time >= '{start_date}'
        AND translation_start_time <= '{end_date}'
        AND mt_service != 'undefined'
    GROUP BY
        mt_service
    ORDER BY
        n_translations DESC
)

SELECT
    mt_service AS 'Machine translation service',
    n_translations AS 'Number of Translations',
    n_translations / SUM(n_translations) OVER () AS 'Percent of all published translations'
FROM
    base    
""", True)
Code
mt_compare_overall_tbl = (
    gt
    .GT(mt_compare_overall, rowname_col='Machine translation service')
    .tab_header('Published translations by machine translations service across all language pairs', period_label)
    .fmt_percent(columns='Percent of all published translations')
    .data_color(columns='Percent of all published translations', palette='Greens')
    .opt_stylize()
)

mt_compare_overall_tbl
Published translations by machine translations service across all language pairs
January to September 2024
Machine translation service   Number of translations   Percent of all published translations
Google                        187953                   71.20%
MinT                          42644                    16.15%
scratch                       18126                    6.87%
Apertium                      6291                     2.38%
Yandex                        5076                     1.92%
Elia                          3779                     1.43%
LingoCloud                    110                      0.04%
Summary
  • Google’s translation service, which has been the most used translation service across all language pairs, was used for 71% of all published translations.
  • MinT (Machine in Translation), an open-source machine translation service hosted by the Wikimedia Foundation, is the second most used translation service, accounting for 16% of all published translations.
  • Compared to last year, the year-over-year growth in usage of the MinT translation service is ~8%,[3] while at the same time Google’s service usage fell by ~9%.[4]
  • No machine translation was used (scratch) for ~7% of the translations published using the Content Translation tool.
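
As a quick cross-check of the percentages in this section, the shares can be recomputed from the counts in the table above. This is a minimal sketch in plain Python rather than the report's DuckDB query; the helper name `percent_shares` is introduced here for illustration only.

```python
# Counts copied from the table above (January to September 2024).
counts = {
    "Google": 187953,
    "MinT": 42644,
    "scratch": 18126,
    "Apertium": 6291,
    "Yandex": 5076,
    "Elia": 3779,
    "LingoCloud": 110,
}


def percent_shares(counts):
    """Return each service's share of the total, in percent."""
    total = sum(counts.values())
    return {service: 100 * n / total for service, n in counts.items()}


shares = percent_shares(counts)
for service, share in shares.items():
    print(f"{service:<11}{share:6.2f}%")
```

The printed values should agree with the "Percent of all published translations" column when rounded to two decimals.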
Code
warnings.filterwarnings('ignore')

# bar chart
mt_compare_overall['Percent of all published translations'] *= 100

plt.figure(figsize=(10, 6))
ax = sns.barplot(data=mt_compare_overall, x='Machine translation service', y='Percent of all published translations', palette="BuGn_r")

# format y-axis as percentage
ax.yaxis.set_major_formatter(mtick.PercentFormatter(decimals=0))

# add data labels for bars
for p in ax.patches:
    ax.annotate(f'{p.get_height():.2f}%', (p.get_x() + p.get_width() / 2., p.get_height()), 
                ha='center', va='center', fontsize=8, fontweight='bold', color='black', 
                xytext=(0, 10), textcoords='offset points')

# plot and axes titles
plt.title(f"Published translations by machine translations service across all language pairs\n{period_label}")
plt.xlabel("Machine translation service", fontweight='bold')
plt.ylabel("Percent of all published translations", fontweight='bold')

plt.show()

Daily published translations

Translations published per day by each MT service were reviewed to identify any sudden increases or decreases in usage and to determine if those changes corresponded to deployments or setting changes that may have impacted MT availability.

Code
mt_daily = query(f"""
WITH base AS (
    SELECT 
        mt_service,
        translation_start_time AS date,
        translation_id
    FROM 
        mt_logs
    WHERE
        translation_start_time >= '{start_date}'
        AND translation_start_time <= '{end_date}'
        AND mt_service != 'undefined')

SELECT
    date,
    mt_service,
    COUNT(DISTINCT translation_id) AS n_translations
FROM
    base
GROUP BY
    date,
    mt_service
""", True)
Code
warnings.filterwarnings('ignore')
mt_daily = (
    mt_daily
    .replace([np.inf, -np.inf], np.nan)
    .dropna(subset=['date', 'n_translations'])
)
Code
mt_services = mt_daily.mt_service.unique()

# subplots
fig = sp.make_subplots(rows=4, cols=2, 
                       shared_xaxes=True, 
                       subplot_titles=mt_services,
                       x_title='Date',
                       y_title='Number of translations',
                       vertical_spacing=0.05, 
                       horizontal_spacing=0.05)

for i, mt_service in enumerate(mt_services):
    row, col = divmod(i, 2)
    service_data = mt_daily.query(f"mt_service == '{mt_service}'").sort_values('date')

    fig.add_trace(go.Scatter(x=service_data['date'], 
                             y=service_data['n_translations'], 
                             mode='lines',
                             name=mt_service,
                             showlegend=False, 
                             line=dict(color='DarkCyan')),
                  row=row+1, col=col+1)

fig.update_xaxes(range=[mt_daily['date'].min(), pd.to_datetime(end_date)])
fig.update_layout(title_text=f"Daily number of published translations created by MT Service<br>{period_label}",
                  title_x=0.5, height=800, width=iplot_width)

iplot(fig, config=iplot_config)
Note

From the above charts showing the daily number of published translations by service, we can observe that there is a spike around 24-26 June 2024 for most of the services. Also, there was an increase in MinT service usage between February and March, where on average 300-400 articles were published daily. We’ll further investigate these.
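
The spikes in this report were spotted by eye on the charts. One possible way to flag candidate days programmatically is to compare each day with the median of the series, as sketched below. The threshold (`factor=3.0`), the function name `flag_spikes`, and the sample counts are all assumptions made for illustration; none of them come from the report's data or code.

```python
from statistics import median


def flag_spikes(daily_counts, factor=3.0):
    """Return the dates whose count exceeds `factor` times the series median."""
    typical = median(daily_counts.values())
    return [day for day, n in sorted(daily_counts.items()) if n > factor * typical]


# Hypothetical daily counts for one service (not taken from mt_logs).
example = {
    "2024-06-22": 110, "2024-06-23": 125, "2024-06-24": 118,
    "2024-06-25": 900, "2024-06-26": 1200, "2024-06-27": 760,
    "2024-06-28": 121,
}

print(flag_spikes(example))
```

The median is used instead of the mean because a few very large days pull the mean upwards, which would hide the spikes the check is meant to find.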

Rise in daily average of MinT usage during February and March 2024

Code
mint_spike = query("""
WITH base AS (
    SELECT
        MONTH(translation_start_time) AS month,
        MONTHNAME(translation_start_time) AS month_name,
        target_wiki_db AS wiki_db,
        COUNT(DISTINCT target_revision_id) AS article_count
    FROM
        mt_logs
    WHERE
        mt_service = 'MinT'
    GROUP BY
        month,
        month_name,
        wiki_db
),

ranked AS (
    SELECT
        *,
        ROW_NUMBER() OVER (
            PARTITION BY month
            ORDER BY article_count DESC
        ) AS rank
    FROM
        base
    WHERE
        article_count > 50
        AND month <= 4
)

SELECT * 
FROM ranked 
WHERE rank <= 3 
ORDER BY month, rank
""", True)
Code
mint_spike_tbl = (
    gt.GT(mint_spike, groupname_col='month_name', rowname_col='wiki_db')
    .tab_header('Top Wikis using MinT', 'Jan to Apr 2024')
    .cols_hide(['month', 'rank'])
    .data_color('article_count', palette='Greens')
    .cols_label(article_count='Total articles translated')
    .opt_stylize()
)

mint_spike_tbl
Top Wikis using MinT
Jan to Apr 2024
Wiki        Total articles translated
January
  ffwiki    250
  kiwiki    80
  minwiki   51
February
  hawiki    2369
  igwiki    1624
  satwiki   362
March
  hawiki    1541
  urwiki    1114
  igwiki    934
April
  hawiki    413
  uzwiki    332
  urwiki    270
Note

The increase in the average number of daily translations using MinT between February and March 2024 came from increased activity on the Hausa, Igbo, Urdu and Santali Wikipedias, likely due to an article creation or translation campaign. For all these languages, the default translation service from English is MinT.

Spike in daily translations during 24-26 June 2024

Code
spikes = {
    'Google': ['2024-03-06', '2024-03-07', '2024-06-25', '2024-06-26', '2024-06-27'],
    'Apertium': ['2024-05-11'],
    'MinT': ['2024-06-25', '2024-06-26', '2024-06-27'],
    'Yandex': ['2024-06-25', '2024-06-26', '2024-06-27'],
}
Code
june2024_spike = query("""
WITH base AS (
    SELECT
         target_wiki_db,
         mt_service,
         COUNT(DISTINCT target_revision_id) AS article_count
    FROM 
        mt_logs
    WHERE
        translation_start_time IN ('2024-06-25', '2024-06-26', '2024-06-27')
    GROUP BY
         translation_start_time,
         target_wiki_db,
         mt_service
    ORDER BY
        article_count DESC
),

ranked AS (
    SELECT
        *,
        ROW_NUMBER() OVER (
            PARTITION BY mt_service
            ORDER BY article_count DESC
        ) AS rank
    FROM
        base
)

SELECT
    target_wiki_db AS Wikipedia,
    mt_service AS Service,
    article_count AS '# Articles'
FROM
    ranked
WHERE
    rank = 1
    AND article_count > 25
ORDER BY
    article_count DESC
""", True)

june2024_spike_tbl = (
    gt.GT(june2024_spike)
    .tab_header('Source of spike in MT usage', 'during 24-26 June 2024')
)

june2024_spike_tbl
Source of spike in MT usage
during 24-26 June 2024
Wikipedia   Service   # Articles
uzwiki      MinT      2814
uzwiki      Google    1681
uzwiki      Yandex    88
uzwiki      scratch   82
euwiki      Elia      32
Note

The spike was caused by increased activity on Uzbek Wikipedia, possibly due to a campaign. This can also be observed on Wikistats. During the period, MinT was the most used service; it is also the default service when translating from English to Uzbek.
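
The query above picks the top row per service with `ROW_NUMBER() OVER (PARTITION BY ...)`. The sketch below shows the same pattern on a tiny made-up table. It uses Python's built-in `sqlite3` module as a stand-in for DuckDB (an assumption made so the example runs without extra packages; window functions need SQLite 3.25 or later), and the table name `daily_counts` and all of its rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for the DuckDB file; rows are made up.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE daily_counts (wiki TEXT, service TEXT, articles INTEGER)")
db.executemany(
    "INSERT INTO daily_counts VALUES (?, ?, ?)",
    [
        ("wiki_a", "MinT", 300),
        ("wiki_b", "MinT", 40),
        ("wiki_a", "Google", 120),
        ("wiki_c", "Google", 200),
        ("wiki_c", "Elia", 10),
    ],
)

top_per_service = db.execute(
    """
    WITH ranked AS (
        SELECT
            wiki,
            service,
            articles,
            ROW_NUMBER() OVER (
                PARTITION BY service
                ORDER BY articles DESC
            ) AS rnk
        FROM daily_counts
    )
    SELECT wiki, service, articles
    FROM ranked
    WHERE rnk = 1 AND articles > 25
    ORDER BY articles DESC
    """
).fetchall()

print(top_per_service)
```

Each service keeps only its highest-count row, and the final filter drops services whose best row is still small, which is why the hypothetical Elia row does not appear in the result.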

Code
mt_daily_agg = (
    mt_daily
    .groupby('mt_service')
    .agg(
        avg_daily=('n_translations', 'mean'),
        median_daily=('n_translations', 'median')
    )
    .sort_values('avg_daily', ascending=False)
)

mt_daily_agg_tbl = (
    gt
    .GT(mt_daily_agg.reset_index(), rowname_col='mt_service')
    .tab_header('Number of daily translations by service', period_label)
    .fmt_number(decimals=0)
    .cols_label(
        avg_daily='Average',
        median_daily='Median'
    )
    .opt_stylize()
)

mt_daily_agg_tbl
Number of daily translations by service
January to September 2024
Service      Average   Median
Google       686       626
MinT         156       118
scratch      66        58
Apertium     23        21
Yandex       19        17
Elia         14        12
LingoCloud   1         1
Summary
  • On average, ~960 articles are translated and published daily using the Content Translation tool.
  • Compared to 2023, the median number of daily translations:
    • using Google reduced from 669 to 626 (6% decrease).
    • using MinT increased from 48 to 118 (145% increase).
    • for other services, the change is not more than 1-2%.
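
The year-over-year figures above are relative changes in the median number of daily translations. As a worked check, the arithmetic can be sketched as follows; the function name `pct_change` is introduced here for illustration, and the 2023 and 2024 medians are the ones quoted in the summary.

```python
def pct_change(old, new):
    """Relative change from `old` to `new`, in percent."""
    return 100 * (new - old) / old


# Median daily translations (2023, 2024) as quoted in the summary above.
medians = {"Google": (669, 626), "MinT": (48, 118)}

for service, (y2023, y2024) in medians.items():
    print(f"{service}: {pct_change(y2023, y2024):+.1f}%")
```

This gives roughly -6.4% for Google and +145.8% for MinT, in line with the rounded figures in the summary.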

Usage by language pair

The number and percentage of publications by each machine translation service at each language pair (i.e., source language and target language) were reviewed. Because of the large number of language pairs, the data was saved to a Google Spreadsheet to make it easy to filter and identify the percentage of publications by language pair for each machine translation service.
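
The percentage described here is each service's share within a single language pair, so the shares of one pair add up to 100%. A minimal plain-Python sketch of that calculation follows; the language codes `xx` and `yy`, the counts, and the helper `shares_by_pair` are hypothetical and only illustrate the grouping.

```python
from collections import defaultdict

# Hypothetical (source, target, service, n_translations) rows.
rows = [
    ("en", "xx", "Google", 60),
    ("en", "xx", "MinT", 30),
    ("en", "xx", "scratch", 10),
    ("fr", "yy", "MinT", 8),
    ("fr", "yy", "Google", 2),
]


def shares_by_pair(rows):
    """Share of each service within its (source, target) language pair."""
    totals = defaultdict(int)
    for source, target, _service, n in rows:
        totals[(source, target)] += n
    return {
        (source, target, service): n / totals[(source, target)]
        for source, target, service, n in rows
    }


pair_shares = shares_by_pair(rows)
print(pair_shares[("en", "xx", "Google")])
```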

Code
mt_by_langpair = query(f"""
WITH base AS (
    SELECT
        source_language,
        target_language,
        mt_service,
        COUNT(DISTINCT translation_id) AS n_translations
    FROM 
        mt_logs
    WHERE
        translation_start_time >= '{start_date}' 
        AND translation_start_time <= '{end_date}'
        AND mt_service != 'undefined'
    GROUP BY
        source_language,
        target_language,
        mt_service
    ORDER BY
        source_language,
        target_language,
        n_translations
)

SELECT
    *,
    n_translations / SUM(n_translations) OVER (PARTITION BY source_language, target_language) AS pct_translations
FROM
    base
""", True)
Code
mt_by_langpair.to_csv('~/git/machine-translation-service-usage-analysis/data_gathering/secrets/mt_usage_langpair.tsv', sep='\t', index=False)

Higher use of optional service

Listing of language pairs where an optional service was used more or close to the default service.

Code
mt_defaults = query("""
WITH base AS (
    SELECT
        *,
        source_language||'-'||target_language AS pair
    FROM
        mt_by_langpair mt
)
    
SELECT 
    b.* EXCLUDE(mt_service),
    b.mt_service AS service_used,
    dfs.service AS default_service
FROM 
    base b
    JOIN mt_defaults dfs
    ON b.pair = dfs.pair
WHERE
    service_type = 'default_mt'
""", True)
Code
mt_optional_services_used_more = mt_defaults[
    (mt_defaults['service_used'] != mt_defaults['default_service']) & 
    (mt_defaults['pct_translations'] > 0.50) &
    (mt_defaults['n_translations'] > 10)
].sort_values('pair')

mt_optional_services_used_more_tbl = (
    gt
    .GT(mt_optional_services_used_more, groupname_col='default_service', rowname_col='pair')
    .tab_header('Language pairs where an optional service was used more or close to the default', 'having more than 10 published translations; accounting for more than 50% of translations')
    .fmt_percent(columns='pct_translations', decimals=0)
    .cols_label(
        source_language='Source',
        target_language='Target',
        n_translations='# Translations',
        pct_translations='% Translations',
        service_used='MT service used'
    )
    .opt_stylize(6)
)

mt_optional_services_used_more_tbl
Language pairs where an optional service was used more or close to the default
having more than 10 published translations; accounting for more than 50% of translations
Pair       Source  Target  # Translations  % Translations  MT service used
Default service: Google
ar-en      ar      en      123             95%             scratch
ba-tt      ba      tt      17              89%             MinT
bg-nl      bg      nl      11              52%             scratch
de-en      de      en      27              100%            scratch
en-ban     en      ban     24              96%             MinT
en-din     en      din     13              81%             MinT
en-ja      en      ja      101             53%             scratch
en-min     en      min     19              90%             MinT
en-new     en      new     47              96%             MinT
en-pam     en      pam     13              81%             MinT
en-shn     en      shn     11              100%            MinT
en-ss      en      ss      80              94%             MinT
en-tn      en      tn      78              80%             MinT
en-tum     en      tum     12              92%             MinT
en-ve      en      ve      44              90%             MinT
en-war     en      war     22              88%             MinT
fa-en      fa      en      131             80%             scratch
fr-br      fr      br      13              62%             MinT
he-en      he      en      163             87%             scratch
id-ace     id      ace     18              95%             MinT
id-ban     id      ban     13              93%             MinT
id-min     id      min     60              97%             MinT
it-en      it      en      14              88%             scratch
it-fur     it      fur     122             98%             MinT
it-vec     it      vec     26              96%             MinT
ja-en      ja      en      14              100%            scratch
lmo-it     lmo     it      13              100%            MinT
pt-fo      pt      fo      11              100%            MinT
simple-id  simple  id      18              69%             scratch
tt-ba      tt      ba      40              62%             Yandex
uk-be      uk      be      146             71%             Yandex
zh-en      zh      en      21              51%             scratch
Default service: MinT
en-as      en      as      109             69%             Google
en-el      en      el      2119            62%             Google
en-gu      en      gu      44              57%             Google
en-hi      en      hi      722             56%             Google
en-ig      en      ig      11144           66%             Google
en-kn      en      kn      799             63%             Google
en-or      en      or      605             89%             Google
en-pa      en      pa      1393            52%             Google
en-te      en      te      4149            66%             Google
en-uz      en      uz      15350           61%             Google
Default service: Apertium
es-en      es      en      62              54%             scratch
mk-sr      mk      sr      14              61%             Google
Summary
  • There are 44 language pairs where an optional service (or no service, i.e., scratch) was used more or close to the default.
  • Among the language pairs where Google is the default:
    • For 20 language pairs, MinT was used more than Google. The pairs are: ba-tt, en-ban, en-din, en-min, en-new, en-pam, en-shn, en-ss, en-tn, en-tum, en-ve, en-war, fr-br, id-ace, id-ban, id-min, it-fur, it-vec, lmo-it, and pt-fo.
    • For 10 language pairs, no service (i.e., scratch) was used more than Google. The pairs are: ar-en, bg-nl, de-en, en-ja, fa-en, he-en, it-en, ja-en, simple-id, and zh-en.
  • Among the language pairs where MinT is the default, for 10 language pairs, Google was used more. The pairs are: en-as, en-el, en-gu, en-hi, en-ig, en-kn, en-or, en-pa, en-te, and en-uz.
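
The selection behind this list can be sketched in plain Python as a filter over per-pair rows: keep a row when the service used is not the default, it accounts for more than half of the pair's translations, and it has more than 10 translations. The thresholds follow the report's filter; the function name `optional_used_more` is introduced here, and the last three rows are hypothetical, added only to show what gets dropped.

```python
def optional_used_more(rows, min_share=0.50, min_translations=10):
    """Rows where a non-default service has more than `min_share` of a pair
    and more than `min_translations` published translations."""
    return [
        row for row in rows
        if row["service_used"] != row["default_service"]
        and row["pct_translations"] > min_share
        and row["n_translations"] > min_translations
    ]


# The first three rows are taken from the table above; the last three are
# hypothetical rows added to show what the filter drops.
rows = [
    {"pair": "en-uz", "n_translations": 15350, "pct_translations": 0.61,
     "service_used": "Google", "default_service": "MinT"},
    {"pair": "it-fur", "n_translations": 122, "pct_translations": 0.98,
     "service_used": "MinT", "default_service": "Google"},
    {"pair": "ar-en", "n_translations": 123, "pct_translations": 0.95,
     "service_used": "scratch", "default_service": "Google"},
    {"pair": "xx-yy", "n_translations": 500, "pct_translations": 0.90,
     "service_used": "Google", "default_service": "Google"},  # default itself
    {"pair": "xx-zz", "n_translations": 8, "pct_translations": 0.80,
     "service_used": "MinT", "default_service": "Google"},    # too few translations
    {"pair": "yy-zz", "n_translations": 40, "pct_translations": 0.45,
     "service_used": "MinT", "default_service": "Google"},    # not a majority
]

selected = [row["pair"] for row in optional_used_more(rows)]
print(selected)
```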

Usage at each target language

Next, a closer look was taken at each machine translation service, identifying its usage at all target languages where available and determining the languages each service is helping to support the most.
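
One way to picture "the languages each service is helping to support the most" is to find, for every target language, the service with the largest share of translations. The sketch below does this in plain Python. It assumes one already-aggregated row per target language and service; the language codes, counts, and the helper `top_service_by_target` are hypothetical.

```python
def top_service_by_target(rows):
    """Map each target language to (service, share) for its most-used service.

    `rows` holds one (target_language, service, n_translations) tuple per
    target language and service.
    """
    totals = {}
    best = {}
    for target, service, n in rows:
        totals[target] = totals.get(target, 0) + n
        if target not in best or n > best[target][1]:
            best[target] = (service, n)
    return {
        target: (service, n / totals[target])
        for target, (service, n) in best.items()
    }


# Hypothetical aggregated counts.
rows = [
    ("xx", "MinT", 45),
    ("xx", "Google", 5),
    ("yy", "Google", 30),
    ("yy", "scratch", 20),
]

top = top_service_by_target(rows)
print(top)
```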

Code
def chart_order_services(services, selected_service):
    services.remove(selected_service) 
    services.sort()
    services.insert(0, selected_service)
    return services
Code
def get_service_usage_by_target(service):
    
    # get usage information by selected service    
    service_usage_by_target = query(f"""
        WITH languages AS (
            SELECT 
                DISTINCT target_language
            FROM 
                mt_logs
            WHERE
                translation_start_time >= '{start_date}' 
                AND translation_start_time <= '{end_date}' 
                AND mt_service = '{service}'
                AND NOT mt_service = 'undefined'
        ),

        base AS (
            SELECT
                *
            FROM
                mt_logs
            WHERE
                target_language IN (SELECT target_language FROM languages)
                AND translation_start_time >= '{start_date}' 
                AND translation_start_time <= '{end_date}'
                AND NOT mt_service = 'undefined'
        ),

        agg AS (
            SELECT
                target_language,
                mt_service,
                COUNT(DISTINCT translation_id) AS n_translations
            FROM
                base
            GROUP BY
                target_language,
                mt_service
            ORDER BY
                target_language,
                n_translations DESC
        )
                    
        SELECT
            *,
            n_translations / SUM(n_translations) OVER (PARTITION BY target_language) AS pct_translations
        FROM
            agg
        """, True)
    
    return service_usage_by_target
Code
# function to generate the usage chart

def chart_usage(service, min_translations=10, min_percent=0.1, chart_height=750, chart_width=1400, xlabel_offset=0.025, return_fig=False):
    
    service_usage_by_target = get_service_usage_by_target(service)
    
    # top languages
    top_langs = (
        service_usage_by_target
        .query(f"""(mt_service == @service) & \
                    (n_translations >= {min_translations}) & \
                    (pct_translations > {min_percent})""")
        .sort_values(['pct_translations'], ascending=False)
        .target_language
        .values
        .tolist()
    )
    
    top_langs_usage = (
        service_usage_by_target
        .query("""target_language == @top_langs""")
        .assign(
            target_language=lambda df: pd.Categorical(
                df['target_language'], 
                categories=top_langs, 
                ordered=True),
            mt_service=lambda df: pd.Categorical(
                df['mt_service'], 
                categories=chart_order_services(df.mt_service.unique().tolist(), service), 
                ordered=True)
        )
        .sort_values(['target_language', 'mt_service'])
    )
    
    if service == 'scratch':
        chart_title = 'Languages where no MT service was used (scratch)'
    else:
        chart_title = f'Languages most supported by {service} (by percentage of published translations)'
    
    fig = px.bar(top_langs_usage, 
                 y='target_language', 
                 x='pct_translations', 
                 color='mt_service', 
                 orientation='h', 
                 height=chart_height, 
                 width=chart_width,
                 color_discrete_sequence=px.colors.qualitative.T10,
                 labels={
                     'target_language': 'Target language', 
                     'pct_translations': 'Percent of all published translations', 
                     'mt_service': 'MT service'
                 },
                 title=chart_title,
                 category_orders={'target_language': top_langs})
    
    fig.update_xaxes(tickformat=".0%")
    annotations = []
    
    # add data labels for the selected service only
    for _, row in top_langs_usage.iterrows():
        if row['mt_service'] == service:
            annotations.append(
                dict(
                    x=row['pct_translations'] - xlabel_offset,
                    y=row['target_language'],
                    text=f"{row['pct_translations']:.0%}",
                    showarrow=False,
                    font=dict(color="white")
                )
            )
            
    fig.update_layout(annotations=annotations)
    
    if return_fig:
        return fig
    else:
        fig.show()
Code
print(f'Available services: {mt_by_langpair.mt_service.unique().tolist()}')
Available services: ['Google', 'scratch', 'Yandex', 'MinT', 'Apertium', 'LingoCloud', 'Elia']

MinT

MinT (Machine in Translation) is a machine translation service based on open-source neural machine translation models. The service is hosted in the Wikimedia Foundation infrastructure, and it runs translation models that have been released by other organizations with an open-source license. MinT is designed to provide translations from multiple machine translation models. Initially, it uses the following models: NLLB-200, OpusMT, IndicTrans2 and Softcatalà. From January to September 2024, MinT accounted for 16% of all published translations.

Code
iplot(chart_usage('MinT', chart_height=1500, chart_width=iplot_width, return_fig=True, min_translations=5), config=iplot_config)
Summary
  • MinT was used for 100% of translations into the following nine languages: Buginese (bug), Banjar (bjn), Limburgish (li), Sicilian (scn), Kongo (kg), Shan (shn), Fiji Hindi (hif), Low Saxon (nds), and Cherokee (chr).
  • MinT was used for at least 90% of all translations into the following 18 languages: Santali (sat), Fula (ff), Kashmiri (ks), Kikuyu (ki), Central Bikol (bcl), Friulian (fur), Crimean Tatar (crh), Newar (new), South Azerbaijani (azb), Minangkabau (min), Balinese (ban), Faroese (fo), Swati (ss), Tumbuka (tum), Gun (guw), Icelandic (is), Southern Sotho (st), and Zulu (zu).
  • MinT was used for the majority of translations (>50%) into the following 18 languages: Venda (ve), Waray (war), Lombard (lmo), Acehnese (ace), Tswana (tn), Kabyle (kab), Venetian (vec), Dinka (din), Kapampangan (pam), Tibetan (bo), Breton (br), Meitei (mni), Ligurian (lij), Malayalam (ml), Maithili (mai), Moroccan Arabic (ary), Fon (fon), and Dzongkha (dz).

Google

Code
iplot(chart_usage('Google', chart_height=2500, chart_width=iplot_width, return_fig=True), config=iplot_config)
Summary
  • Google was used for 100% of translations into nine languages.[5]
  • Google was used for at least 90% of all translations into 53 languages.[6]
  • Google was used for the majority of translations (>50%) into 49 languages.[7]

Apertium

Code
iplot(chart_usage('Apertium', chart_height=500, chart_width=iplot_width, return_fig=True, min_translations=5), config=iplot_config)
Summary
  • Aragonese is the only target language for which Apertium was used for 90% of the translations.
  • Apertium was used for the majority of translations (>50%) into 6 languages.[8]

Yandex

Code
iplot(chart_usage('Yandex', chart_height=500, chart_width=iplot_width, return_fig=True, min_translations=5), config=iplot_config)
Summary
  • Chuvash is the only target language for which Yandex was used for 100% of the translations.
  • Yandex was used for the majority of translations (>50%) into 2 languages.[9]

Elia

Code
iplot(chart_usage('Elia', min_translations=5, min_percent=0.05, chart_height=350, chart_width=iplot_width, return_fig=True), config=iplot_config)
Summary

Basque (eu) is the only target language for which Elia was the most used service (85% of all translations).

LingoCloud

Code
iplot(chart_usage('LingoCloud', min_translations=1, min_percent=0, chart_height=350, xlabel_offset=0.009, chart_width=iplot_width, return_fig=True), config=iplot_config)
Summary

Even in languages where LingoCloud is supported, usage has been quite low. For Chinese (zh), it was used for ~2% of 4000+ translations, and for less than 1% of 150+ translations to Wu Chinese (wuu).

Percent MT content modified

The content provided by each machine translation service can be modified by the user before publishing. The analysis tracks the percentage each translation is modified by the user before publication. Following the MT abuse calculation documentation, a warning or error is displayed to the user based on the extent of unmodified content. This encourages users to make further edits. Depending on the situation, some users may still be able to publish their translations, but the resulting page may be added to a tracking category for potentially unreviewed translations, subject to community review. In other cases, users may not be allowed to publish.

For the purpose of this analysis, we have focused on published translations and have categorized the extent to which machine translation content was modified by users into three categories: less than 10%, between 10% and 50%, and over 50%. These categories can be adjusted as needed.
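
The three categories can be written as a small Python function. This is a sketch that mirrors the thresholds in the report's SQL view, assuming the modification value is stored as a fraction between 0 and 1; the function name `modification_category` is introduced here for illustration.

```python
def modification_category(human_translated_percent):
    """Bucket a human-modification fraction (0.0 to 1.0) into three categories."""
    if human_translated_percent is None:
        # Mirrors a SQL CASE with no matching branch, which yields NULL.
        return None
    if human_translated_percent < 0.1:
        return "less than 10%"
    if human_translated_percent <= 0.5:
        return "between 10% and 50%"
    return "over 50%"


for value in (0.05, 0.10, 0.50, 0.51):
    print(value, "->", modification_category(value))
```

Both boundary values, 10% and 50%, fall into the middle category, so changing the cut-offs only requires editing the two comparisons.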

Method: Data on the percentage by which each translation was modified comes from the translations_progress field[10] in the cx_translation table (as indicated by the human percentage stat).

Code
conn.sql(f"""
CREATE OR REPLACE VIEW hpct_modified AS
SELECT
    *,
    CASE 
        WHEN human_translated_percent < 0.1 THEN 'less than 10%'
        WHEN human_translated_percent >= 0.1 AND human_translated_percent <= 0.5 THEN 'between 10% and 50%'
        WHEN human_translated_percent >0.5 THEN 'over 50%'
    END AS 'pct_modified'
FROM
    mt_logs
WHERE
    translation_start_time >= '{start_date}'
    AND translation_start_time <= '{end_date}'
    AND mt_service != 'scratch'
    AND target_wiki_db != 'uzwiki'
""")
Code
pct_modified_overall = query("""
WITH base AS (
    SELECT
        mt_service,
        pct_modified,
        COUNT(DISTINCT translation_id) AS n_translations
    FROM
        hpct_modified
    GROUP BY
        mt_service,
        pct_modified
)
        
SELECT
    *,
    n_translations / SUM(n_translations) OVER (PARTITION BY mt_service) AS pct_translations
FROM
    base
""", True)
Code
pct_order = ['less than 10%', 'between 10% and 50%', 'over 50%']

# chart function
def chart_pct_modified(df, title, pct_order=pct_order, iplot_width=800):
    
    mt_service_order = (
        df
        .query("""pct_modified == 'less than 10%'""")
        .sort_values('pct_translations', ascending=False)
        .mt_service
        .unique()
        .tolist()
    )

    # work on a copy so the caller's DataFrame is left unchanged
    df = df.copy()
    df['mt_service'] = pd.Categorical(df['mt_service'], categories=mt_service_order, ordered=True)
    df['pct_modified'] = pd.Categorical(df['pct_modified'], categories=pct_order, ordered=True)
    df = df.sort_values(['mt_service', 'pct_modified'], ascending=[True, True])

    fig = go.Figure()

    for pct_mod in pct_order:
        data = df[df['pct_modified'] == pct_mod]
        fig.add_trace(go.Bar(
            x=data['pct_translations'],
            y=data['mt_service'],
            orientation='h',
            name=pct_mod,
            text=[f"{val:.0%}" for val in data['pct_translations']],  # Format text labels as percentages
            textposition='auto',
            textfont_color='white',
            marker_color={pct: px.colors.qualitative.T10[i] for i, pct in enumerate(pct_order)}[pct_mod] 
        ))

    fig.update_layout(
        barmode='stack',
        height=600,
        width=iplot_width,
        legend=dict(
            orientation="h",
            yanchor="bottom",
            y=1.02,
            xanchor="right",
            x=1
        ),
        xaxis_tickformat=".0%",
        title=title
    )

    return fig
Code
iplot(chart_pct_modified(pct_modified_overall, 'Percentage published translations modified by users'), config=iplot_config)
Summary

Note: uzwiki has been excluded as it was introducing a skew due to increased activity from a campaign.

  • The majority of translations across all MT services were modified by at least 10% before publication.
  • Of the translations that used MinT, 32% were modified less than 10%, the highest share of all services.11
  • For both MinT and Google, 54% of translations had a human modification percentage between 10% and 50%.
  • The share of translations with a human modification percentage above 50% is lowest for MinT and Elia, at 13% and 11% respectively.
  • Apertium has the highest share of translations where the human modification percentage was above 50%.

at MinT supported languages

Code
pct_modified_mint_by_lang = query("""
WITH
    mint_langs AS (
        SELECT
            target_language,
            COUNT(DISTINCT translation_id) AS n_translations
        FROM
            hpct_modified
        WHERE
            mt_service = 'MinT'
        GROUP BY
            target_language
    ),
    
    base AS (
        SELECT
            mt_service,
            pct_modified,
            target_language,
            COUNT(DISTINCT translation_id) AS n_translations
        FROM
            hpct_modified
        WHERE
            target_language IN (SELECT DISTINCT target_language FROM mint_langs WHERE n_translations >= 10)
        GROUP BY
            mt_service,
            pct_modified,
            target_language
        )
    
SELECT
    *,
    n_translations / SUM(n_translations) OVER (PARTITION BY target_language, mt_service) AS pct_translations
FROM
    base
""", True)
Code
pct_modified_mint_by_lang['pct_modified'] = pd.Categorical(pct_modified_mint_by_lang['pct_modified'], categories=pct_order, ordered=True)
pct_modified_mint_by_lang = pct_modified_mint_by_lang.sort_values(['pct_modified'], ascending=[True])
Code
color_mapping = {
    'less than 10%': px.colors.qualitative.T10[0],
    'between 10% and 50%': px.colors.qualitative.T10[1],
    'over 50%': px.colors.qualitative.T10[2]
}

unique_target_languages = pct_modified_mint_by_lang['target_language'].unique()
num_columns = 4
num_rows = (len(unique_target_languages) + num_columns - 1) // num_columns

fig = sp.make_subplots(
    rows=num_rows, 
    cols=num_columns, 
    subplot_titles=unique_target_languages, 
    horizontal_spacing=0.1, 
    vertical_spacing=0.005, 
    shared_xaxes=True, 
    specs=[[{}] * num_columns] * num_rows
)

all_traces = []

for i, target_language in enumerate(unique_target_languages):
    row_num = i // num_columns + 1
    col_num = i % num_columns + 1
    
    filtered_data = pct_modified_mint_by_lang.query(f"target_language == '{target_language}'")
    
    traces = []
    categories = filtered_data['pct_modified'].unique()

    for category in categories:
        category_data = filtered_data[filtered_data['pct_modified'] == category]
        
        trace = go.Bar(
            x=category_data['pct_translations'],
            y=category_data['mt_service'],
            orientation='h',
            name=category,
            marker=dict(color=color_mapping[category]),
            text=[f"{val:.0%}" if val > 0.2 else '' for val in category_data['pct_translations']],
            textposition='auto',
            textfont=dict(size=10)
        )
        
        traces.append(trace)
        
        
    for trace in traces:
        fig.add_trace(trace, row=row_num, col=col_num)

    
fig.update_layout(
    title="Percentage published translations modified by users at MinT supported languages",
    height=300 * num_rows,
    width=1050,
    barmode='stack',
    showlegend=False
)

for row_num in range(1, num_rows + 1):
    for col_num in range(1, num_columns + 1):
        fig.update_xaxes(tickformat=".0%", row=row_num, col=col_num)
Code
iplot(fig, config=iplot_config)
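
The subplot layout above relies on two pieces of integer arithmetic: ceiling division to get the number of rows, and floor division plus remainder to place the i-th language in a 1-based (row, column) cell. A small sketch of that arithmetic follows; the helper names `grid_shape` and `grid_position` are hypothetical and only illustrate the calculation.

```python
def grid_shape(n_items, n_cols):
    """Rows needed to fit n_items into n_cols columns (ceiling division)."""
    return (n_items + n_cols - 1) // n_cols

def grid_position(index, n_cols):
    """1-based (row, col) of the item at 0-based index, filling row by row."""
    return index // n_cols + 1, index % n_cols + 1

n_rows = grid_shape(10, 4)  # 10 panels laid out in 4 columns
positions = [grid_position(i, 4) for i in range(10)]

print(n_rows, positions[0], positions[-1])
```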

Deletion rate by MT service

Code
mt_deletion_overall = query("""
SELECT
    mtd.mt_service AS mt_service,
    SUM(created_cx_total)::INT AS '# Articles created',
    SUM(deleted_cx_total)::INT AS '# Articles deleted',
    SUM(deleted_cx_total) / SUM(created_cx_total)  AS 'Percentage of deleted articles',
    pct_translations AS 'Percent of translations modified under 10%'
FROM
    mt_deletion_ratios mtd
LEFT JOIN (
    SELECT
        mt_service,
        pct_translations
    FROM
        pct_modified_overall
    WHERE
        pct_modified = 'less than 10%'
    ) pct_modf
    ON mtd.mt_service = pct_modf.mt_service
WHERE
    wiki != 'uzwiki'
GROUP BY
    mtd.mt_service,
    pct_translations
""", True).sort_values('# Articles created', ascending=False)
Code
mt_deletion_overall_tbl = (
    gt
    .GT(mt_deletion_overall, rowname_col='mt_service')
    .tab_header('Percentage of articles deleted, by Machine Translation Service', f'{period_label}; excluding uzwiki')
    .fmt_percent(columns=['Percentage of deleted articles', 'Percent of translations modified under 10%'])
    .opt_stylize()
    .tab_source_note(gt.html('Note: As observed before, uzwiki had a high creation and deletion rate due to a campaign,<br>skewing the overall percentage, so it has been excluded.<br>Please refer to Table 1 in the appendix for uzwiki aggregates.'))
)

mt_deletion_overall_tbl
Percentage of articles deleted, by Machine Translation Service
January to September 2024; excluding uzwiki

MT service    # Articles created    # Articles deleted    Percentage of deleted articles    Percent of translations modified under 10%
Google        153991                5082                  3.30%                             29.86%
MinT          30320                 675                   2.23%                             32.41%
scratch       16028                 490                   3.06%
Apertium      5781                  156                   2.70%                             15.12%
Yandex        3925                  126                   3.21%                             20.53%
Elia          3591                  106                   2.95%                             15.74%
LingoCloud    78                    4                     5.13%                             38.18%

Note: As observed before, uzwiki had a high creation and deletion rate due to a campaign,
skewing the overall percentage, so it has been excluded.
Please refer to Table 1 in the appendix for uzwiki aggregates.
Summary
  • Articles translated using MinT were deleted the least: 2.23% of the 30,000+ articles.
  • Excluding LingoCloud (only 78 articles), Google (3.30%) and Yandex (3.21%) had the highest deletion rates among the MT services, both above 3%.
  • 2.7% of the articles translated using Apertium were deleted.
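
The deletion percentages in the table are the number of deleted articles divided by the number of articles created. The arithmetic can be checked directly from the published counts; the dictionary below simply restates three rows of the table.

```python
# (created, deleted) counts copied from the table above, excluding uzwiki
counts = {
    'Google': (153991, 5082),
    'MinT': (30320, 675),
    'Apertium': (5781, 156),
}

# Deletion rate = deleted articles / created articles
deletion_rate = {
    service: deleted / created
    for service, (created, deleted) in counts.items()
}

for service, rate in deletion_rate.items():
    print(f"{service}: {rate:.2%}")
```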

at MinT supported languages, by wiki

Code
mt_deletion_mint_langs = query("""
WITH mint_langs AS (
    SELECT
        COUNT(DISTINCT translation_id) AS n_translations,
        target_language||'wiki' AS wiki
    FROM 
        mt_logs
    WHERE 
        mt_service = 'MinT' 
    GROUP BY
        target_language
),
    
deletion_ratios AS (
    SELECT
        wiki,
        mt_service,
        SUM(created_cx_total) AS created_cx_total,
        SUM(deleted_cx_total) AS deleted_cx_total
    FROM 
        mt_deletion_ratios
    WHERE
        wiki IN (SELECT DISTINCT wiki FROM mint_langs WHERE n_translations > 15)
    GROUP BY
        wiki,
        mt_service
)
            
SELECT 
    *,
    -- deleted / created, consistent with the deletion percentage used in the other sections
    deleted_cx_total / created_cx_total AS deletion_ratio
FROM 
    deletion_ratios
""", True)
Code
mt_deletion_mint_langs_min1 = mt_deletion_mint_langs.query("""deletion_ratio > 0.01""")
unique_wikis = mt_deletion_mint_langs_min1['wiki'].unique()

num_wikis = len(unique_wikis)
num_rows = (num_wikis + 2) // 3

fig = sp.make_subplots(
    rows=num_rows, 
    cols=3, 
    subplot_titles=unique_wikis, 
    horizontal_spacing=0.1/2, 
    vertical_spacing=0.035/2, 
    shared_yaxes=True
)

traces = []

for i, wiki in enumerate(unique_wikis):
    row_num = i // 3 + 1
    col_num = i % 3 + 1
    
    wiki_data = mt_deletion_mint_langs_min1[mt_deletion_mint_langs_min1['wiki'] == wiki].sort_values('deletion_ratio')
        
    trace = go.Bar(
        x=wiki_data['mt_service'],
        y=wiki_data['deletion_ratio'],
        name=wiki,
        text=[f"{val:.0%}" for val in wiki_data['deletion_ratio']],
        textposition='auto',
        textfont=dict(size=10),
        marker=dict(color='RoyalBlue')
    )
    
    traces.append(trace)
    
    fig.add_trace(trace, row=row_num, col=col_num)

fig.update_layout(
    title="Deletion Ratio by MT Service at MinT supported languages",
    height=200 * num_rows,
    width=950,
    showlegend=False
)

for row_num in range(1, num_rows + 1):
    for col_num in range(1, 4):
        fig.update_xaxes(row=row_num, col=col_num)
Code
iplot(fig, config=iplot_config)

MT service usage by user edit bucket

Code
mt_deletion_by_ueb = query("""
SELECT
    mt_service,
    user_editcount_bucket,
    SUM(created_cx_total) AS created_cx_total,
    SUM(deleted_cx_total) AS deleted_cx_total,
    SUM(deleted_cx_total) / SUM(created_cx_total) AS deletion_ratio
FROM
    mt_deletion_ratios
WHERE
    wiki != 'uzwiki'
GROUP BY
    mt_service,
    user_editcount_bucket
""", True)
Code
ordered_ueb = ['1-5', '6-99', '100-999', '1000-4999', '5000+']
mt_deletion_by_ueb['user_editcount_bucket'] = pd.Categorical(mt_deletion_by_ueb['user_editcount_bucket'], categories=ordered_ueb, ordered=True)
mt_deletion_by_ueb = mt_deletion_by_ueb.sort_values(['user_editcount_bucket', 'created_cx_total'], ascending=[True, False])
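
The explicit category order matters because the bucket labels are strings: a plain lexicographic sort would place '100-999' and '1000-4999' before '6-99'. The same idea is sketched below without pandas, sorting by each label's position in the ordered list (the `labels` list is just a shuffled copy for illustration).

```python
ordered_ueb = ['1-5', '6-99', '100-999', '1000-4999', '5000+']

# Shuffled bucket labels, for illustration
labels = ['5000+', '6-99', '1000-4999', '1-5', '100-999']

# Default string sort compares character by character
lexicographic = sorted(labels)

# Sorting by index in ordered_ueb restores the intended experience order
by_experience = sorted(labels, key=ordered_ueb.index)

print(lexicographic)
print(by_experience)
```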
Code
mt_deletion_by_ueb_tbl = (
    gt
    .GT(mt_deletion_by_ueb, groupname_col='mt_service', rowname_col='user_editcount_bucket')
    .tab_header('Articles created and deleted by MT service & User Edit Count Bucket', period_label)
    .fmt_percent(columns='deletion_ratio', decimals=0)
    .fmt_number(columns=['created_cx_total', 'deleted_cx_total'], decimals=0)
    .cols_label(
        created_cx_total = '# Articles created',
        deleted_cx_total = '# Articles deleted',
        deletion_ratio = '% Articles deleted'
    )
    .opt_stylize()
    .tab_source_note(gt.html('Note: As observed before, uzwiki had a high creation and deletion rate due to a campaign,<br>skewing the overall percentage, so it has been excluded.<br>Please refer to Table 2 in the appendix for uzwiki aggregates.'))
)

mt_deletion_by_ueb_tbl
Articles created and deleted by MT service & User Edit Count Bucket
January to September 2024

Edit count bucket    # Articles created    # Articles deleted    % Articles deleted
Google
  1-5                7,628                 1,507                 20%
  6-99               11,453                1,345                 12%
  100-999            17,892                616                   3%
  1000-4999          24,101                409                   2%
  5000+              92,917                1,205                 1%
MinT
  1-5                1,304                 203                   16%
  6-99               2,752                 177                   6%
  100-999            5,324                 140                   3%
  1000-4999          5,381                 119                   2%
  5000+              15,559                36                    0%
scratch
  1-5                722                   140                   19%
  6-99               1,199                 129                   11%
  100-999            2,206                 77                    3%
  1000-4999          4,543                 112                   2%
  5000+              7,358                 32                    0%
Apertium
  1-5                339                   66                    19%
  6-99               581                   53                    9%
  100-999            373                   13                    3%
  1000-4999          541                   5                     1%
  5000+              3,947                 19                    0%
Yandex
  1-5                193                   41                    21%
  6-99               284                   57                    20%
  100-999            426                   5                     1%
  1000-4999          418                   10                    2%
  5000+              2,604                 13                    0%
Elia
  1-5                99                    9                     9%
  6-99               202                   4                     2%
  100-999            636                   2                     0%
  1000-4999          587                   0                     0%
  5000+              2,067                 91                    4%
LingoCloud
  1-5                13                    1                     8%
  6-99               19                    3                     16%
  100-999            21                    0                     0%
  1000-4999          9                     0                     0%
  5000+              16                    0                     0%

Note: As observed before, uzwiki had a high creation and deletion rate due to a campaign,
skewing the overall percentage, so it has been excluded.
Please refer to Table 2 in the appendix for uzwiki aggregates.
Summary
  • We wanted to understand how a user's editing experience affects the deletion rate of articles created using CX, by machine translation service.
  • Except for Elia and LingoCloud, based on the table above:
      - 16-21% of the articles created by relative newcomers (1-5 edit count bucket) were deleted.
      - 6-20% of the articles created by users with 6-99 edits were deleted.
      - 1-3% of the articles created by users with 100-999 edits were deleted.
      - 1-2% of the articles created by users with 1000-4999 edits were deleted.
      - 0-1% of the articles created by users with 5000+ edits were deleted.
  • This shows that, irrespective of the machine translation service used, a user's editing experience plays a major role in whether a translated article is deleted.
  • Also, the deletion percentage on Uzbek Wikipedia (Table 2) during the campaign was significantly higher than usual across most experience levels, which suggests that creating many articles within a short period can lead to a higher deletion rate.
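
The per-bucket ranges can be recomputed from the counts in the table above. The sketch below restates the (created, deleted) pairs for the five services other than Elia and LingoCloud and takes the lowest and highest deletion rate in each edit count bucket; the variable names are illustrative only.

```python
# (created, deleted) per edit count bucket, copied from the table above (excluding uzwiki)
buckets = ['1-5', '6-99', '100-999', '1000-4999', '5000+']
counts = {
    'Google':   [(7628, 1507), (11453, 1345), (17892, 616), (24101, 409), (92917, 1205)],
    'MinT':     [(1304, 203), (2752, 177), (5324, 140), (5381, 119), (15559, 36)],
    'scratch':  [(722, 140), (1199, 129), (2206, 77), (4543, 112), (7358, 32)],
    'Apertium': [(339, 66), (581, 53), (373, 13), (541, 5), (3947, 19)],
    'Yandex':   [(193, 41), (284, 57), (426, 5), (418, 10), (2604, 13)],
}

# Lowest and highest deletion rate across services for each bucket
ranges = {}
for i, bucket in enumerate(buckets):
    rates = [deleted / created for created, deleted in (rows[i] for rows in counts.values())]
    ranges[bucket] = (min(rates), max(rates))

for bucket, (low, high) in ranges.items():
    print(f"{bucket}: {low:.0%} to {high:.0%}")
```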

Appendix

Machine translation deletion overview for uzwiki

Code
mt_deletion_overall_uzwiki = query("""
SELECT
    mtd.mt_service AS mt_service,
    SUM(created_cx_total)::INT AS '# Articles created',
    SUM(deleted_cx_total)::INT AS '# Articles deleted',
    SUM(deleted_cx_total) / SUM(created_cx_total) * 100  AS 'Percentage of deleted articles',
    pct_translations * 100 AS 'Percent of translations modified under 10%'
FROM
    mt_deletion_ratios mtd
LEFT JOIN (
    SELECT
        mt_service,
        pct_translations
    FROM
        pct_modified_overall
    WHERE
        pct_modified = 'less than 10%'
    ) pct_modf
    ON mtd.mt_service = pct_modf.mt_service
WHERE
    wiki == 'uzwiki'
GROUP BY
    mtd.mt_service,
    pct_translations
""", True).sort_values('# Articles created', ascending=False)

mt_deletion_overall_uzwiki.set_index('mt_service')
Table 1

mt_service    # Articles created    # Articles deleted    Percentage of deleted articles    Percent of translations modified under 10%
Google        15915                 919                   5.774427                          29.864008
MinT          8419                  1567                  18.612662                         32.411278
Yandex        682                   184                   26.979472                         20.534070
scratch       437                   57                    13.043478                         NaN

MT service usage by user edit bucket on uzwiki

Code
mt_deletion_by_ueb_uzwiki = query("""
SELECT
    mt_service,
    user_editcount_bucket,
    SUM(created_cx_total) AS created_cx_total,
    SUM(deleted_cx_total) AS deleted_cx_total,
    SUM(deleted_cx_total) / SUM(created_cx_total) AS deletion_ratio
FROM
    mt_deletion_ratios
WHERE
    wiki == 'uzwiki'
GROUP BY
    mt_service,
    user_editcount_bucket
""", True)

mt_deletion_by_ueb_uzwiki['user_editcount_bucket'] = pd.Categorical(mt_deletion_by_ueb_uzwiki['user_editcount_bucket'], categories=ordered_ueb, ordered=True)
mt_deletion_by_ueb_uzwiki = mt_deletion_by_ueb_uzwiki.sort_values(['user_editcount_bucket', 'created_cx_total'], ascending=[True, False])

mt_deletion_by_ueb_uzwiki_tbl = (
    gt
    .GT(mt_deletion_by_ueb_uzwiki, groupname_col='mt_service', rowname_col='user_editcount_bucket')
    .tab_header('Articles created and deleted by MT service & User Edit Count Bucket', f'{period_label}; on uzwiki')
    .fmt_percent(columns='deletion_ratio', decimals=0)
    .fmt_number(columns=['created_cx_total', 'deleted_cx_total'], decimals=0)
    .cols_label(
        created_cx_total = '# Articles created',
        deleted_cx_total = '# Articles deleted',
        deletion_ratio = '% Articles deleted'
    )
    .opt_stylize()
)

mt_deletion_by_ueb_uzwiki_tbl
Table 2
Articles created and deleted by MT service & User Edit Count Bucket
January to September 2024; on uzwiki

Edit count bucket    # Articles created    # Articles deleted    % Articles deleted
Google
  1-5                218                   38                    17%
  6-99               1,482                 288                   19%
  100-999            4,932                 410                   8%
  1000-4999          6,471                 180                   3%
  5000+              2,812                 3                     0%
MinT
  1-5                181                   44                    24%
  6-99               1,685                 488                   29%
  100-999            4,390                 878                   20%
  1000-4999          2,027                 157                   8%
  5000+              136                   0                     0%
Yandex
  1-5                26                    7                     27%
  6-99               213                   70                    33%
  100-999            308                   83                    27%
  1000-4999          85                    24                    28%
  5000+              50                    0                     0%
scratch
  1-5                12                    0                     0%
  6-99               79                    18                    23%
  100-999            228                   36                    16%
  1000-4999          114                   3                     3%
  5000+              4                     0                     0%

Footnotes

  1. 8% of all translations in 2023, and 16% in 2024.↩︎

  2. 80% of all translations in 2023, and 71% in 2024.↩︎

  3. 8% of all translations in 2023, and 16% in 2024.↩︎

  4. 80% of all translations in 2023, and 71% in 2024.↩︎

  5. Chewa (ny), Malagasy (mg), Sundanese (su), Gan Chinese (gan), Hawaiian (haw), Aymara (ay), Lao (lo), Quechua (qu), Guarani (gn), Western Frisian (fy), Shona (sn), and Ilocano (ilo).↩︎

  6. Kinyarwanda (rw), Lingala (ln), Maltese (mt), Twi (tw), Sinhala (si), Serbian (sr), Tigrinya (ti), Khmer (km), Sindhi (sd), Irish (ga), Welsh (cy), Tagalog (tl), Marathi (mr), Macedonian (mk), Somali (so), Latin (la), Croatian (hr), Azerbaijani (az), Finnish (fi), Javanese (jv), Tsonga (ts), Bengali (bn), Czech (cs), Afrikaans (af), Lithuanian (lt), Bosnian (bs), Tajik (tg), Estonian (et), Albanian (sq), Xhosa (xh), Bulgarian (bg), Luganda (lg), Dutch (nl), Northern Sotho (nso), Belarusian (Taraškievica orthography) (be-tarask), Polish (pl), Mongolian (mn), Slovenian (sl), Hungarian (hu), Kazakh (kk), Thai (th), Central Kurdish (ckb), Tulu (tcy), Pashto (ps), Serbo-Croatian (sh), Armenian (hy), Burmese (my), Chinese (zh), Latvian (lv), Romanian (ro), Turkish (tr), Cebuano (ceb), Ukrainian (uk), and French (fr).↩︎

  7. Korean (ko), Slovak (sk), Danish (da), Russian (ru), Haitian Creole (ht), Odia (or), Yoruba (yo), Georgian (ka), Vietnamese (vi), Swahili (sw), Swedish (sv), Spanish (es), Italian (it), Hebrew (he), Amharic (am), Egyptian Arabic (arz), Betawi (bew), Dhivehi (dv), Kyrgyz (ky), Turkmen (tk), Uyghur (ug), Malay (ms), Arabic (ar), Maori (mi), Ewe (ee), Portuguese (pt), Oromo (om), Scottish Gaelic (gd), Kurdish (ku), Assamese (as), Persian (fa), Catalan (ca), Bhojpuri (bho), Yiddish (yi), Wu Chinese (wuu), Telugu (te), Igbo (ig), Greek (el), Kannada (kn), Uzbek (uz), Gujarati (gu), Luxembourgish (lb), Indonesian (id), Hindi (hi), Galician (gl), Sanskrit (sa), Norwegian Bokmål (nb), German (de), and Punjabi (pa).↩︎

  8. Norwegian Nynorsk (nn), Occitan (oc), Silesian (szl), Northern Sami (se), Asturian (ast), and Esperanto (eo).↩︎

  9. Bashkir (ba) and Belarusian (be).↩︎

  10. The translations_progress data shows the percentage of translation completion: human indicates the manually translated percentage, mt indicates the machine translated percentage, and any indicates the total translation (any = human + mt). Any edits to machine translation output are counted as manual edits. The percentages are calculated at section level. Content Translation does not demand full translation of the source article.↩︎

  11. Note: Even though 38% of LingoCloud translations had a human modification percentage of less than 10%, the number of translations that used LingoCloud was only ~100.↩︎