Mobile Entry Points Funnel Analysis

Section Translation

Author

Krishna Chaitanya Velaga, Product Analytics

Published

January 5, 2023

Introduction (T290428)

Section Translation is an expansion of the Content Translation capabilities that enables users to expand existing Wikipedia articles by translating new sections. Section Translation is also designed to work on mobile devices (as well as desktop), so users can translate on mobile, which was not possible with Content Translation before.

The content translation events capture various aspects of user interactions with the Content Translation and Section Translation tools. This analysis is a first iteration at visualizing the entry points through which users arrive, the flows they follow, and how many of them reach the currently instrumented next stages.

The 90 days of data preceding 2023-12-31 (from 2023-10-02) were reviewed.
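A minimal sketch of how the review window follows from the end date, mirroring the start_dt computation in the Query code. The function name reporting_window is hypothetical. Because the query filters with DATE(dt) >= start and DATE(dt) <= end, both endpoints are included, so the window covers 91 calendar dates.

```python
from datetime import date, timedelta


def reporting_window(end: date, days_back: int = 90) -> "tuple[date, date]":
    """Return (start, end), where start is `days_back` days before `end`."""
    return end - timedelta(days=days_back), end


start, end = reporting_window(date(2023, 12, 31))

# The query keeps DATE(dt) >= start AND DATE(dt) <= end, so both endpoints count
n_dates = (end - start).days + 1

print(start.isoformat(), end.isoformat(), n_dates)  # 2023-10-02 2023-12-31 91
```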

Overall Summary

Frequency of Entry Points Usage (by Edit Count Bucket of Users)

85% of newcomers opened the translation dashboard from the frequent languages selector, which surfaces languages that an article is not yet available in.
As users gain editing experience, they increasingly reach the dashboard through the content language selector, where they can search for a language that an article is missing in.
Experienced users are also more likely than newcomers to open the dashboard directly.
For users with 1000+ edits, the frequent languages selector accounts for only 40% of navigations to the dashboard.

Frequency of Entry Points Usage (by comparative size of target language Wikipedia)

On larger Wikipedias, the frequent languages selector was the most used entry point to the translation dashboard.
- Among the top 20 Wikipedias, it accounted for 85% of dashboard accesses.
On smaller Wikipedias, the frequent languages selector remains the most used, but the content language selector is used more than on larger Wikipedias.
This is consistent with the edit bucket observations: larger Wikipedias tend to have more newcomers than smaller Wikipedias.

Translation Start Screen

Only 7% of dashboard_translation_start events occurred independently of dashboard_open events.
- This indicates that most users start a translation having already selected an article/section at an external entry point.
Among users who triggered dashboard_translation_start independently:
- Most newcomers start a translation by accepting a suggestion from the API made in the absence of a seed article.
- Most experienced users start a translation by choosing a search result, followed by accepting a suggestion related to one of their recent edits.

Flow of Users Overall

(main events: dashboard_open, dashboard_translation_start, editor_segment_add)

In most cases (77%), those who opened the dashboard transitioned to the translation start screen.
- This is because for users navigating to the dashboard from an external entry point, both events occur consecutively.
- 13% ended the session and 8% refreshed the dashboard or came back to it later before the session expired.
From the start screen, users progressed to the editor and made an edit in only 15% of the cases.
- In 46% of the cases, users went back to the main dashboard, and in 30% they ended the session.
- Because most events were generated by users with 0 edits (newcomers), these rates largely reflect newcomer behavior.
After making at least one edit, users continued to make additional edits in 80% of the cases, went back to the main dashboard in 9%, and ended the session in the rest.
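Transition rates like these can be computed per source event. Within each session, every event is paired with the event that follows it, or with 'session end' if there is none. Pair counts are then divided by the total for the source event. The sketch below assumes events arrive as hypothetical (session_id, position, event_type) tuples rather than the notebook's DataFrame. It follows the same idea as plot_funnel but is not the notebook's code.

```python
from collections import Counter, defaultdict


def transition_shares(events):
    """Share of each next event per source event, ordered by session position.

    events: iterable of (session_id, position, event_type) tuples.
    The last event of a session transitions to 'session end'.
    """
    by_session = defaultdict(list)
    for session_id, position, event_type in events:
        by_session[session_id].append((position, event_type))

    pair_counts = Counter()
    for rows in by_session.values():
        ordered = [event_type for _, event_type in sorted(rows)]
        # pair each event with its successor; the last one ends the session
        for current, nxt in zip(ordered, ordered[1:] + ['session end']):
            pair_counts[(current, nxt)] += 1

    source_totals = Counter()
    for (current, _), n in pair_counts.items():
        source_totals[current] += n

    return {pair: n / source_totals[pair[0]] for pair, n in pair_counts.items()}


# hypothetical toy sessions, not real data
toy = [
    ('s1', 0, 'dashboard_open'), ('s1', 1, 'dashboard_translation_start'), ('s1', 2, 'editor_segment_add'),
    ('s2', 0, 'dashboard_open'), ('s2', 1, 'dashboard_translation_start'),
    ('s3', 0, 'dashboard_open'),
]
for (src, dst), share in sorted(transition_shares(toy).items()):
    print(f'{src} -> {dst}: {share:.0%}')
```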

Flow of Users (by User Edit Count Bucket)

(main events: dashboard_open, dashboard_translation_start, editor_segment_add)

Across all edit count buckets, most of the users (>70%) who opened the dashboard proceeded to the translation start screen.
- The percentage is higher for newcomers than for experienced users. Most newcomers reach the dashboard through external entry points rather than opening it directly. In that case, dashboard_open and dashboard_translation_start are triggered consecutively, with no user action in between. Among experienced users, more open the dashboard directly and then click through to the translation start screen.
Among users who reached the translation start screen:
- Newcomers tend to end (abandon) the session or return to the main dashboard.
- Newcomers continued to make an edit from this stage in only 12% of the cases, whereas users with 1000+ edits made an edit in 32% of the cases.
  - The more editing experience users have, the more likely they are to continue to make an edit.
Among users who made at least one edit, those with more editing experience are more likely to continue making additional edits to the machine-translated content, and less likely to end the session or return to the dashboard.

Flow of Users (by Usage of Entry Points)

(main events: dashboard_open, dashboard_translation_start, editor_segment_add)

For each entry point, the transition rates between funnel stages largely reflect which user experience levels use that entry point.

Among users who navigated through the frequent languages menu, which was most frequently accessed by newcomers:
- In 82% of the cases, they proceeded to the translation start screen.
- From the translation start screen, users made at least one edit in 13% of the cases.
Among users who navigated through the content language selector, which was accessed frequently by newcomers and experienced users alike:
- In 75% of the cases, they proceeded to the translation start screen.
- From the translation start screen, users made at least one edit in 23% of the cases.
Among users who directly opened the dashboard, which was done most frequently by experienced users:
- In 36% of the cases, they proceeded to the translation start screen.
- From the translation start screen, users made at least one edit in 34% of the cases.
Among users who navigated from an invitation shown on a non-existent page, which was accessed frequently by newcomers and experienced users alike:
- In 80% of the cases, they proceeded to the translation start screen.
- From the translation start screen, users made at least one edit in 14% of the cases.
Among users who directly opened the dashboard with a link to a specific translation, which was done most frequently by experienced users:
- In 87% of the cases, they proceeded to the translation start screen.
- From the translation start screen, users made at least one edit in 11% of the cases.
Among users who navigated from the contributions page, which was accessed frequently by newcomers and experienced users alike:
- In 39% of the cases, they proceeded to the translation start screen.
- From the translation start screen, users made at least one edit in 7% of the cases.
Among users who navigated from a notice on recently translated articles to review/expand the translation, which was done most frequently by experienced users:
- In 88% of the cases, they proceeded to the translation start screen.
- From the translation start screen, users made at least one edit in 31% of the cases.
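Per-entry-point rates like these require attributing each session to an entry point before the funnel is computed. A natural choice is the event_source of the session's position-0 dashboard_open. The sketch below assumes hypothetical (session_id, position, event_type, event_source) tuples and a hypothetical function name. It computes, for each entry point, the share of transitions out of the start screen that go straight to editor_segment_add. This approximates the rates above and is not the notebook's code.

```python
from collections import Counter, defaultdict


def start_to_edit_rate_by_entry_point(events):
    """Share of transitions out of 'dashboard_translation_start' that lead to
    'editor_segment_add', grouped by each session's entry point.

    events: iterable of (session_id, position, event_type, event_source) tuples.
    The entry point is the event_source of the session's position-0 dashboard_open.
    """
    sessions = defaultdict(list)
    for session_id, position, event_type, source in events:
        sessions[session_id].append((position, event_type, source))

    leaving_start = Counter()  # transitions out of the start screen, per entry point
    into_editor = Counter()    # of those, transitions straight into the editor
    for rows in sessions.values():
        rows.sort(key=lambda row: row[0])  # order by session position
        first_position, first_type, entry_point = rows[0]
        if first_position != 0 or first_type != 'dashboard_open':
            continue  # skip sessions that do not start with dashboard_open
        types = [event_type for _, event_type, _ in rows] + ['session end']
        for current, nxt in zip(types, types[1:]):
            if current == 'dashboard_translation_start':
                leaving_start[entry_point] += 1
                if nxt == 'editor_segment_add':
                    into_editor[entry_point] += 1

    return {ep: into_editor[ep] / n for ep, n in leaving_start.items()}


# hypothetical toy sessions, not real data
toy = [
    ('s1', 0, 'dashboard_open', 'frequent_languages'),
    ('s1', 1, 'dashboard_translation_start', 'frequent_languages'),
    ('s1', 2, 'editor_segment_add', None),
    ('s2', 0, 'dashboard_open', 'frequent_languages'),
    ('s2', 1, 'dashboard_translation_start', 'frequent_languages'),
    ('s3', 0, 'dashboard_open', 'direct'),
    ('s3', 1, 'dashboard_translation_start', 'search_result'),
    ('s3', 2, 'editor_segment_add', None),
]
print(start_to_edit_rate_by_entry_point(toy))  # {'frequent_languages': 0.5, 'direct': 1.0}
```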

Flow of Users (other events)

In cases where users discarded a suggested translation (677 occurrences), in 80% of the cases they also discarded the next suggestion shown, and in 10% they proceeded to the translation start screen.
In cases where users requested that the list of suggestions be regenerated (110 occurrences), in 33% of the cases they refreshed the suggestions again, and 20% proceeded to the translation start screen.
In cases where users initiated a search (958 occurrences), in 82% of the cases they proceeded to the translation start screen, and 12% returned to the dashboard.
In cases where users selected an in-progress translation (440 occurrences), in 67% of the cases they returned to the dashboard, and 16% made an edit to the translation.
In cases where users discarded an in-progress translation (132 occurrences), in 58% of the cases they discarded additional in-progress translations, and 13% initiated a search.

Data Gathering

Setup

Code

import wmfdata as wmf
import pandas as pd
from datetime import datetime, timedelta
import great_tables as gt

import plotly.express as px
import plotly.graph_objects as go
import plotly.subplots as sp
from plotly.offline import download_plotlyjs, init_notebook_mode, iplot

from IPython.display import display_html, display, HTML, clear_output, Markdown

import warnings

Code

init_notebook_mode(connected=True)

pd.options.display.max_columns = None
pd.options.display.max_rows = 250

# width for charts
iplot_width = 950
max_width = 1250

# always show options bar
iplot_config = {'displayModeBar': True}

# prints a string at center of the output, bold if needed
def pr_centered(content, bold=False):
    if bold:
        content = f"<b>{content}</b>"
    
    centered_html = f"<div style='text-align:center'>{content}</div>"
    
    display(HTML(centered_html))

Code

spark_session = wmf.spark.get_active_session()

if spark_session is None:
    spark_session = wmf.spark.create_custom_session(
        master="yarn",
        app_name='cx-funnel-entrypoints',
        spark_config={
            "spark.driver.memory": "4g",
            "spark.dynamicAllocation.maxExecutors": 64,
            "spark.executor.memory": "16g",
            "spark.executor.cores": 4,
            "spark.sql.shuffle.partitions": 256,
            "spark.driver.maxResultSize": "2g"
        }
    )

spark_session.sparkContext.setLogLevel("ERROR")

clear_output()

spark_session

SparkSession - hive

SparkContext

Spark UI

Version: v3.1.2
Master: yarn
AppName: cx-funnel-entrypoints

Query

Code

end_dt = '2023-12-31'
start_dt = (datetime.strptime(end_dt, "%Y-%m-%d") - timedelta(days=90)).strftime("%Y-%m-%d")

Code

%%time

query = """
SELECT
    dt AS ts,
    DATE(dt) AS dt,
    HOUR(dt) AS hour,
    wiki_db,
    access_method,
    content_translation_session_id,
    content_translation_session_position,
    event_type,
    event_source,
    translation_type,
    translation_source_language,
    translation_target_language,
    user_is_anonymous,
    user_global_edit_count_bucket,
    year,
    day,
    month
FROM 
    event_sanitized.mediawiki_content_translation_event
WHERE
    DATE(dt) >= DATE('{START_DT}')
    AND DATE(dt) <= DATE('{END_DT}')
"""

all_events = wmf.spark.run(
    query.format(
        START_DT=start_dt, 
        END_DT=end_dt
    )
)

CPU times: user 4.41 s, sys: 723 ms, total: 5.14 s
Wall time: 2min 8s

Code

edit_buckets_across_all_sessions = (
    all_events[['content_translation_session_id', 'user_global_edit_count_bucket']]
    .user_global_edit_count_bucket
    .value_counts(normalize=True)
    .reset_index()
    .rename({
        'user_global_edit_count_bucket': 'Edit Bucket',
        'proportion': 'Percentage of events'
    }, axis=1)
    .sort_values('Percentage of events', ascending=False, ignore_index=True)
)

edit_buckets_across_all_sessions['Percentage of events'] = edit_buckets_across_all_sessions['Percentage of events'].apply(lambda x:f"{x:.2%}")
pr_centered(f'Distribution of edit buckets across all sessions', True)
edit_buckets_across_all_sessions

Distribution of edit buckets across all sessions

	Edit Bucket	Percentage of events
0	0 edits	53.93%
1	1000+ edits	19.20%
2	5-99 edits	10.76%
3	100-999 edits	9.87%
4	1-4 edits	6.24%

Data Cleaning

During analysis, several issues with the produced events were identified. The most significant concerned content_translation_session_position: multiple events, of the same or of different event types, share the same session position even though they occurred at different times. It is currently unclear whether the session position was recorded incorrectly (in which case it could be reconstructed from the timestamp) or whether these are duplicate events. More information, and the task to investigate these issues, is at T353882. Sessions with potentially erroneous events are excluded from this analysis.
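A minimal sketch of the main check described above: flag any session in which two or more events share the same content_translation_session_position. It uses hypothetical (session_id, position, event_type, timestamp) tuples and a hypothetical function name. The notebook performs the equivalent checks with pandas groupby and value_counts.

```python
from collections import Counter


def sessions_with_shared_positions(events):
    """Return sorted session ids where any session position is used by more than one event.

    events: iterable of (session_id, position, event_type, timestamp) tuples.
    """
    position_counts = Counter((sid, pos) for sid, pos, _, _ in events)
    return sorted({sid for (sid, _), n in position_counts.items() if n > 1})


# hypothetical toy events, not real data
toy = [
    ('a', 0, 'dashboard_open', '2023-12-01T10:00:00Z'),
    ('a', 1, 'dashboard_translation_start', '2023-12-01T10:00:05Z'),
    ('b', 0, 'dashboard_open', '2023-12-02T09:00:00Z'),
    # same position as the previous event, although it was logged later
    ('b', 0, 'dashboard_translation_start', '2023-12-02T09:03:00Z'),
]
print(sessions_with_shared_positions(toy))  # ['b']
```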

Code

temporal_columns = ['ts', 'dt', 'hour']

# sessions with duplicate events, except for the temporal columns
# (dropna=False so that rows with a null event_source are compared too)
sessions_with_duplicate_events = (
    all_events[[col for col in all_events.columns.tolist() if col not in temporal_columns]]
    .value_counts(dropna=False)
    .reset_index()
    .rename({0: 'count'}, axis=1)
    .query("""count > 1""")
    .content_translation_session_id
    .unique()
    .tolist()
)

# sessions where different event types share a session position, although the events occurred at different times
session_event_counts = (
    all_events.groupby(['content_translation_session_id', 'content_translation_session_position'])
    .agg(distinct_events=('event_type', pd.Series.nunique))
)

sessions_with_same_position_events = (
    session_event_counts.query("""distinct_events > 1""")
    .reset_index()
    .content_translation_session_id
    .unique()
    .tolist()
)

# sessions where multiple global edit count buckets were recorded
sessions_with_multiple_edit_counts = (
    all_events.groupby('content_translation_session_id')['user_global_edit_count_bucket']
    .nunique()
    .reset_index()
    .query("""user_global_edit_count_bucket > 1""")
    .content_translation_session_id
    .unique()
    .tolist()
)

# sessions with no dashboard open at start
sessions_with_no_dopen_start = (
    all_events.query("""(content_translation_session_position == 0) & (event_type != 'dashboard_open')""")
    .content_translation_session_id
    .unique()
    .tolist()
)

sessions_with_dopen = (
    all_events
    .query("""event_type == 'dashboard_open'""")['content_translation_session_id']
    .unique()
    .tolist()
)

# sessions without dashboard open
sessions_without_dopen = (
    all_events
    .query("""content_translation_session_id != @sessions_with_dopen""")['content_translation_session_id']
    .unique()
    .tolist()
)

# sessions with multiple events having same session position
duplicate_events_with_same_position = (
    all_events[['content_translation_session_id', 'content_translation_session_position', 'event_type']]
    .value_counts()
    .reset_index()
    .rename({0: 'count'}, axis=1)
    .query("""count > 1""")
    .content_translation_session_id
    .unique()
    .tolist()
)

Code

# remove all potentially invalid sessions
invalid_sessions = list(
    set(
        [*sessions_with_duplicate_events,
         *sessions_with_same_position_events,
         *sessions_with_multiple_edit_counts, 
         *sessions_with_no_dopen_start, 
         *sessions_without_dopen, 
         *duplicate_events_with_same_position]
    )
)

events = all_events.query("""content_translation_session_id != @invalid_sessions""")

Code

# check that session positions follow timestamp order within each session
def is_session_position_consistent(group):
    return group['content_translation_session_position'].is_monotonic_increasing

events = events.sort_values(by=['content_translation_session_id', 'ts'])
consistency_check = events.groupby('content_translation_session_id').apply(is_session_position_consistent)
n_inconsistent = int((~consistency_check).sum())

assert n_inconsistent == 0, f'{n_inconsistent} sessions have inconsistent position'

Code

# change to appropriate datatypes

edit_buckets = ['0 edits', '1-4 edits', '5-99 edits', '100-999 edits', '1000+ edits']

events = (
    events
    .assign(
        user_global_edit_count_bucket=pd.Categorical(events['user_global_edit_count_bucket'], categories=edit_buckets, ordered=True),
        ts=pd.to_datetime(events['ts'], utc=True)
    )
    .sort_values(by=['content_translation_session_id', 'content_translation_session_position'])
    .reset_index(drop=True)
)

print('Dataframe Information')
events.info()

Dataframe Information
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 87874 entries, 0 to 87873
Data columns (total 17 columns):
 #   Column                                Non-Null Count  Dtype              
---  ------                                --------------  -----              
 0   ts                                    87874 non-null  datetime64[ns, UTC]
 1   dt                                    87874 non-null  object             
 2   hour                                  87874 non-null  int32              
 3   wiki_db                               87874 non-null  object             
 4   access_method                         87874 non-null  object             
 5   content_translation_session_id        87874 non-null  object             
 6   content_translation_session_position  87874 non-null  int64              
 7   event_type                            87874 non-null  object             
 8   event_source                          62045 non-null  object             
 9   translation_type                      87874 non-null  object             
 10  translation_source_language           87872 non-null  object             
 11  translation_target_language           87874 non-null  object             
 12  user_is_anonymous                     87874 non-null  bool               
 13  user_global_edit_count_bucket         87874 non-null  category           
 14  year                                  87874 non-null  int64              
 15  day                                   87874 non-null  int64              
 16  month                                 87874 non-null  int64              
dtypes: bool(1), category(1), datetime64[ns, UTC](1), int32(1), int64(4), object(9)
memory usage: 9.9+ MB

Code

n_all_sessions = all_events.content_translation_session_id.nunique()
n_all_events = all_events.shape[0]

n_valid_sessions = events.content_translation_session_id.nunique()
n_events_from_valid_sessions = events.shape[0]

pct_invalid_sessions = 100 - round(n_valid_sessions / n_all_sessions * 100, 2) 

print(f'- all sessions: {n_all_sessions}; all events: {n_all_events}')
print(f'- valid sessions: {n_valid_sessions}; events from valid sessions: {n_events_from_valid_sessions}')
print(f'- percentage of sessions with potentially erroneous events: {pct_invalid_sessions}%')

- all sessions: 29365; all events: 277212
- valid sessions: 15143; events from valid sessions: 87874
- percentage of sessions with potentially erroneous events: 48.43%

Analysis: Entry Points & Sources

Dashboard Open

Because the goal is to understand how users reach the translation dashboard, this part of the analysis only includes dashboard_open events at the start of a session (session position 0), i.e. where users navigate to the main dashboard from an external source. For example, after adding a segment, a user can return to the dashboard for another translation. Such events are currently recorded as direct access (T353799) and are not considered in this part.
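A stdlib sketch of the filter and share computation described above. It keeps only dashboard_open events at session position 0, then computes each event_source's share, skipping missing sources in the same way pandas value_counts drops nulls by default. The tuples and the function name are hypothetical.

```python
from collections import Counter


def entry_point_shares(events):
    """Share of sessions by entry point, counting only position-0 dashboard_open events.

    events: iterable of (session_id, position, event_type, event_source) tuples.
    """
    entry_points = Counter(
        source
        for _, pos, etype, source in events
        if pos == 0 and etype == 'dashboard_open' and source is not None
    )
    total = sum(entry_points.values())
    return {source: n / total for source, n in entry_points.most_common()}


# hypothetical toy events, not real data
toy = [
    ('s1', 0, 'dashboard_open', 'frequent_languages'),
    ('s1', 3, 'dashboard_open', 'direct'),  # return visit later in the session: excluded
    ('s2', 0, 'dashboard_open', 'frequent_languages'),
    ('s3', 0, 'dashboard_open', 'content_language_selector'),
]
print(entry_point_shares(toy))
```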

Overall

Code

# dashboard open events

dopen_events = (
    events
    .query("""(event_type == 'dashboard_open') & (content_translation_session_position == 0)""")
    .reset_index(drop=True)
)

dopen_edit_counts = (
    dopen_events[['content_translation_session_id', 'user_global_edit_count_bucket']]
    .user_global_edit_count_bucket
    .value_counts(normalize=True)
    .reset_index()
    .rename({
        'user_global_edit_count_bucket': 'Edit Bucket',
        'proportion': 'Percentage'
    }, axis=1)
    .sort_values('Edit Bucket')
)

dopen_edit_counts_table = (
    gt
    .GT(dopen_edit_counts)
    .fmt_percent(columns='Percentage')
    .tab_header(title='Frequency of Users\' Edit Buckets', subtitle='that initiated dashboard open events')
    .tab_source_note(f'across {dopen_events.content_translation_session_id.nunique()} sessions from {start_dt} to {end_dt}')
)

dopen_edit_counts_table

Frequency of Users' Edit Buckets
that initiated dashboard open events
Edit Bucket	Percentage
0 edits	49.25%
1-4 edits	8.35%
5-99 edits	15.37%
100-999 edits	9.57%
1000+ edits	17.46%
across 15007 sessions from 2023-10-02 to 2023-12-31

Code

# frequency of entry points usage

entry_points_freq = (
    dopen_events.event_source
    .value_counts(normalize=True)
    .reset_index()
    .rename({
        'event_source': 'entry_point',
        'proportion': 'percent'        
    }, axis=1)
)

# generate table from dataframe
entry_points_table = (
    gt
    .GT(entry_points_freq)
    .fmt_percent(columns='percent')
    .cols_label(
        entry_point='Entry Point',
        percent='Percentage'
    )
    .tab_header('Overall Distribution Of Entry Points That Users Navigate From', 'To Reach the Content Translation dashboard')
    .tab_source_note(f'across {dopen_events.content_translation_session_id.nunique()} sessions from {start_dt} to {end_dt}')
)

entry_points_table

Overall Distribution Of Entry Points That Users Navigate From
To Reach the Content Translation dashboard
Entry Point	Percentage
frequent_languages	72.69%
content_language_selector	15.94%
direct	6.37%
invite_new_article_creation	3.22%
direct_preselect	1.33%
contributions_page	0.25%
recent_translation	0.21%
across 15007 sessions from 2023-10-02 to 2023-12-31

by Edit Bucket

Code

warnings.filterwarnings('ignore')

# usage of entry points by various edit buckets

entry_by_edit_bucket = (
    dopen_events
    .groupby(['user_global_edit_count_bucket', 'event_source'])
    .size()
    .reset_index()
    .rename({
        'user_global_edit_count_bucket': 'edit_bucket',
        'event_source': 'source',
        0: 'count'
    }, axis=1)
    .sort_values(['edit_bucket', 'count'], ascending=[True, False])
    .reset_index(drop=True)
)

# total by each edit bucket
entry_by_edit_bucket['total'] = (
    entry_by_edit_bucket['edit_bucket']
    .map(entry_by_edit_bucket
         .groupby('edit_bucket')
         .agg({'count': 'sum'})
         .to_dict()['count']
    )
)

# percentage of usage by edit bucket
entry_by_edit_bucket = entry_by_edit_bucket.astype({'total': int})
entry_by_edit_bucket['percent'] = entry_by_edit_bucket['count'] / entry_by_edit_bucket['total']

Code

# only display annotations if entry point accounts for more than 5%
entry_by_edit_bucket['percent_annot'] = (
    entry_by_edit_bucket['percent']
    .apply(lambda x:f"{x:.0%}" if x > 0.05 else None)
)

# bar graph
fig = px.bar(entry_by_edit_bucket, 
             x='percent', 
             y='edit_bucket', 
             color='source',
             labels={
                 'percent':'% of Total Events', 
                 'edit_bucket': 'Edit Bucket', 
                 'source': 'Entry Points'
             },
             color_discrete_sequence=px.colors.qualitative.T10,
             title='Usage of Entry Points by User Global Edit Bucket',
             text='percent_annot', 
             # display in increasing edit bucket order
             category_orders={
                 'edit_bucket': edit_buckets, 
                 'source': entry_points_freq.entry_point.values.tolist()
             }
            )

# relative stacks the bars
fig.update_layout(barmode='relative', height=550, width=max_width)
fig.update_xaxes(tickformat='.0%')
fig = fig.update_traces(
    textfont_color='white', 
    hovertemplate="<br>".join([
        "Edit Bucket: %{y}",
        "Percent of Total Events: %{x:.0%}"
    ])
)

iplot(fig, config=iplot_config)

Summary

85% of newcomers opened the translation dashboard from the frequent languages selector, which surfaces languages that an article is not yet available in.
As users gain editing experience, they increasingly reach the dashboard through the content language selector, where they can search for a language that an article is missing in.
Experienced users are also more likely than newcomers to open the dashboard directly.
For users with 1000+ edits, the frequent languages selector accounts for only 40% of navigations to the dashboard.
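The chart percentages are normalized within each edit bucket, so each bar sums to 100%. This makes buckets of very different sizes comparable. The stdlib sketch below reproduces that normalization on hypothetical (edit_bucket, entry_point) pairs. The function name is illustrative only.

```python
from collections import Counter, defaultdict


def shares_within_group(pairs):
    """For (group, category) pairs, return {group: {category: share within that group}}."""
    counts = defaultdict(Counter)
    for group, category in pairs:
        counts[group][category] += 1

    result = {}
    for group, categories in counts.items():
        total = sum(categories.values())  # all events in this group
        result[group] = {cat: n / total for cat, n in categories.items()}
    return result


# hypothetical toy pairs, not real data
toy = [
    ('0 edits', 'frequent_languages'), ('0 edits', 'frequent_languages'),
    ('0 edits', 'frequent_languages'), ('0 edits', 'content_language_selector'),
    ('1000+ edits', 'frequent_languages'), ('1000+ edits', 'direct'),
]
result = shares_within_group(toy)
print(result['0 edits'])      # {'frequent_languages': 0.75, 'content_language_selector': 0.25}
print(result['1000+ edits'])  # {'frequent_languages': 0.5, 'direct': 0.5}
```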

By Wiki size: Target Language

Comparative sizes are based on the wiki-comparison data (January 2023 snapshot).
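The rank bins in the code below are right-inclusive intervals, (0, 5], (5, 10], and so on, which is what pd.cut produces with its default right=True and the edges [0, 5, 10, 20, 50, inf]. An equivalent stdlib sketch using bisect makes it easy to check where boundary ranks land. The constant and function names are hypothetical.

```python
from bisect import bisect_left

RANK_EDGES = [0, 5, 10, 20, 50]  # the last bin has no upper edge
RANK_LABELS = ['1-5', '6-10', '11-20', '21-50', '51-max']


def rank_bin(rank: int) -> str:
    """Map a 1-based size rank to its bin; intervals are (0, 5], (5, 10], ..."""
    if rank < 1:
        raise ValueError('rank must be >= 1')
    return RANK_LABELS[bisect_left(RANK_EDGES, rank) - 1]


for r in (1, 5, 6, 20, 21, 50, 51, 300):
    print(r, rank_bin(r))
```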

Code

# wiki comparison data
wiki_comp = pd.read_csv('https://raw.githubusercontent.com/wikimedia-research/wiki-comparison/main/data-collection/snapshots/Jan_2023.tsv', sep='\t')
wp_comp = (
    wiki_comp[wiki_comp['project code'] == 'wikipedia']
    .reset_index(drop=True)
    .reset_index()[['index', 'database code', 'language code', 'language name', 'monthly active editors']]
    .rename({
        'index': 'rank', 
        'database code': 'db_code', 
        'language code': 'lang_code', 
        'language name': 'lang_name',
        'monthly active editors': 'active_editors'
    }, axis=1)
)

wp_comp['rank'] = wp_comp['rank'] + 1

rank_bin_edges = [0, 5, 10, 20, 50, float('inf')]
rank_bin_labels = ['1-5', '6-10', '11-20', '21-50', '51-max']

wp_comp['rank_bin'] = pd.cut(
    wp_comp['rank'], 
    bins=rank_bin_edges, 
    labels=rank_bin_labels
)

# add wiki comparison data to dashboard open events
dopen_events = (
    dopen_events
    .merge(
        wp_comp[['lang_code', 'rank_bin']],
        how='left',
        left_on='translation_target_language',
        right_on='lang_code'
    )
    .rename(columns={'rank_bin': 'target_wp_rank'})
    .drop('lang_code', axis=1)
)

Code

# usage of entry points by translation target language wiki size
entry_by_target_wp_size = (
    dopen_events
    .groupby(['target_wp_rank', 'event_source'])
    .size()
    .reset_index()
    .rename({
        'event_source': 'source',
        0: 'count'
    }, axis=1)
    .sort_values(['target_wp_rank', 'count'], ascending=[True, False])
    .reset_index(drop=True)
)

# total by each rank
entry_by_target_wp_size['total'] = (
    entry_by_target_wp_size['target_wp_rank']
    .map(entry_by_target_wp_size
         .groupby('target_wp_rank')
         .agg({'count': 'sum'})
         .to_dict()['count']
    )
)

entry_by_target_wp_size = entry_by_target_wp_size.astype({'total': int})
entry_by_target_wp_size['percent'] = entry_by_target_wp_size['count'] / entry_by_target_wp_size['total']

Code

# only display annotations if entry point accounts for more than 5%
entry_by_target_wp_size['percent_annot'] = (
    entry_by_target_wp_size['percent']
    .apply(lambda x:f"{x:.0%}" if x > 0.05 else None)
)

# bar graph
fig = px.bar(entry_by_target_wp_size.query("""target_wp_rank != '1-5'"""), 
             x='percent', 
             y='target_wp_rank', 
             color='source',
             labels={
                 'percent':'% of Total Events', 
                 'target_wp_rank': 'Target Language WP Size', 
                 'source': 'Entry Points'
             },
             color_discrete_sequence=px.colors.qualitative.T10,
             title='Usage of Entry Points by Comparative Wikipedia Size (of the Target Language)',
             text='percent_annot', 
             category_orders={
                 'target_wp_rank': [i for i in rank_bin_labels if i != '1-5'], 
                 'source': entry_points_freq.entry_point.values.tolist()
             }
            )


# stack the bars
fig.update_layout(barmode='relative', height=550, width=max_width)
fig.update_xaxes(tickformat='.0%')
fig.update_traces(textfont_color='white')

iplot(fig, config=iplot_config)

Summary

On larger Wikipedias, the frequent languages selector was the most used entry point to the translation dashboard.
- Among the top 20 Wikipedias, it accounted for 85% of dashboard accesses.
On smaller Wikipedias, the frequent languages selector remains the most used, but the content language selector is used more than on larger Wikipedias.
This is consistent with the edit bucket observations: larger Wikipedias tend to have more newcomers than smaller Wikipedias.

Translation Start

The next step after opening the translation dashboard is the translation start page. It appears after a user confirms their choice of article/section to translate, and before the translation editing screen. This section analyzes the sources through which users reach the translation start page. This step can take place in two scenarios:

When users navigate to the dashboard from an external source (i.e. entry points such as frequent languages and the content language selector), the opening of the translation dashboard is immediately followed by the translation start screen. In such cases, the event_source for dashboard_translation_start is the same as the source for dashboard_open. For example, if a user clicks a link in the frequent languages selector, the dashboard_open and dashboard_translation_start events are triggered consecutively, both with event_source frequent_languages, because the article/section has already been selected.
When users reach the main dashboard either by opening it directly or by returning after editing/completing a translation, suggestions are shown in multiple ways, and upon selection, sources specific to dashboard_translation_start are logged.

For this section, only events from the second scenario are considered, because events in the first scenario are determined by the sources of dashboard_open.
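The filter in the code below treats a dashboard_translation_start event as independent when its event_source is not one of the sources seen on dashboard_open events. A stdlib sketch of that classification follows. The (session_id, position, event_type, event_source) tuples and the function name are hypothetical.

```python
def split_translation_starts(events):
    """Split dashboard_translation_start events into (carried_over, independent).

    events: list of (session_id, position, event_type, event_source) tuples.
    A start is carried over when its source also appears on a dashboard_open event,
    i.e. the article/section was already chosen at the entry point.
    """
    dopen_sources = {src for _, _, etype, src in events if etype == 'dashboard_open'}
    starts = [e for e in events if e[2] == 'dashboard_translation_start']
    carried_over = [e for e in starts if e[3] in dopen_sources]
    independent = [e for e in starts if e[3] not in dopen_sources]
    return carried_over, independent


# hypothetical toy events, not real data
toy = [
    ('s1', 0, 'dashboard_open', 'frequent_languages'),
    ('s1', 1, 'dashboard_translation_start', 'frequent_languages'),
    ('s2', 0, 'dashboard_open', 'direct'),
    ('s2', 1, 'dashboard_translation_start', 'suggestion_no_seed'),
]
carried, independent = split_translation_starts(toy)
print(f'{len(independent) / (len(carried) + len(independent)):.0%} independent')  # 50% independent
```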

Code

# filter translation start events that occurred independently of dashboard open
dopen_sources = events.query("""event_type == 'dashboard_open'""").event_source.unique().tolist()
dtstart_self_events = events.query("""(event_type == 'dashboard_translation_start') & (event_source != @dopen_sources)""")
dstart_sources_freq = dtstart_self_events.event_source.value_counts()
pct_dopen_independent_events = round(dtstart_self_events.shape[0] / events.query("""(event_type == 'dashboard_translation_start')""").shape[0] * 100, 2)

Code

# frequency of sources for dashboard translation start
dstart_sources_freq_by_bucket = (
    dtstart_self_events[['event_source', 'user_global_edit_count_bucket']]
    .value_counts()
    .reset_index()
    .rename({
        'count': 'n_events',
        'event_source': 'Source',
        'user_global_edit_count_bucket': 'Edit Bucket'
    }, axis=1)
    .sort_values(['Source', 'Edit Bucket'])
)

dstart_sources_freq_by_bucket['source_total_events'] = dstart_sources_freq_by_bucket['Source'].map(dstart_sources_freq.to_dict())
dstart_sources_freq_by_bucket['Percentage'] = (dstart_sources_freq_by_bucket['n_events'] / dstart_sources_freq_by_bucket['source_total_events']).apply(lambda x:f"{x:.2%}")

dstart_sources_freq_by_bucket_tbl = dstart_sources_freq_by_bucket.pivot(index='Edit Bucket', columns='Source', values='Percentage').fillna(0).to_markdown()
display(Markdown(dstart_sources_freq_by_bucket_tbl))

Edit Bucket	continue_published	search_result	suggestion_nearby	suggestion_no_seed	suggestion_recent_edit
0 edits	0	8.13%	32.26%	55.61%	1.06%
1-4 edits	0	4.88%	22.58%	8.42%	3.72%
5-99 edits	70.00%	8.48%	29.03%	16.63%	22.87%
100-999 edits	0	12.43%	16.13%	8.84%	17.55%
1000+ edits	30.00%	66.09%	0	10.50%	54.79%

Summary

Only 7% of dashboard_translation_start events occurred independently of dashboard_open events.
- This indicates that most users start a translation having already selected an article/section at an external entry point.
Among users who triggered dashboard_translation_start independently:
- Most newcomers start a translation by accepting a suggestion from the API made in the absence of a seed article.
- Most experienced users start a translation by choosing a search result, followed by accepting a suggestion related to one of their recent edits.

Analysis: User Flows (Funnel)

For the majority of the funnel analysis, we will be looking at three main event types, which account for more than 97% of the events:
- dashboard_open: user opens the translation dashboard
- dashboard_translation_start: proceeding from the dashboard to the start screen
- editor_segment_add: user adds a segment of content to the translated version in the editor
Several other events are instrumented (mostly related to how users interact with the suggestions), but they account for less than 3% of events. Including them in the main analysis adds a lot of noise and makes it hard to derive insights. A section at the end examines interactions with those events.
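To draw the funnel as a Sankey diagram, transition counts are turned into node labels plus parallel source/target/value index lists. That is the shape plotly's go.Sankey(node=dict(label=...), link=dict(source=..., target=..., value=...)) accepts. The sketch below is an assumption about one reasonable construction and not necessarily how plot_funnel builds its figure. Target nodes get their own "next: ..." labels, so a repeated event such as dashboard_open followed by dashboard_open is drawn left to right instead of as a loop. The function name and counts are hypothetical.

```python
def sankey_links(transition_counts):
    """Turn {(from_event, to_event): count} into Sankey node labels and link lists."""
    labels, index = [], {}

    def node(label):
        # assign each distinct label a stable integer index on first use
        if label not in index:
            index[label] = len(labels)
            labels.append(label)
        return index[label]

    link = {'source': [], 'target': [], 'value': []}
    for (from_event, to_event), count in transition_counts.items():
        link['source'].append(node(from_event))
        link['target'].append(node(f'next: {to_event}'))  # target-side node, kept distinct
        link['value'].append(count)
    return labels, link


# hypothetical counts, not taken from the data above
counts = {
    ('dashboard_open', 'dashboard_translation_start'): 6,
    ('dashboard_open', 'session end'): 2,
    ('dashboard_translation_start', 'editor_segment_add'): 3,
}
labels, link = sankey_links(counts)
print(labels)
print(link)
```

With plotly installed, the result could be passed as go.Sankey(node=dict(label=labels), link=link).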

import warnings

import pandas as pd
import plotly.graph_objects as go
import plotly.subplots as sp

# iplot_width (used as a default below) is defined earlier in the notebook

# main events list
main_events = ['dashboard_open', 'dashboard_translation_start', 'editor_segment_add']

# function to plot the funnel of user flows
# by default, returns a Plotly Sankey figure for a given dataframe
#     https://plotly.github.io/plotly.py-docs/generated/plotly.graph_objects.Sankey.html
# optional: add a table with the distribution of edit buckets that triggered the events
# optional: return a dataframe with the transition data instead of the figure
def plot_funnel(df, 
                return_transition_data=False,
                chart_title=None,
                events_scope=main_events, 
                incl_session_end=True,
                incl_edit_bucket_table=False,
                font_size=12, 
                width=iplot_width,
                height=iplot_width/2.25):
    
    warnings.filterwarnings('ignore')
        
    df = df.query("""event_type == @events_scope""")    
    df = df.sort_values(by=['content_translation_session_id', 'content_translation_session_position'])
    
    # next event in order within a session
    df['next_event_type'] = df.groupby('content_translation_session_id')['event_type'].shift(-1)

    # consider as session ended if there no next event
    if incl_session_end:
        df['next_event_type'] = df['next_event_type'].fillna('session end')
    else:
        df.dropna(subset=['next_event_type'], inplace=True)
    
    transition_counts = df.groupby(['event_type', 'next_event_type']).size().reset_index(name='count')
    total_transitions_by_source = transition_counts.groupby('event_type')['count'].sum()
    transition_counts['total_by_source'] = transition_counts['event_type'].map(total_transitions_by_source)
    transition_counts['percentage'] = (transition_counts['count'] / transition_counts['total_by_source']) * 100
    
    # return the raw transition data, if requested (no figure is built)
    if return_transition_data:
        return transition_counts
    else:
        # subplots: add a second column for the edit bucket table, if needed
        if incl_edit_bucket_table:
            fig = sp.make_subplots(rows=1, cols=2, column_widths=[0.7, 0.3], 
                                   specs=[[{"type": "sankey"}, {"type": "table"}]])
        else:
            fig = sp.make_subplots(rows=1, cols=1, 
                                   specs=[[{"type": "sankey"}]])

        # map every node label (including 'session end') to a Sankey node index
        all_event_types = pd.concat([transition_counts['event_type'], transition_counts['next_event_type']]).unique()
        label_mapping = {label: i for i, label in enumerate(all_event_types)}

        sources = transition_counts['event_type'].map(label_mapping)
        targets = transition_counts['next_event_type'].map(label_mapping)
        weights = transition_counts['count']

        sankey = go.Sankey(
            node=dict(
                pad=15,
                thickness=20,
                line=dict(color="black", width=0.5),
                label=[label if label != 'session end' else '<i>session end</i>' for label in all_event_types]
            ),
            link=dict(
                source=sources,
                target=targets,
                value=weights,
                hovertemplate='Events: %{value}<br />' +
                              'Percentage: %{customdata:.2f}%<extra></extra>',
                customdata=transition_counts['percentage']
            )
        )
        
        fig.add_trace(sankey, row=1, col=1)

        if incl_edit_bucket_table:
            agg_events_by_bucket = (
                df
                .user_global_edit_count_bucket
                .value_counts()
                .reset_index()
                .rename({
                    'user_global_edit_count_bucket': 'Edit Bucket',
                    'count': '# Events'
                }, axis=1)
                .sort_values('Edit Bucket')
            )
            
            agg_events_by_bucket['% of Events'] = (
                agg_events_by_bucket['# Events'] / agg_events_by_bucket['# Events'].sum()
            ).apply(lambda x:f"{x:.0%}")   
            
            table = go.Table(
                columnwidth = [4, 3, 4],
                header=dict(values=list(agg_events_by_bucket.columns),
                            align='left'),
                cells=dict(values=[
                    agg_events_by_bucket['Edit Bucket'], 
                    agg_events_by_bucket['# Events'], 
                    agg_events_by_bucket['% of Events']],
                           align='left', 
                           height=25)
            )
            
            fig.add_trace(table, row=1, col=2)
        
        fig.update_layout(title_text=chart_title, font_size=font_size, height=height, width=width)
        return fig
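
The core of plot_funnel is the transition table it builds before plotting. The sketch below reproduces only that step on three hypothetical sessions, so the 'session end' handling and the per-source percentages can be checked by hand. transition_table is an illustrative helper name, and the toy data is made up. The transform('sum') total is equivalent to the map-based total_by_source in plot_funnel.

```python
import pandas as pd

def transition_table(df: pd.DataFrame) -> pd.DataFrame:
    """Count transitions between consecutive events within each session and express
    each as a percentage of all transitions out of the same source event."""
    df = df.sort_values(['content_translation_session_id',
                         'content_translation_session_position']).copy()
    # next event in the same session; the last event of a session has none
    df['next_event_type'] = (df.groupby('content_translation_session_id')['event_type']
                             .shift(-1).fillna('session end'))
    counts = (df.groupby(['event_type', 'next_event_type'])
              .size().reset_index(name='count'))
    counts['percentage'] = (counts['count']
                            / counts.groupby('event_type')['count'].transform('sum') * 100)
    return counts

toy = pd.DataFrame({
    'content_translation_session_id': ['s1', 's1', 's1', 's2', 's2', 's3'],
    'content_translation_session_position': [0, 1, 2, 0, 1, 0],
    'event_type': ['dashboard_open', 'dashboard_translation_start', 'editor_segment_add',
                   'dashboard_open', 'dashboard_translation_start',
                   'dashboard_open'],
})
print(transition_table(toy))
```

Each toy session follows a different path:
- s1 goes open, start, edit, then session end.
- s2 stops after the start screen.
- s3 ends right after opening.

So two of the three dashboard_open events lead to the start screen (about 66.7%).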

iplot(
    plot_funnel(
        events, 
        chart_title='Flow of Users Through CX Workflows & Number of Events Generated by Edit Bucket', 
        incl_edit_bucket_table=True, 
        width=max_width, 
        height=max_width/2.25), 
    config=iplot_config
)

Summary

(main events: dashboard_open, dashboard_translation_start, editor_segment_add)

In most cases (77%), users who opened the dashboard transitioned to the translation start screen.
- This is because, for users navigating to the dashboard from an external entry point, both events occur consecutively.
- 13% ended the session, and 8% refreshed the dashboard or came back to it later before the session expired.
Among users who proceeded to the start screen, only 15% of the cases progressed to the editor with an edit.
- In 46% of the cases, users went back to the main dashboard, and in 30% they ended the session.
- Because most events were generated by users with 0 edits (newcomers), these overall rates largely reflect newcomer behavior.
Among users who made at least one edit, in 80% of the cases, they continued to make additional edits, while 9% went back to the main dashboard, and the rest ended the session.
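
Each statement of the form "in X% of the cases" is a row-normalized transition frequency. Among all transitions out of one stage, it gives the share going to each next stage. The sketch below assumes a frame that already has event_type and next_event_type columns, as plot_funnel builds internally, with hypothetical pairs. Under that assumption, pd.crosstab with normalize='index' gives the same numbers as the percentage column, laid out as a matrix.

```python
import pandas as pd

# Hypothetical consecutive-event pairs (event -> next event within a session)
pairs = pd.DataFrame({
    'event_type': ['dashboard_open'] * 4 + ['dashboard_translation_start'] * 4,
    'next_event_type': ['dashboard_translation_start', 'dashboard_translation_start',
                        'dashboard_translation_start', 'session end',
                        'editor_segment_add', 'dashboard_open',
                        'dashboard_open', 'session end'],
})

# Rows: current stage; columns: next stage; each row sums to 100
matrix = pd.crosstab(pairs['event_type'], pairs['next_event_type'], normalize='index') * 100
print(matrix.round(1))
```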

By Edit Bucket

n_events = events.query("""(user_global_edit_count_bucket == '0 edits') & (event_type == @main_events)""").shape[0]
iplot(
    plot_funnel(events.query("""user_global_edit_count_bucket == '0 edits'"""), 
                chart_title=f'Flow of Users Through CX Workflows Having 0 Global Edits ({n_events} events)'), 
          config=iplot_config)

Summary: Users with 0 Global Edits

Among the users who opened the dashboard:

in 80% of the cases, they proceeded to the translation start screen.
in 12% of the cases, they ended the session.
in 8% of the cases, they refreshed the dashboard or came back to it later before the session expired.

Among the users who reached the translation start screen:

in 12% of the cases, they transitioned to the editor and made an edit.
in 42% of the cases, they went back to the main dashboard.
in 35% of the cases, they ended the session.

Among users who made at least one edit:

in 69% of the cases, they continued to make additional edits.
in 11% of the cases, they went back to the main dashboard.
in 11% of the cases, they ended the session.

n_events = events.query("""(user_global_edit_count_bucket == '1-4 edits') & (event_type == @main_events)""").shape[0]
iplot(plot_funnel(events.query("""user_global_edit_count_bucket == '1-4 edits'"""), 
                      chart_title=f'Flow of Users Through CX Workflows Having 1-4 Global Edits ({n_events} events)'), 
          config=iplot_config)

Summary: Users with 1-4 Global Edits

Among the users who opened the dashboard:

in 82% of the cases, they proceeded to the translation start screen.
in 12% of the cases, they ended the session.
in 5% of the cases, they refreshed the dashboard or came back to it later before the session expired.

Among the users who reached the translation start screen:

in 8% of the cases, they transitioned to the editor and made an edit.
in 60% of the cases, they went back to the main dashboard.
in 27% of the cases, they ended the session.

Among users who made at least one edit:

in 75% of the cases, they continued to make additional edits.
in 9% of the cases, they went back to the main dashboard.
in 8% of the cases, they ended the session.

n_events = events.query("""(user_global_edit_count_bucket == '5-99 edits') & (event_type == @main_events)""").shape[0]
iplot(plot_funnel(events.query("""user_global_edit_count_bucket == '5-99 edits'"""), 
                      chart_title=f'Flow of Users Through CX Workflows Having 5-99 Global Edits ({n_events} events)'), 
          config=iplot_config)

Summary: Users with 5-99 Global Edits

Among the users who opened the dashboard:

in 77% of the cases, they proceeded to the translation start screen.
in 14% of the cases, they ended the session.
in 9% of the cases, they refreshed the dashboard or came back to it later before the session expired.

Among the users who reached the translation start screen:

in 13% of the cases, they transitioned to the editor and made an edit.
in 54% of the cases, they went back to the main dashboard.
in 28% of the cases, they ended the session.

Among users who made at least one edit:

in 80% of the cases, they continued to make additional edits.
in 10% of the cases, they went back to the main dashboard.
in 6% of the cases, they ended the session.

n_events = events.query("""(user_global_edit_count_bucket == '100-999 edits') & (event_type == @main_events)""").shape[0]
iplot(plot_funnel(events.query("""user_global_edit_count_bucket == '100-999 edits'"""), 
                      chart_title=f'Flow of Users Through CX Workflows Having 100-999 Global Edits ({n_events} events)'), 
          config=iplot_config)

Summary: Users with 100-999 Global Edits

Among the users who opened the dashboard:

in 71% of the cases, they proceeded to the translation start screen.
in 18% of the cases, they ended the session.
in 11% of the cases, they refreshed the dashboard or came back to it later before the session expired.

Among the users who reached the translation start screen:

in 16% of the cases, they transitioned to the editor and made an edit.
in 49% of the cases, they went back to the main dashboard.
in 30% of the cases, they ended the session.

Among users who made at least one edit:

in 85% of the cases, they continued to make additional edits.
in 7% of the cases, they went back to the main dashboard.
in 5% of the cases, they ended the session.

n_events = events.query("""(user_global_edit_count_bucket == '1000+ edits') & (event_type == @main_events)""").shape[0]
iplot(plot_funnel(events.query("""user_global_edit_count_bucket == '1000+ edits'"""), 
                      chart_title=f'Flow of Users Through CX Workflows Having 1000+ Global Edits ({n_events} events)'), 
          config=iplot_config)

Summary: Users with 1000+ Global Edits

Among the users who opened the dashboard:

in 72% of the cases, they proceeded to the translation start screen.
in 17% of the cases, they ended the session.
in 10% of the cases, they refreshed the dashboard or came back to it later before the session expired.

Among the users who reached the translation start screen:

in 32% of the cases, they transitioned to the editor and made an edit.
in 40% of the cases, they went back to the main dashboard.
in 23% of the cases, they ended the session.

Among users who made at least one edit:

in 86% of the cases, they continued to make additional edits.
in 8% of the cases, they went back to the main dashboard.
in 6% of the cases, they ended the session.

All Edit Buckets

# consolidated view of users flows by edit bucket

transition_by_bucket = pd.concat([
    plot_funnel(events.query(f"user_global_edit_count_bucket == '{bucket}'"), return_transition_data=True)
    .query("percentage > 5")
    .assign(edit_bucket=bucket)
    for bucket in edit_buckets
])

steps = ['dashboard_open', 'dashboard_translation_start', 'editor_segment_add']
transition_by_bucket = transition_by_bucket.assign(
    event_type=pd.Categorical(transition_by_bucket['event_type'], categories=steps, ordered=True),
    next_event_type=pd.Categorical(transition_by_bucket['next_event_type'], categories=steps + ['session end'], ordered=True)
)

labels_map = {
    'dashboard_open': 'main dashboard',
    'dashboard_translation_start': 'translation start screen',
    'editor_segment_add': 'made an edit',
    'session end': 'session ended'
}

transition_by_bucket = transition_by_bucket.assign(
    event_type=transition_by_bucket['event_type'].replace(labels_map),
    next_event_type=transition_by_bucket['next_event_type'].replace({k: f'➔ {v}' for k, v in labels_map.items()})
)

transition_by_bucket = (
    transition_by_bucket
    .sort_values(['event_type', 'next_event_type'])
    .pivot_table(
        index=['event_type', 'next_event_type'], 
        columns='edit_bucket', 
        values='percentage',
        sort=False
    )
    .reindex(edit_buckets, axis='columns')
    .reset_index()
)
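
Transitions of 5% or less are filtered out with .query("percentage > 5") before pivoting. So any bucket where a transition fell at or below that threshold ends up as NaN in the pivoted frame, which the rendered table shows as an empty cell. Below is a minimal sketch with hypothetical percentages for two buckets.

```python
import pandas as pd

# Hypothetical transition percentages for two edit buckets
long = pd.DataFrame({
    'event_type': ['dashboard_translation_start'] * 4,
    'next_event_type': ['dashboard_open', 'dashboard_translation_start',
                        'dashboard_open', 'dashboard_translation_start'],
    'edit_bucket': ['0 edits', '0 edits', '1000+ edits', '1000+ edits'],
    'percentage': [43.0, 9.0, 40.0, 4.6],
})

# Same threshold as the consolidated view: drop transitions of 5% or less
kept = long.query('percentage > 5')

# The 1000+ edits self-transition (4.6%) was dropped, so its cell becomes NaN
wide = kept.pivot_table(index=['event_type', 'next_event_type'],
                        columns='edit_bucket', values='percentage')
print(wide)
```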

transition_by_bucket_tbl = (
    gt
    .GT(
        transition_by_bucket,
        groupname_col='event_type', 
        rowname_col='next_event_type',
    )
    .tab_header('Transitions to Various Stages by Edit Bucket')
    .fmt_percent(edit_buckets, decimals=0, scale_values=False)
    .tab_style(
        style=gt.style.text(size="16px"), 
        locations=gt.loc.body(columns=edit_buckets)
    )
    .tab_style(
        style=gt.style.borders('right', '#bdbdbd'), 
        locations=gt.loc.body(columns=edit_buckets)
    )
)

transition_by_bucket_tbl

Transitions to Various Stages by Edit Bucket
	0 edits	1-4 edits	5-99 edits	100-999 edits	1000+ edits
main dashboard
➔ main dashboard	7%	6%	8%	11%	10%
➔ translation start screen	80%	82%	77%	71%	73%
➔ session ended	12%	12%	14%	18%	17%
translation start screen
➔ main dashboard	43%	59%	54%	49%	40%
➔ translation start screen	9%	7%	6%	6%	–
➔ made an edit	12%	8%	13%	16%	32%
➔ session ended	35%	26%	28%	30%	23%
made an edit
➔ main dashboard	11%	9%	10%	7%	8%
➔ translation start screen	9%	6%	–	–	–
➔ made an edit	69%	77%	80%	86%	86%
➔ session ended	12%	8%	6%	6%	6%
Cells marked – are transitions of 5% or less, which are filtered out before the table is built.

Summary

(main events: dashboard_open, dashboard_translation_start, editor_segment_add)

In every edit count bucket, most users (>70%) who opened the dashboard proceeded to the translation start screen.
- The rate is higher for newcomers (80-82% for 0-4 edits) than for experienced users (71-73% for 100+ edits).
- Most newcomers reach the dashboard through external entry points rather than opening it directly. In that case, dashboard_open and dashboard_translation_start are triggered consecutively, with no user action in between.
- Experienced users more often open the dashboard directly and then click through to the translation start screen.
Among users who reached the translation start screen:
- Newcomers tended to end (abandon) the session or return to the main dashboard.
- Newcomers went on to make an edit from this stage in only 12% of the cases, whereas users with 1000+ edits made an edit in 32% of the cases.
  - Broadly, the more editing experience users had, the more likely they were to continue to make an edit.
Among users who made at least one edit, the likelihood of making additional edits to the machine-translated content increased with editing experience (from 69% for 0 edits to 86% for 1000+ edits). Over the same range, ending the session or returning to the dashboard became generally less likely.
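
The experience trend can be checked directly against the table values. The sketch below copies the start-to-edit and edit-to-edit rows from the "Transitions to Various Stages by Edit Bucket" table into Series ordered by edit bucket. It assumes this bucket order matches the notebook's edit_buckets list. It then tests each Series for a non-decreasing pattern.

```python
import pandas as pd

# Assumed to match the order of the notebook's edit_buckets list
edit_buckets = ['0 edits', '1-4 edits', '5-99 edits', '100-999 edits', '1000+ edits']

# Percentages copied from the "Transitions to Various Stages by Edit Bucket" table
start_to_edit = pd.Series([12, 8, 13, 16, 32], index=edit_buckets)
edit_to_edit = pd.Series([69, 77, 80, 86, 86], index=edit_buckets)

# Non-strict monotonic check across buckets ordered by editing experience
print(start_to_edit.is_monotonic_increasing)  # False: the 1-4 edits bucket dips below 0 edits
print(edit_to_edit.is_monotonic_increasing)   # True: equal neighbours (86, 86) are allowed
```

The dip at 1-4 edits is why the start-to-edit relationship holds broadly rather than strictly. Edit-to-edit, by contrast, never decreases across the buckets.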

By Entry Point

dopen_sources = events.query("""event_type == 'dashboard_open'""").event_source.unique().tolist()

# plot the funnel for a given source
# identifies sessions starting with a dashboard_open from the specified source
# uses the original plot_funnel function
# includes the edit bucket table by default 
def plot_funnel_for_source(source, incl_edit_bucket_table=True):
    
    sessions_with_source = (
        events
        .query(f"""(event_source == '{source}') & (event_type == 'dashboard_open') & (content_translation_session_position == 0)""")
        .content_translation_session_id
        .unique()
        .tolist()
    )
    
    n_events = events.query("""(event_source == @source) & (event_type == @main_events)""").shape[0]
    
    iplot(plot_funnel(events.query("""content_translation_session_id == @sessions_with_source"""), 
                      chart_title=f'Flow of Users Through CX Workflows; Source: {source} ({n_events} events) & Number of Events Generated by Edit Bucket', 
                      incl_edit_bucket_table=incl_edit_bucket_table, width=max_width, height=max_width/2.25), 
          config=iplot_config)
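
plot_funnel_for_source keeps every event from sessions whose first event (position 0) is a dashboard_open with the given event_source. Later events in those sessions therefore enter the funnel even when their own event_source differs. The sketch below restates that selection with boolean masks on hypothetical data. sessions_starting_with is an illustrative helper name.

```python
import pandas as pd

def sessions_starting_with(events: pd.DataFrame, source: str) -> list:
    """IDs of sessions whose first event (position 0) is a dashboard_open
    with the given event_source (hypothetical helper)."""
    first = events[(events['content_translation_session_position'] == 0)
                   & (events['event_type'] == 'dashboard_open')
                   & (events['event_source'] == source)]
    return first['content_translation_session_id'].unique().tolist()

# Hypothetical events: session 'c' is excluded because it does not open the dashboard first
toy = pd.DataFrame({
    'content_translation_session_id': ['a', 'a', 'b', 'b', 'c'],
    'content_translation_session_position': [0, 1, 0, 1, 0],
    'event_type': ['dashboard_open', 'dashboard_translation_start',
                   'dashboard_open', 'dashboard_translation_start',
                   'dashboard_search'],
    'event_source': ['direct', 'search_result', 'frequent_languages',
                     'suggestion_no_seed', 'direct'],
})

selected = sessions_starting_with(toy, 'direct')
# every event of the selected sessions is kept, whatever its own event_source
subset = toy[toy['content_translation_session_id'].isin(selected)]
print(selected)
print(subset)
```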

plot_funnel_for_source('frequent_languages')

Summary: users navigated to the dashboard from the frequent languages menu

Among the users who opened the dashboard:

in 82% of the cases, they proceeded to the translation start screen.
in 11% of the cases, they ended the session.
in 6% of the cases, they refreshed the dashboard or came back to it later before the session expired.

Among the users who reached the translation start screen:

in 13% of the cases, they transitioned to the editor and made an edit.
in 46% of the cases, they went back to the main dashboard.
in 32% of the cases, they ended the session.

Among users who made at least one edit:

in 79% of the cases, they continued to make additional edits.
in 8% of the cases, they went back to the main dashboard.
in 8% of the cases, they ended the session.

plot_funnel_for_source('content_language_selector')

Summary: users navigated to the dashboard from content language selector

Among the users who opened the dashboard:

in 75% of the cases, they proceeded to the translation start screen.
in 19% of the cases, they ended the session.
in 6% of the cases, they refreshed the dashboard or came back to it later before the session expired.

Among the users who reached the translation start screen:

in 23% of the cases, they transitioned to the editor and made an edit.
in 45% of the cases, they went back to the main dashboard.
in 25% of the cases, they ended the session.

Among users who made at least one edit:

in 75% of the cases, they continued to make additional edits.
in 13% of the cases, they went back to the main dashboard.
in 9% of the cases, they ended the session.

plot_funnel_for_source('direct')

Summary: users who opened the dashboard directly

Among the users who opened the dashboard:

in 36% of the cases, they proceeded to the translation start screen.
in 35% of the cases, they ended the session.
in 27% of the cases, they refreshed the dashboard or came back to it later before the session expired.

Among the users who reached the translation start screen:

in 34% of the cases, they transitioned to the editor and made an edit.
in 40% of the cases, they went back to the main dashboard.
in 14% of the cases, they ended the session.

Among users who made at least one edit:

in 90% of the cases, they continued to make additional edits.
in 5% of the cases, they went back to the main dashboard.
in 3% of the cases, they ended the session.

plot_funnel_for_source('invite_new_article_creation')

Summary: users navigated to the dashboard from an invitation shown on a non-existent page

Among the users who opened the dashboard:

in 80% of the cases, they proceeded to the translation start screen.
in 9% of the cases, they ended the session.
in 10% of the cases, they refreshed the dashboard or came back to it later before the session expired.

Among the users who reached the translation start screen:

in 14% of the cases, they transitioned to the editor and made an edit.
in 57% of the cases, they went back to the main dashboard.
in 25% of the cases, they ended the session.

Among users who made at least one edit:

in 76% of the cases, they continued to make additional edits.
in 13% of the cases, they went back to the main dashboard.
in 9% of the cases, they ended the session.

plot_funnel_for_source('direct_preselect')

Summary: users who opened the dashboard directly via a link to a specific translation

Among the users who opened the dashboard:

in 87% of the cases, they proceeded to the translation start screen.
in 6% of the cases, they ended the session.
in 6% of the cases, they refreshed the dashboard or came back to it later before the session expired.

Among the users who reached the translation start screen:

in 11% of the cases, they transitioned to the editor and made an edit.
in 50% of the cases, they went back to the main dashboard.
in 35% of the cases, they ended the session.

Among users who made at least one edit:

in 82% of the cases, they continued to make additional edits.
in 7% of the cases, they went back to the main dashboard.
in 11% of the cases, they ended the session.

plot_funnel_for_source('recent_translation')

Summary: users navigated from a notice on recently translated articles to review/expand the translation

Among the users who opened the dashboard:

in 87% of the cases, they proceeded to the translation start screen.
in 8% of the cases, they ended the session.
in 5% of the cases, they refreshed the dashboard or came back to it later before the session expired.

Among the users who reached the translation start screen:

in 31% of the cases, they transitioned to the editor and made an edit.
in 44% of the cases, they went back to the main dashboard.
in 17% of the cases, they ended the session.

Among users who made at least one edit:

in 79% of the cases, they continued to make additional edits.
in 10% of the cases, they went back to the main dashboard.
in 10% of the cases, they ended the session.

plot_funnel_for_source('contributions_page')

Summary: users navigated from the Special:Contributions page

Among the users who opened the dashboard:

in 39% of the cases, they proceeded to the translation start screen.
in 49% of the cases, they ended the session.
in 7% of the cases, they refreshed the dashboard or came back to it later before the session expired.

Among the users who reached the translation start screen:

in 7% of the cases, they transitioned to the editor and made an edit.
in 64% of the cases, they went back to the main dashboard.
in 14% of the cases, they ended the session.

Among users who made at least one edit:

in 50% of the cases, they continued to make additional edits.
in 20% of the cases, they went back to the main dashboard.
in 30% of the cases, they ended the session.

All Entry Points

# consolidated view of transitions by entry point

transition_by_source = []

for source in dopen_sources:
    
    # sessions whose first event is a dashboard_open from this entry point
    sessions_with_source = (
        events
        .query(f"""(event_source == '{source}') & (event_type == 'dashboard_open') & (content_translation_session_position == 0)""")
        .content_translation_session_id
        .unique()
        .tolist()
    )
    
    transition_data = (
        plot_funnel(
            events
            .query("""content_translation_session_id == @sessions_with_source"""),
            return_transition_data=True)
        .query("percentage > 5")
        .assign(source=source)
    )
    
    transition_by_source.append(transition_data)

# concatenate once instead of growing a DataFrame inside the loop
transition_by_source = pd.concat(transition_by_source)
    
steps = ['dashboard_open', 'dashboard_translation_start', 'editor_segment_add']
transition_by_source = transition_by_source.assign(
    event_type=pd.Categorical(transition_by_source['event_type'], categories=steps, ordered=True),
    next_event_type=pd.Categorical(transition_by_source['next_event_type'], categories=steps + ['session end'], ordered=True)
)

transition_by_source = transition_by_source.assign(
    event_type=transition_by_source['event_type'].replace(labels_map),
    next_event_type=transition_by_source['next_event_type'].replace({k: f'➔ {v}' for k, v in labels_map.items()})
)

transition_by_source = (
    transition_by_source
    .sort_values(['event_type', 'next_event_type'])
    .pivot_table(
        index=['event_type', 'next_event_type'], 
        columns='source', 
        values='percentage',
        sort=False
    )
    .reindex(entry_points_freq.entry_point.tolist(), axis='columns')
    .reset_index()
)

transition_by_source_tbl = (
    gt
    .GT(
        transition_by_source,
        groupname_col='event_type', 
        rowname_col='next_event_type',
    )
    .tab_header('Transitions to Various Stages of Translation Funnel by Source of Entry to the Dashboard')
    .fmt_percent(dopen_sources, decimals=0, scale_values=False)
    .tab_style(
        style=gt.style.text(size="16px"), 
        locations=gt.loc.body(columns=dopen_sources)
    )
    .tab_style(
        style=gt.style.borders('right', '#bdbdbd'), 
        locations=gt.loc.body(columns=dopen_sources)
    )
)

transition_by_source_tbl

Transitions to Various Stages of Translation Funnel by Source of Entry to the Dashboard
	frequent_languages	content_language_selector	direct	invite_new_article_creation	direct_preselect	contributions_page	recent_translation
main dashboard
➔ main dashboard	7%	6%	27%	10%	6%	7%	8%
➔ translation start screen	82%	75%	36%	80%	87%	39%	88%
➔ session ended	11%	19%	35%	9%	6%	49%	–
translation start screen
➔ main dashboard	46%	45%	40%	57%	50%	64%	45%
➔ translation start screen	8%	7%	12%	–	–	14%	7%
➔ made an edit	13%	23%	34%	14%	11%	7%	31%
➔ session ended	33%	25%	14%	25%	36%	14%	17%
made an edit
➔ main dashboard	8%	13%	5%	13%	7%	20%	10%
➔ made an edit	79%	75%	91%	76%	81%	50%	79%
➔ session ended	8%	9%	–	9%	11%	30%	10%
Cells marked – are transitions of 5% or less, which are filtered out before the table is built.

Summary

(main events: dashboard_open, dashboard_translation_start, editor_segment_add)

Transition rates between funnel stages by entry point closely reflect which user experience levels use each entry point most.

Among users who navigated through the frequent languages menu, which was most frequently accessed by newcomers:
- In 82% of the cases, they proceeded to the translation start screen.
- From the translation start screen, users made at least one edit in 13% of the cases.
Among users who navigated through the content language selector, which was frequently accessed by newcomers and experienced users alike:
- In 75% of the cases, they proceeded to the translation start screen.
- From the translation start screen, users made at least one edit in 23% of the cases.
Among users who opened the dashboard directly, most frequently experienced users:
- In 36% of the cases, they proceeded to the translation start screen.
- From the translation start screen, users made at least one edit in 34% of the cases.
Among users who navigated from an invitation shown on a non-existent page, which was frequently accessed by newcomers and experienced users alike:
- In 80% of the cases, they proceeded to the translation start screen.
- From the translation start screen, users made at least one edit in 14% of the cases.
Among users who opened the dashboard directly with a link to a specific translation, most frequently experienced users:
- In 87% of the cases, they proceeded to the translation start screen.
- From the translation start screen, users made at least one edit in 11% of the cases.
Among users who navigated from the contributions page, which was frequently accessed by newcomers and experienced users alike:
- In 39% of the cases, they proceeded to the translation start screen.
- From the translation start screen, users made at least one edit in 7% of the cases.
Among users who navigated from a notice on recently translated articles to review/expand the translation, most frequently experienced users:
- In 88% of the cases, they proceeded to the translation start screen.
- From the translation start screen, users made at least one edit in 31% of the cases.

User Flows: Other Events

# users flows and interactions with events apart from the main events (open, start, edit)
other_event_transitions = (
    plot_funnel(
        events, 
        events_scope=events.event_type.unique().tolist(), 
        return_transition_data=True)
    .query("""event_type != @main_events""")
    .sort_values(['event_type', 'percentage'], ascending=[True, False])
    .drop('total_by_source', axis=1)
)


other_event_transitions['next_event_type'] = (other_event_transitions['next_event_type']
                                              .replace({i:f'➔ {i}' for i in events.event_type.unique().tolist()+['session end']}))
other_event_transitions['event_type'] = (other_event_transitions['event_type']
                                              .replace({i:f"""{i} ({events[events['event_type'] == i].shape[0]} events)""" for i in events.event_type.unique().tolist()}))

other_event_transitions_tbl = (
    gt
    .GT(
        other_event_transitions,
        rowname_col='next_event_type',
        groupname_col='event_type'
    )
    .fmt_percent('percentage', scale_values=False, decimals=1)
    .cols_label(
        count='# Events',
        percentage='Percentage'
    )
    .tab_header('Transitions Between Other Event Types', 'apart from dashboard_open, dashboard_translation_start, editor_segment_add')
)

other_event_transitions_tbl

Transitions Between Other Event Types
apart from dashboard_open, dashboard_translation_start, editor_segment_add
	# Events	Percentage
dashboard_discard_suggestion (677 events)
➔ dashboard_discard_suggestion	540	79.8%
➔ dashboard_translation_start	70	10.3%
➔ session end	34	5.0%
➔ dashboard_open	16	2.4%
➔ dashboard_search	6	0.9%
➔ dashboard_translation_continue	6	0.9%
➔ dashboard_refresh_suggestions	3	0.4%
➔ dashboard_translation_discard	1	0.1%
➔ editor_segment_add	1	0.1%
dashboard_refresh_suggestions (110 events)
➔ dashboard_refresh_suggestions	37	33.6%
➔ dashboard_translation_start	23	20.9%
➔ session end	18	16.4%
➔ dashboard_open	17	15.5%
➔ dashboard_search	7	6.4%
➔ dashboard_discard_suggestion	4	3.6%
➔ dashboard_translation_continue	4	3.6%
dashboard_search (958 events)
➔ dashboard_translation_start	791	82.6%
➔ dashboard_open	116	12.1%
➔ session end	33	3.4%
➔ dashboard_translation_continue	14	1.5%
➔ dashboard_search	3	0.3%
➔ editor_segment_add	1	0.1%
dashboard_translation_continue (440 events)
➔ dashboard_open	294	66.8%
➔ editor_segment_add	70	15.9%
➔ session end	64	14.5%
➔ dashboard_translation_continue	5	1.1%
➔ dashboard_translation_discard	3	0.7%
➔ dashboard_translation_start	3	0.7%
➔ dashboard_search	1	0.2%
dashboard_translation_discard (132 events)
➔ dashboard_translation_discard	76	57.6%
➔ dashboard_search	18	13.6%
➔ session end	14	10.6%
➔ dashboard_open	12	9.1%
➔ dashboard_translation_continue	8	6.1%
➔ dashboard_translation_start	3	2.3%
➔ dashboard_discard_suggestion	1	0.8%

Summary

In cases where users discarded a suggested translation (677 occurrences), in 80% of the cases they discarded another suggestion, and in 10% they proceeded to the translation start screen.
In cases where users requested that the list of suggestions be regenerated (110 occurrences), in 34% of the cases they refreshed the suggestions again, and in 21% they proceeded to the translation start screen.
In cases where users initiated a search (958 occurrences), in 83% of the cases they proceeded to the translation start screen, and in 12% they returned to the dashboard.
In cases where users selected an in-progress translation (440 occurrences), in 67% of the cases they returned to the dashboard, and in 16% they made an edit to the translation.
In cases where users discarded an in-progress translation (132 occurrences), in 58% of the cases they discarded additional in-progress translations, and in 14% they initiated a search.
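
The dominant pattern in this table is repetition: the same action following itself. plot_funnel(..., return_transition_data=True) returns a long-format table, and in it such self-transitions can be isolated by comparing event_type with next_event_type. The sketch below uses a few rows copied from the table rather than the full event data. It keeps the total_by_source column that plot_funnel returns, which the notebook drops before display.

```python
import pandas as pd

# A few rows copied from the "Transitions Between Other Event Types" table, in the
# long format returned by plot_funnel(..., return_transition_data=True)
transitions = pd.DataFrame({
    'event_type': ['dashboard_discard_suggestion'] * 2 + ['dashboard_refresh_suggestions'] * 2,
    'next_event_type': ['dashboard_discard_suggestion', 'dashboard_translation_start',
                        'dashboard_refresh_suggestions', 'dashboard_translation_start'],
    'count': [540, 70, 37, 23],
    'total_by_source': [677, 677, 110, 110],
})
transitions['percentage'] = transitions['count'] / transitions['total_by_source'] * 100

# Self-transitions: the same action repeated back to back within a session
repeats = transitions[transitions['event_type'] == transitions['next_event_type']]
print(repeats[['event_type', 'count', 'percentage']].round(1))
```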