Section Translation is an expansion of the Content Translation capabilities. It enables users to expand existing Wikipedia articles by translating new sections. In addition, Section Translation is designed to work on mobile devices as well as desktop, which enables users to translate on mobile, something that was not possible with Content Translation before.
The content translation events capture various aspects of user interactions with the Content Translation and Section Translation tools. This analysis is the first iteration of visualizing how users arrive through various entry points, how they flow through the tools, and how many reach the currently instrumented next stages.
90 days of data preceding 2023-12-31 was reviewed.
Overall Summary
Frequency of Entry Points Usage (by Edit Count Bucket of Users)
85% of newcomers opened the translation dashboard by navigating from the frequent languages selector, which surfaces missing languages that an article can be translated into.
As users gain editing experience, they increasingly reach the dashboard through the content language selector, in which they can search for a missing language to translate an article into.
Experienced users also tend to open the dashboard directly more often than newcomers.
Users with 1000+ edits navigated to the dashboard through the frequent languages selector only 40% of the time.
Frequency of Entry Points Usage (by comparative size of target language Wikipedia)
On larger Wikipedias, the frequent languages selector was the most used route to the translation dashboard.
Among the top 20 Wikipedias, it was used 85% of the time to access the dashboard.
On smaller Wikipedias, although the frequent languages selector remains the most used entry point, the content language selector is used more than on larger Wikipedias.
This is related to the observations by user edit bucket: larger Wikipedias tend to have more newcomers than smaller Wikipedias.
Translation Start Screen
Only 7% of the dashboard_translation_start events occurred independently of dashboard_open events.
This indicates that most users start a translation having already selected an article/section to translate from an external entry point.
Among the users who initiate dashboard_translation_start independently:
The majority of newcomers start a translation by accepting suggestions made by the API in the absence of a seed article.
The majority of experienced users start a translation by choosing a search result, followed by accepting a translation suggested because it is related to one of their recent edits.
In most cases (77%), those who opened the dashboard transitioned to the translation start screen.
This is because for users navigating to the dashboard from an external entry point, both events occur consecutively.
13% ended the session and 8% refreshed the dashboard or came back to it later before the session expired.
Among the users who proceeded to the start screen, they progressed to the editor and made an edit in only 15% of the cases.
In 46% of the cases, users went back to the main dashboard, and in 30% they ended the session.
As most of the events were generated by users with 0 edits (newcomers), these figures are largely driven by newcomer behavior.
Among users who made at least one edit, in 80% of the cases, they continued to make additional edits, while 9% went back to the main dashboard, and the rest ended the session.
Across all edit count buckets, most of the users (>70%) who opened the dashboard proceeded to the translation start screen.
The percentage is higher for newcomers than for experienced users. Most newcomers reach the dashboard through external entry points rather than by opening it directly, in which case dashboard_open and dashboard_translation_start are triggered consecutively (with no user action in between). Among experienced users, more users open the dashboard directly and then click to proceed to the translation start screen.
Among users who reached the translation start screen:
Newcomers tend to end/abandon the session or return to the main dashboard.
Newcomers continued to make an edit from this stage in only 12% of the cases, whereas users with 1000+ edits made an edit in 32% of the cases.
The higher the editing experience, the more likely users are to continue to make an edit.
Among users who made at least one edit, the more editing experience they have, the more likely they are to make additional edits to the machine-translated content, and the less likely they are to end the session or return to the dashboard.
The rate of transition between various stages of the funnel by the source of entry is highly correlated to the usage of the respective entry point by various user experience levels.
Among users who navigated through the frequent languages menu, which was most frequently accessed by newcomers:
In 82% of the cases, they proceeded to the translation start screen.
From the translation start screen, users made at least one edit in 13% of the cases.
Among users who navigated through the content language selector, which was frequently accessed by both newcomers and experienced users alike:
In 75% of the cases, they proceeded to the translation start screen.
From the translation start screen, users made at least one edit in 23% of the cases.
Among users who directly opened the dashboard, which was most frequent among experienced users:
In 36% of the cases, they proceeded to the translation start screen.
From the translation start screen, users made at least one edit in 34% of the cases.
Among users who navigated from an invitation shown on a non-existent page, which was frequently accessed by both newcomers and experienced users alike:
In 80% of the cases, they proceeded to the translation start screen.
From the translation start screen, users made at least one edit in 14% of the cases.
Among users who directly opened the dashboard with a link to a specific translation, which was most frequent among experienced users:
In 87% of the cases, they proceeded to the translation start screen.
From the translation start screen, users made at least one edit in 11% of the cases.
Among users who navigated from the contributions page, which was frequently accessed by both newcomers and experienced users alike:
In 39% of the cases, they proceeded to the translation start screen.
From the translation start screen, users made at least one edit in 7% of the cases.
Among users who navigated from the notice on recently translated articles to review/expand the translation, which was most frequent among experienced users:
In 88% of the cases, they proceeded to the translation start screen.
From the translation start screen, users made at least one edit in 31% of the cases.
Flow of Users (other events)
In cases where users discarded a suggested translation (677 occurrences), in 80% of the cases they went on to discard the next translation shown as well, and in 10% they proceeded to the translation start screen.
In cases where users requested that the list of suggestions be regenerated (110 occurrences), in 33% of the cases they refreshed the suggestions again, and 20% proceeded to the translation start screen.
In cases where users initiated a search (958 occurrences), in 82% of the cases they proceeded to the translation start screen, and 12% returned to the dashboard.
In cases where users selected an in-progress translation (440 occurrences), in 67% of the cases they returned to the dashboard, and 16% made an edit to the translation.
In cases where users discarded an in-progress translation (132 occurrences), in 58% of the cases they discarded additional in-progress translations, and 13% initiated a search.
Data Gathering
Setup
Code
import wmfdata as wmf
import pandas as pd
from datetime import datetime, timedelta

import great_tables as gt
import plotly.express as px
import plotly.graph_objects as go
import plotly.subplots as sp
from plotly.offline import download_plotlyjs, init_notebook_mode, iplot

from IPython.display import display_html, display, HTML, clear_output, Markdown
import warnings
Code
init_notebook_mode(connected=True)

pd.options.display.max_columns = None
pd.options.display.max_rows = 250

# width for charts
iplot_width = 950
max_width = 1250

# always show options bar
iplot_config = {'displayModeBar': True}

# prints a string at center of the output, bold if needed
def pr_centered(content, bold=False):
    if bold:
        content = f"<b>{content}</b>"

    centered_html = f"<div style='text-align:center'>{content}</div>"

    display(HTML(centered_html))
Code
%%time

query = """
SELECT
    dt AS ts,
    DATE(dt) AS dt,
    HOUR(dt) AS hour,
    wiki_db,
    access_method,
    content_translation_session_id,
    content_translation_session_position,
    event_type,
    event_source,
    translation_type,
    translation_source_language,
    translation_target_language,
    user_is_anonymous,
    user_global_edit_count_bucket,
    year,
    day,
    month
FROM
    event_sanitized.mediawiki_content_translation_event
WHERE
    DATE(dt) >= DATE('{START_DT}') AND
    DATE(dt) <= DATE('{END_DT}')
"""

all_events = wmf.spark.run(
    query.format(
        START_DT=start_dt,
        END_DT=end_dt
    )
)
CPU times: user 4.41 s, sys: 723 ms, total: 5.14 s
Wall time: 2min 8s
Code
edit_buckets_across_all_sessions = (
    all_events[['content_translation_session_id', 'user_global_edit_count_bucket']]
    .user_global_edit_count_bucket
    .value_counts(normalize=True)
    .reset_index()
    .rename({
        'user_global_edit_count_bucket': 'Edit Bucket',
        'proportion': 'Percentage of events'
    }, axis=1)
    .sort_values('Percentage of events', ascending=False, ignore_index=True)
)

edit_buckets_across_all_sessions['Percentage of events'] = (
    edit_buckets_across_all_sessions['Percentage of events']
    .apply(lambda x: f"{x:.2%}")
)

pr_centered('Distribution of edit buckets across all sessions', True)
edit_buckets_across_all_sessions
Distribution of edit buckets across all sessions

   Edit Bucket      Percentage of events
0  0 edits          53.93%
1  1000+ edits      19.20%
2  5-99 edits       10.76%
3  100-999 edits    9.87%
4  1-4 edits        6.24%
Data Cleaning
During analysis, several issues with the recorded events were identified. The most significant concerns content_translation_session_position: multiple events, of the same or of different event types, share the same session position even though they occurred at different times. It is not yet clear whether the session position was recorded incorrectly, in which case it could be reconstructed from the timestamps, or whether these are duplicate events. More information, and the task to investigate these issues, is at T353882. For this analysis, all sessions with potentially erroneous events are excluded.
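To make the session position issue concrete, here is a minimal sketch on made-up data. It flags sessions in which one recorded position is shared by several events, and shows how a position could be rebuilt from timestamps if the recorded value turns out to be wrong. The helper names and the toy values are illustrative assumptions and are not part of the analysis code; only the column names come from the query above.

```python
import pandas as pd

SESSION = "content_translation_session_id"
POSITION = "content_translation_session_position"

# Toy events: column names follow the query above, the values are made up.
toy_events = pd.DataFrame({
    SESSION: ["a", "a", "a", "b", "b"],
    POSITION: [0, 1, 1, 0, 1],  # session "a" reuses position 1
    "event_type": [
        "dashboard_open", "dashboard_translation_start", "editor_segment_add",
        "dashboard_open", "dashboard_translation_start",
    ],
    "ts": pd.to_datetime([
        "2023-12-01 10:00:00", "2023-12-01 10:00:05", "2023-12-01 10:02:00",
        "2023-12-01 11:00:00", "2023-12-01 11:00:03",
    ]),
})


def sessions_with_repeated_positions(df):
    """Hypothetical helper: session ids where a position is shared by several events."""
    repeated = df.duplicated([SESSION, POSITION], keep=False)
    return sorted(df.loc[repeated, SESSION].unique().tolist())


def reconstruct_positions(df):
    """Hypothetical helper: number events within each session by timestamp order."""
    out = df.sort_values([SESSION, "ts"]).copy()
    out["reconstructed_position"] = out.groupby(SESSION).cumcount()
    return out


flagged = sessions_with_repeated_positions(toy_events)
rebuilt = reconstruct_positions(toy_events)
print(flagged)
print(rebuilt[[SESSION, POSITION, "reconstructed_position", "event_type"]])
```

Rebuilding positions this way is only safe if the repeated positions are recording errors rather than duplicate events, which is the open question tracked in T353882.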
Code
temporal_columns = ['ts', 'dt', 'hour']

# sessions with duplicate events except for the temporal columns
sessions_with_duplicate_events = (
    all_events[[col for col in all_events.columns.tolist() if col not in temporal_columns]]
    .value_counts()
    .reset_index()
    .rename({0: 'count'}, axis=1)
    .query("""count > 1""")
    .content_translation_session_id
    .unique()
    .tolist()
)

# various event types in a session having the same session position although the events occurred later
session_event_counts = (
    all_events.groupby(['content_translation_session_id', 'content_translation_session_position'])
    .agg(distinct_events=('event_type', pd.Series.nunique))
)

sessions_with_same_position_events = (
    session_event_counts.query("""distinct_events > 1""")
    .reset_index()
    .content_translation_session_id
    .unique()
    .tolist()
)

# sessions where multiple global edit count buckets were recorded
sessions_with_multiple_edit_counts = (
    all_events.groupby('content_translation_session_id')['user_global_edit_count_bucket']
    .nunique()
    .reset_index()
    .query("""user_global_edit_count_bucket > 1""")
    .content_translation_session_id
    .unique()
    .tolist()
)

# sessions with no dashboard open at start
sessions_with_no_dopen_start = (
    all_events.query("""(content_translation_session_position == 0) & (event_type != 'dashboard_open')""")
    .content_translation_session_id
    .unique()
    .tolist()
)

# sessions with at least one dashboard open
sessions_with_dopen = (
    all_events
    .query("""event_type == 'dashboard_open'""")['content_translation_session_id']
    .unique()
    .tolist()
)

# sessions without dashboard open
sessions_without_dopen = (
    all_events
    .query("""content_translation_session_id != @sessions_with_dopen""")['content_translation_session_id']
    .unique()
    .tolist()
)

# sessions with multiple events having the same session position
duplicate_events_with_same_position = (
    all_events[['content_translation_session_id', 'content_translation_session_position', 'event_type']]
    .value_counts()
    .reset_index()
    .rename({0: 'count'}, axis=1)
    .query("""count > 1""")
    .content_translation_session_id
    .unique()
    .tolist()
)
Code
n_all_sessions = all_events.content_translation_session_id.nunique()
n_all_events = all_events.shape[0]
n_valid_sessions = events.content_translation_session_id.nunique()
n_events_from_valid_sessions = events.shape[0]
pct_invalid_sessions = 100 - round(n_valid_sessions / n_all_sessions * 100, 2)

print(f'- all sessions: {n_all_sessions}; all events: {n_all_events}')
print(f'- valid sessions: {n_valid_sessions}; events from valid sessions: {n_events_from_valid_sessions}')
print(f'- percentage of sessions with potentially erroneous events: {pct_invalid_sessions}%')
- all sessions: 29365; all events: 277212
- valid sessions: 15143; events from valid sessions: 87874
- percentage of sessions with potentially erroneous events: 48.43%
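The cell that derives the valid `events` frame from `all_events` is not shown above. One plausible way to do it, sketched here on toy data, is to take the union of the flagged session lists and drop every event that belongs to one of those sessions. The function name `drop_flagged_sessions` and the toy values are assumptions made for illustration.

```python
import pandas as pd

SESSION = "content_translation_session_id"


def drop_flagged_sessions(all_events, *flagged_lists):
    """Hypothetical helper: keep only events from sessions in none of the flagged lists."""
    flagged = set().union(*flagged_lists)
    keep = ~all_events[SESSION].isin(flagged)
    return all_events[keep].copy()


# Toy data standing in for all_events; the values are made up.
toy_all_events = pd.DataFrame({
    SESSION: ["a", "a", "b", "c"],
    "event_type": [
        "dashboard_open", "dashboard_translation_start",
        "dashboard_open", "dashboard_open",
    ],
})

# Each list plays the role of one of the flagged session lists built above.
toy_valid = drop_flagged_sessions(toy_all_events, ["a"], ["c"], [])

n_all = toy_all_events[SESSION].nunique()
n_valid = toy_valid[SESSION].nunique()
pct_invalid = round(100 - n_valid / n_all * 100, 2)
print(toy_valid[SESSION].unique().tolist(), pct_invalid)
```

With the counts reported above (15143 valid sessions out of 29365), the same arithmetic gives the 48.43% of sessions that were set aside.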
Analysis: Entry Points & Sources
Dashboard Open
As the goal is to understand how users reach the translation dashboard, this part of the analysis only includes events where users navigate to the main dashboard from an external source. For example, after adding a segment, a user can come back to the dashboard for another translation; these events are currently recorded as direct access (T353799), and such dashboard_open events are not considered for this part.
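The construction of `dopen_events` is not shown in this report. The sketch below illustrates one way such a filter could look, under the assumption that a return to the dashboard appears as a dashboard_open with source `direct` immediately after an editor_segment_add in the same session. Both that rule and the helper name are assumptions, and the toy values are made up.

```python
import pandas as pd

SESSION = "content_translation_session_id"
POSITION = "content_translation_session_position"


def external_dashboard_opens(events):
    """Hypothetical filter: dashboard_open events that are not returns from the editor."""
    ordered = events.sort_values([SESSION, POSITION]).copy()
    # event type that came immediately before, within the same session
    ordered["prev_event_type"] = ordered.groupby(SESSION)["event_type"].shift(1)

    is_open = ordered["event_type"] == "dashboard_open"
    is_return = (
        (ordered["event_source"] == "direct")
        & (ordered["prev_event_type"] == "editor_segment_add")
    )
    return ordered[is_open & ~is_return].drop(columns="prev_event_type")


# Toy sessions; the values are made up.
toy_events = pd.DataFrame({
    SESSION: ["s1", "s1", "s1", "s1", "s2"],
    POSITION: [0, 1, 2, 3, 0],
    "event_type": [
        "dashboard_open", "dashboard_translation_start", "editor_segment_add",
        "dashboard_open", "dashboard_open",
    ],
    "event_source": [
        "frequent_languages", "frequent_languages", None,
        "direct", "direct",
    ],
})

external = external_dashboard_opens(toy_events)
print(external[[SESSION, POSITION, "event_source"]])
```

In the toy data, the `direct` open at position 3 of session s1 is dropped because it follows an edit, while the `direct` open that starts session s2 is kept.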
Code
# frequency of entry points usage
entry_points_freq = (
    dopen_events.event_source
    .value_counts(normalize=True)
    .reset_index()
    .rename({
        'event_source': 'entry_point',
        'proportion': 'percent'
    }, axis=1)
)

# generate table from dataframe
entry_points_table = (
    gt
    .GT(entry_points_freq)
    .fmt_percent(columns='percent')
    .cols_label(
        entry_point='Entry Point',
        percent='Percentage'
    )
    .tab_header('Overall Distribution Of Entry Points That Users Navigate From', 'To Reach the Content Translation dashboard')
    .tab_source_note(f'across {dopen_events.content_translation_session_id.nunique()} sessions from {start_dt} to {end_dt}')
)

entry_points_table
Overall Distribution Of Entry Points That Users Navigate From
To Reach the Content Translation dashboard

Entry Point                    Percentage
frequent_languages             72.69%
content_language_selector      15.94%
direct                         6.37%
invite_new_article_creation    3.22%
direct_preselect               1.33%
contributions_page             0.25%
recent_translation             0.21%

across 15007 sessions from 2023-10-02 to 2023-12-31
by Edit Bucket
Code
warnings.filterwarnings('ignore')

# usage of entry points by various edit buckets
entry_by_edit_bucket = (
    dopen_events
    .groupby(['user_global_edit_count_bucket', 'event_source'])
    .size()
    .reset_index()
    .rename({
        'user_global_edit_count_bucket': 'edit_bucket',
        'event_source': 'source',
        0: 'count'
    }, axis=1)
    .sort_values(['edit_bucket', 'count'], ascending=[True, False])
    .reset_index(drop=True)
)

# total by each edit bucket
entry_by_edit_bucket['total'] = (
    entry_by_edit_bucket['edit_bucket']
    .map(
        entry_by_edit_bucket
        .groupby('edit_bucket')
        .agg({'count': sum})
        .to_dict()['count']
    )
)

# percentage of usage by edit bucket
entry_by_edit_bucket = entry_by_edit_bucket.astype({'total': int})
entry_by_edit_bucket['percent'] = entry_by_edit_bucket['count'] / entry_by_edit_bucket['total']
Code
# only display annotations if entry point accounts for more than 5%
entry_by_edit_bucket['percent_annot'] = (
    entry_by_edit_bucket['percent']
    .apply(lambda x: f"{x:.0%}" if x > 0.05 else None)
)

# bar graph
fig = px.bar(
    entry_by_edit_bucket,
    x='percent',
    y='edit_bucket',
    color='source',
    labels={
        'percent': '% of Total Events',
        'edit_bucket': 'Edit Bucket',
        'source': 'Entry Points'
    },
    color_discrete_sequence=px.colors.qualitative.T10,
    title='Usage of Entry Points by User Global Edit Bucket',
    text='percent_annot',
    # display in increasing edit bucket order
    category_orders={
        'edit_bucket': edit_buckets,
        'source': entry_points_freq.entry_point.values.tolist()
    }
)

# relative stacks the bars
fig.update_layout(barmode='relative', height=550, width=max_width)
fig.update_xaxes(tickformat='.0%')
fig = fig.update_traces(
    textfont_color='white',
    hovertemplate="<br>".join([
        "Edit Bucket: %{y}",
        "Percent of Total Events: %{x:.0%}"
    ])
)

iplot(fig, config=iplot_config)
Summary
85% of newcomers opened the translation dashboard by navigating from the frequent languages selector, which surfaces missing languages that an article can be translated into.
As users gain editing experience, they increasingly reach the dashboard through the content language selector, in which they can search for a missing language to translate an article into.
Experienced users also tend to open the dashboard directly more often than newcomers.
Users with 1000+ edits navigated to the dashboard through the frequent languages selector only 40% of the time.
Code
# usage of entry points by translation target language wiki size
entry_by_target_wp_size = (
    dopen_events
    .groupby(['target_wp_rank', 'event_source'])
    .size()
    .reset_index()
    .rename({
        'event_source': 'source',
        0: 'count'
    }, axis=1)
    .sort_values(['target_wp_rank', 'count'], ascending=[True, False])
    .reset_index(drop=True)
)

# total by each rank
entry_by_target_wp_size['total'] = (
    entry_by_target_wp_size['target_wp_rank']
    .map(
        entry_by_target_wp_size
        .groupby('target_wp_rank')
        .agg({'count': sum})
        .to_dict()['count']
    )
)

entry_by_target_wp_size = entry_by_target_wp_size.astype({'total': int})
entry_by_target_wp_size['percent'] = entry_by_target_wp_size['count'] / entry_by_target_wp_size['total']
Code
# only display annotations if entry point accounts for more than 5%
entry_by_target_wp_size['percent_annot'] = (
    entry_by_target_wp_size['percent']
    .apply(lambda x: f"{x:.0%}" if x > 0.05 else None)
)

# bar graph
fig = px.bar(
    entry_by_target_wp_size.query("""target_wp_rank != '1-5'"""),
    x='percent',
    y='target_wp_rank',
    color='source',
    labels={
        'percent': '% of Total Events',
        'target_wp_rank': 'Target Language WP Size',
        'source': 'Entry Points'
    },
    color_discrete_sequence=px.colors.qualitative.T10,
    title='Usage of Entry Points by Comparative Wikipedia Size (of the Target Language)',
    text='percent_annot',
    category_orders={
        'target_wp_rank': [i for i in rank_bin_labels if i != '1-5'],
        'source': entry_points_freq.entry_point.values.tolist()
    }
)

# stack the bars
fig.update_layout(barmode='relative', height=550, width=max_width)
fig.update_xaxes(tickformat='.0%')
fig.update_traces(textfont_color='white')

iplot(fig, config=iplot_config)
Summary
On larger Wikipedias, the frequent languages selector was the most used route to the translation dashboard.
Among the top 20 Wikipedias, it was used 85% of the time to access the dashboard.
On smaller Wikipedias, although the frequent languages selector remains the most used entry point, the content language selector is used more than on larger Wikipedias.
This is related to the observations by user edit bucket: larger Wikipedias tend to have more newcomers than smaller Wikipedias.
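The charts in this part rely on a `target_wp_rank` column and a `rank_bin_labels` list whose construction is not shown. As a rough sketch of how such bins could be produced, the example below assigns each event a size-rank bin with `pd.cut` and computes the share of each entry point within a bin. The rank column, the bin edges, and every label other than '1-5' are hypothetical, and the toy values are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical bin edges and labels; only the '1-5' label appears in the code above.
rank_bin_edges = [0, 5, 20, 50, np.inf]
rank_bin_labels = ["1-5", "6-20", "21-50", "51+"]

# Toy dashboard_open events with an assumed size rank of the target Wikipedia.
toy_opens = pd.DataFrame({
    "target_wiki_rank": [2, 2, 2, 2, 30, 30],
    "event_source": [
        "frequent_languages", "frequent_languages", "frequent_languages",
        "content_language_selector",
        "frequent_languages", "content_language_selector",
    ],
})

# Assign each event to a rank bin; intervals are closed on the right, e.g. (0, 5].
toy_opens["target_wp_rank"] = pd.cut(
    toy_opens["target_wiki_rank"], bins=rank_bin_edges, labels=rank_bin_labels
).astype(str)

# Count events per (bin, entry point) and normalise within each bin.
counts = (
    toy_opens
    .groupby(["target_wp_rank", "event_source"])
    .size()
    .reset_index(name="count")
)
counts["percent"] = counts["count"] / counts.groupby("target_wp_rank")["count"].transform("sum")

share = counts.set_index(["target_wp_rank", "event_source"])["percent"]
print(share)
```

Normalising within each bin, rather than across all events, is what makes the stacked bars comparable between large and small Wikipedias.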
Translation Start
The next step after opening the translation dashboard is the translation start page, which appears after a user confirms their choice of article/section to translate. This step occurs before the translation editing screen. In this section, various sources through which users reach the translation start page have been analyzed. This step can take place in two scenarios:
When users from an external source navigate to the dashboard (i.e. entry points such as frequent languages and content language selector), the opening of the translation dashboard is immediately followed by the translation start screen. In such cases, the event_source for dashboard_translation_start will be the same as the source for dashboard_open. For example, if a user clicks on a link from the frequent languages selector, dashboard_open and dashboard_translation_start events are consecutively triggered, with both having event source as frequent_languages. This is because the selection of the article/section has already happened.
When users reach the main dashboard either by directly opening, or returning after editing/completing a translation, there are multiple ways users are shown suggestions, and upon selection, sources specific to dashboard_translation_start get logged.
For this section, only events generated in the second scenario are considered, as events in the first scenario are a direct consequence of the dashboard_open source.
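The two scenarios can be separated mechanically. The sketch below treats a dashboard_translation_start as belonging to the first scenario when the immediately preceding event in the session is a dashboard_open with the same event_source, and as independent otherwise. This rule is an interpretation of the description above and not the report's actual code; the helper name and the source value `hypothetical_suggestion_source` are made up for illustration.

```python
import pandas as pd

SESSION = "content_translation_session_id"
POSITION = "content_translation_session_position"


def independent_translation_starts(events):
    """Hypothetical helper: translation starts not triggered by an external dashboard_open."""
    ordered = events.sort_values([SESSION, POSITION]).copy()
    grouped = ordered.groupby(SESSION)
    # type and source of the event immediately before, within the same session
    ordered["prev_event_type"] = grouped["event_type"].shift(1)
    ordered["prev_event_source"] = grouped["event_source"].shift(1)

    is_start = ordered["event_type"] == "dashboard_translation_start"
    follows_open = (
        (ordered["prev_event_type"] == "dashboard_open")
        & (ordered["prev_event_source"] == ordered["event_source"])
    )
    return ordered[is_start & ~follows_open]


# Toy session; the values are made up.
toy_events = pd.DataFrame({
    SESSION: ["s1"] * 5,
    POSITION: [0, 1, 2, 3, 4],
    "event_type": [
        "dashboard_open", "dashboard_translation_start", "editor_segment_add",
        "dashboard_open", "dashboard_translation_start",
    ],
    "event_source": [
        "frequent_languages", "frequent_languages", None,
        "direct", "hypothetical_suggestion_source",
    ],
})

independent = independent_translation_starts(toy_events)
n_starts = (toy_events["event_type"] == "dashboard_translation_start").sum()
independent_share = len(independent) / n_starts
print(independent[[POSITION, "event_source"]], independent_share)
```

In the toy session, the first start inherits `frequent_languages` from the dashboard_open before it, so only the second start counts as independent.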
Only 7% of the dashboard_translation_start events occurred independently of dashboard_open events.
This indicates that most users start a translation having already selected an article/section to translate from an external entry point.
Among the users who initiate dashboard_translation_start independently:
The majority of newcomers start a translation by accepting suggestions made by the API in the absence of a seed article.
The majority of experienced users start a translation by choosing a search result, followed by accepting a translation suggested because it is related to one of their recent edits.
Analysis: User Flows (Funnel)
For the majority of the funnel analysis, we will be looking at three main event types, which account for more than 97% of the events:
dashboard_open: user opens the translation dashboard
dashboard_translation_start: proceeding from the dashboard to the start screen
editor_segment_add: user adds a segment of content to the translated version in the editor
While several other events are instrumented (mostly related to how users interact with the suggestions), they account for less than 3% of the events. Including them in the main analysis adds a lot of noise and makes it hard to derive insights. However, a section at the end covers interactions with those events.
Code
# main events list
main_events = ['dashboard_open', 'dashboard_translation_start', 'editor_segment_add']

# function to plot funnel of user flows
# by default returns a Plotly Sankey plot for a given dataframe
# https://plotly.github.io/plotly.py-docs/generated/plotly.graph_objects.Sankey.html
# optional: add a table with distribution of edit buckets that triggered the events
# optional: return dataframe with transition data, instead of the plots
def plot_funnel(df,
                return_transition_data=False,
                chart_title=None,
                events_scope=main_events,
                incl_session_end=True,
                incl_edit_bucket_table=False,
                font_size=12,
                width=iplot_width,
                height=iplot_width/2.25):

    warnings.filterwarnings('ignore')

    df = df.query("""event_type == @events_scope""")
    df = df.sort_values(by=['content_translation_session_id', 'content_translation_session_position'])

    # next event in order within a session
    df['next_event_type'] = df.groupby('content_translation_session_id')['event_type'].shift(-1)

    # consider the session as ended if there is no next event
    # (assignment instead of a chained inplace call, which newer pandas versions may ignore)
    if incl_session_end:
        df['next_event_type'] = df['next_event_type'].fillna('session end')
    else:
        df = df.dropna(subset=['next_event_type'])

    transition_counts = df.groupby(['event_type', 'next_event_type']).size().reset_index(name='count')
    total_transitions_by_source = transition_counts.groupby('event_type')['count'].sum()
    transition_counts['total_by_source'] = transition_counts['event_type'].map(total_transitions_by_source)
    transition_counts['percentage'] = (transition_counts['count'] / transition_counts['total_by_source']) * 100

    # subplots for the table addition, if needed
    if incl_edit_bucket_table:
        fig = sp.make_subplots(rows=1, cols=2, column_widths=[0.7, 0.3], specs=[[{"type": "sankey"}, {"type": "table"}]])
    else:
        fig = sp.make_subplots(rows=1, cols=1, specs=[[{"type": "sankey"}]])

    if return_transition_data:
        return transition_counts
    else:
        all_event_types = pd.concat([transition_counts['event_type'], transition_counts['next_event_type']]).unique()
        label_mapping = {label: i for i, label in enumerate(all_event_types)}

        sources = transition_counts['event_type'].map(label_mapping)
        targets = transition_counts['next_event_type'].map(label_mapping)
        weights = transition_counts['count']

        sankey = go.Sankey(
            node=dict(
                pad=15,
                thickness=20,
                line=dict(color="black", width=0.5),
                label=[label if label != 'session end' else '<i>session end</i>' for label in all_event_types]
            ),
            link=dict(
                source=sources,
                target=targets,
                value=weights,
                hovertemplate='Events: %{value}<br />' + 'Percentage: %{customdata:.2f}%<extra></extra>',
                customdata=transition_counts['percentage']
            )
        )

        fig.add_trace(sankey, row=1, col=1)

        if incl_edit_bucket_table:
            agg_events_by_bucket = (
                df
                .user_global_edit_count_bucket
                .value_counts()
                .reset_index()
                .rename({
                    'user_global_edit_count_bucket': 'Edit Bucket',
                    'count': '# Events'
                }, axis=1)
                .sort_values('Edit Bucket')
            )

            agg_events_by_bucket['% of Events'] = (
                agg_events_by_bucket['# Events'] / agg_events_by_bucket['# Events'].sum()
            ).apply(lambda x: f"{x:.0%}")

            table = go.Table(
                columnwidth=[4, 3, 4],
                header=dict(values=list(agg_events_by_bucket.columns), align='left'),
                cells=dict(
                    values=[
                        agg_events_by_bucket['Edit Bucket'],
                        agg_events_by_bucket['# Events'],
                        agg_events_by_bucket['% of Events']
                    ],
                    align='left',
                    height=25
                )
            )

            fig.add_trace(table, row=1, col=2)

        fig.update_layout(title_text=chart_title, font_size=font_size, height=height, width=width)

        return fig
Code
iplot(
    plot_funnel(
        events,
        chart_title='Flow of Users Through CX Workflows & Number of Events Generated by Edit Bucket',
        incl_edit_bucket_table=True,
        width=max_width,
        height=max_width/2.25
    ),
    config=iplot_config
)
In most cases (77%), those who opened the dashboard transitioned to the translation start screen.
This is because for users navigating to the dashboard from an external entry point, both events occur consecutively.
13% ended the session and 8% refreshed the dashboard or came back to it later before the session expired.
Among the users who proceeded to the start screen, they progressed to the editor and made an edit in only 15% of the cases.
In 46% of the cases, users went back to the main dashboard, and in 30% they ended the session.
As most of the events were generated by users with 0 edits (newcomers), these figures are largely driven by newcomer behavior.
Among users who made at least one edit, in 80% of the cases, they continued to make additional edits, while 9% went back to the main dashboard, and the rest ended the session.
By Edit Bucket
Code
n_events = events.query("""(user_global_edit_count_bucket == '0 edits') & (event_type == @main_events)""").shape[0]

iplot(
    plot_funnel(
        events.query("""user_global_edit_count_bucket == '0 edits'"""),
        chart_title=f'Flow of Users Through CX Workflows Having 0 Global Edits ({n_events} events)'
    ),
    config=iplot_config
)
Summary: Users with 0 Global Edits
Among the users who opened the dashboard:
in 80% of the cases, they proceeded to translation start screen.
in 12% of the cases, they ended the session.
in 8% of the cases, they refreshed the dashboard or came back to it later before the session expired.
Among the users who reach the translation start screen:
in 12% of the cases, they transitioned to the editor and made an edit.
in 42% of the cases, they went back to the main dashboard.
in 35% of the cases, they ended the session.
Among users who made at least one edit:
in 69% of the cases, they continued to make additional edits.
in 11% of the cases, they went back to the main dashboard.
in 11% of the cases, they ended the session.
Code
n_events = events.query("""(user_global_edit_count_bucket == '1-4 edits') & (event_type == @main_events)""").shape[0]

iplot(
    plot_funnel(
        events.query("""user_global_edit_count_bucket == '1-4 edits'"""),
        chart_title=f'Flow of Users Through CX Workflows Having 1-4 Global Edits ({n_events} events)'
    ),
    config=iplot_config
)
Summary: Users with 1-4 Global Edits
Among the users who opened the dashboard:
in 82% of the cases, they proceeded to translation start screen.
in 12% of the cases, they ended the session.
in 5% of the cases, they refreshed the dashboard or came back to it later before the session expired.
Among the users who reach the translation start screen:
in 8% of the cases, they transitioned to the editor and made an edit.
in 60% of the cases, they went back to the main dashboard.
in 27% of the cases, they ended the session.
Among users who made at least one edit:
in 75% of the cases, they continued to make additional edits.
in 9% of the cases, they went back to the main dashboard.
in 8% of the cases, they ended the session.
Code
n_events = events.query("""(user_global_edit_count_bucket == '5-99 edits') & (event_type == @main_events)""").shape[0]

iplot(
    plot_funnel(
        events.query("""user_global_edit_count_bucket == '5-99 edits'"""),
        chart_title=f'Flow of Users Through CX Workflows Having 5-99 Global Edits ({n_events} events)'
    ),
    config=iplot_config
)
Summary: Users with 5-99 Global Edits
Among the users who opened the dashboard:
in 77% of the cases, they proceeded to translation start screen.
in 14% of the cases, they ended the session.
in 9% of the cases, they refreshed the dashboard or came back to it later before the session expired.
Among the users who reach the translation start screen:
in 13% of the cases, they transitioned to the editor and made an edit.
in 64% of the cases, they went back to the main dashboard.
in 28% of the cases, they ended the session.
Among users who made at least one edit:
in 80% of the cases, they continued to make additional edits.
in 10% of the cases, they went back to the main dashboard.
in 6% of the cases, they ended the session.
Code
n_events = events.query("""(user_global_edit_count_bucket == '100-999 edits') & (event_type == @main_events)""").shape[0]

iplot(
    plot_funnel(
        events.query("""user_global_edit_count_bucket == '100-999 edits'"""),
        chart_title=f'Flow of Users Through CX Workflows Having 100-999 Global Edits ({n_events} events)'
    ),
    config=iplot_config
)
Summary: Users with 100-999 Global Edits
Among the users who opened the dashboard:
in 71% of the cases, they proceeded to translation start screen.
in 18% of the cases, they ended the session.
in 11% of the cases, they refreshed the dashboard or came back to it later before the session expired.
Among the users who reach the translation start screen:
in 16% of the cases, they transitioned to the editor and made an edit.
in 49% of the cases, they went back to the main dashboard.
in 30% of the cases, they ended the session.
Among users who made at least one edit:
in 85% of the cases, they continued to make additional edits.
in 7% of the cases, they went back to the main dashboard.
in 5% of the cases, they ended the session.
Code
n_events = events.query("""(user_global_edit_count_bucket == '1000+ edits') & (event_type == @main_events)""").shape[0]

iplot(
    plot_funnel(
        events.query("""user_global_edit_count_bucket == '1000+ edits'"""),
        chart_title=f'Flow of Users Through CX Workflows Having 1000+ Global Edits ({n_events} events)'
    ),
    config=iplot_config
)
Summary: Users with 1000+ Global Edits
Among the users who opened the dashboard:
in 72% of the cases, they proceeded to translation start screen.
in 17% of the cases, they ended the session.
in 10% of the cases, they refreshed the dashboard or came back to it later before the session expired.
Among the users who reach the translation start screen:
in 32% of the cases, they transitioned to the editor and made an edit.
in 40% of the cases, they went back to the main dashboard.
in 23% of the cases, they ended the session.
Among users who made at least one edit:
in 86% of the cases, they continued to make additional edits.
in 8% of the cases, they went back to the main dashboard.
Across all edit count buckets, most users (>70%) who opened the dashboard proceeded to the translation start screen.
The percentage is higher for newcomers than for experienced users. Most newcomers reach the dashboard through external entry points rather than by opening it directly, in which case dashboard_open and dashboard_translation_start are triggered consecutively (with no user action in between). Experienced users more often open the dashboard directly and then click to proceed to the translation start screen.
Among users who reached the translation start screen:
Newcomers tend to end/abandon the session or return to the main dashboard.
Newcomers continued to make an edit from this stage in only 12% of the cases, whereas users with 1000+ edits did so in 32% of the cases.
The more editing experience users have, the more likely they are to continue to make an edit.
Among users who made at least one edit, the more editing experience they have, the more likely they are to make additional edits to the machine-translated content, and the less likely they are to end the session or return to the dashboard.
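The transition percentages quoted in these summaries come from counting which event follows which within a session. The internals of plot_funnel are not shown in this report, so the following is only a minimal sketch of one way such rates could be derived. It assumes an event log with the columns content_translation_session_id, content_translation_session_position and event_type (names taken from the code in this analysis). The toy rows and the helper transition_rates are hypothetical.

```python
import pandas as pd

# Toy event log. Column names mirror those used in this analysis,
# but the rows are made up for illustration.
toy_events = pd.DataFrame({
    "content_translation_session_id": ["a", "a", "a", "b", "b", "c"],
    "content_translation_session_position": [0, 1, 2, 0, 1, 0],
    "event_type": [
        "dashboard_open", "dashboard_translation_start", "editor_segment_add",
        "dashboard_open", "dashboard_translation_start",
        "dashboard_open",
    ],
})


def transition_rates(df: pd.DataFrame) -> pd.DataFrame:
    """Count each event_type -> next_event_type pair and its share per event_type."""
    ordered = df.sort_values(
        ["content_translation_session_id", "content_translation_session_position"]
    ).copy()
    # Next event within the same session; the last event of a session has
    # no successor and is labelled 'session end'.
    ordered["next_event_type"] = (
        ordered.groupby("content_translation_session_id")["event_type"]
        .shift(-1)
        .fillna("session end")
    )
    counts = (
        ordered.groupby(["event_type", "next_event_type"])
        .size()
        .reset_index(name="count")
    )
    # Share of each transition among all transitions leaving the same event_type.
    counts["percentage"] = (
        100 * counts["count"] / counts.groupby("event_type")["count"].transform("sum")
    )
    return counts


rates = transition_rates(toy_events)
print(rates)
```

In this toy log, two of the three dashboard_open events are followed by dashboard_translation_start and one ends the session, so the shares leaving dashboard_open are roughly 67% and 33%.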
By Entry Point
Code
dopen_sources = events.query("""event_type == 'dashboard_open'""").event_source.unique().tolist()

# plot funnel for a given source
# identifies sessions starting with the specified source
# uses the original plot_funnel function
# includes edit bucket table by default
def plot_funnel_for_source(source, incl_edit_bucket_table=True):
    sessions_with_source = (
        events
        .query(f"""(event_source == '{source}') & (event_type == 'dashboard_open') & (content_translation_session_position == 0)""")
        .content_translation_session_id
        .unique()
        .tolist()
    )
    n_events = events.query("""(event_source == @source) & (event_type == @main_events)""").shape[0]
    iplot(
        plot_funnel(
            events.query("""content_translation_session_id == @sessions_with_source"""),
            chart_title=f'Flow of Users Through CX Workflows; Source: {source} ({n_events} events) & Number of Events Generated by Edit Bucket',
            incl_edit_bucket_table=incl_edit_bucket_table,
            width=max_width,
            height=max_width/2.25
        ),
        config=iplot_config
    )
Code
plot_funnel_for_source('frequent_languages')
Summary: Users Who Navigated to the Dashboard from the Frequent Languages Menu
Among the users who opened the dashboard:
in 82% of the cases, they proceeded to translation start screen.
in 11% of the cases, they ended the session.
in 6% of the cases, they refreshed the dashboard or came back to it later before the session expired.
Among the users who reach the translation start screen:
in 13% of the cases, they transitioned to the editor and made an edit.
in 46% of the cases, they went back to the main dashboard.
in 32% of the cases, they ended the session.
Among users who made at least one edit:
in 79% of the cases, they continued to make additional edits.
in 8% of the cases, they went back to the main dashboard.
transition_by_source_tbl = (
    gt
    .GT(
        transition_by_source,
        groupname_col='event_type',
        rowname_col='next_event_type',
    )
    .tab_header('Transitions to Various Stages of Translation Funnel by Source of Entry to the Dashboard')
    .fmt_percent(dopen_sources, decimals=0, scale_values=False)
    .tab_style(
        style=gt.style.text(size="16px"),
        locations=gt.loc.body(columns=dopen_sources)
    )
    .tab_style(
        style=gt.style.borders('right', '#bdbdbd'),
        locations=gt.loc.body(columns=dopen_sources)
    )
)

transition_by_source_tbl
Transitions to Various Stages of Translation Funnel by Source of Entry to the Dashboard
The rate of transition between stages of the funnel for each source of entry closely tracks which user experience levels use that entry point most.
Among users who navigated through the frequent languages menu, which was most frequently accessed by newcomers:
In 82% of the cases, they proceeded to the translation start screen.
From the translation start screen, users made at least one edit in 13% of the cases.
Among users who navigated through the content language selector, which was frequently accessed by newcomers and experienced users alike:
In 75% of the cases, they proceeded to the translation start screen.
From the translation start screen, users made at least one edit in 23% of the cases.
Among users who directly opened the dashboard, an entry point used most frequently by experienced users:
In 36% of the cases, they proceeded to the translation start screen.
From the translation start screen, users made at least one edit in 34% of the cases.
Among users who navigated from an invitation shown on a non-existent page, which was frequently accessed by both newcomers and experienced users alike:
In 80% of the cases, they proceeded to the translation start screen.
From the translation start screen, users made at least one edit in 14% of the cases.
Among users who directly opened the dashboard with a link to a specific translation, an entry point used most frequently by experienced users:
In 87% of the cases, they proceeded to the translation start screen.
From the translation start screen, users made at least one edit in 11% of the cases.
Among users who navigated from the contributions page, which was frequently accessed by newcomers and experienced users alike:
In 39% of the cases, they proceeded to the translation start screen.
From the translation start screen, users made at least one edit in 7% of the cases.
Among users who navigated from the notice on recently translated articles to review/expand the translation, an entry point used most frequently by experienced users:
In 88% of the cases, they proceeded to the translation start screen.
From the translation start screen, users made at least one edit in 31% of the cases.
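The table rendered above expects one row per (event_type, next_event_type) pair and one percentage column per entry source. How transition_by_source is actually built is not shown in this report, so the following is a hedged sketch of one possible reshaping step, from a long-format table to that wide layout. The percentages are the ones quoted in the summary above. The label 'direct' is a placeholder and may not match the real event_source value. The names long_rates and wide_rates are hypothetical.

```python
import pandas as pd

# Long format: one row per (source, event_type, next_event_type).
# Percentages are the ones quoted in the summary above; 'direct' is a
# placeholder label, not necessarily the real event_source value.
long_rates = pd.DataFrame(
    [
        ("frequent_languages", "dashboard_open", "dashboard_translation_start", 82),
        ("frequent_languages", "dashboard_translation_start", "editor_segment_add", 13),
        ("direct", "dashboard_open", "dashboard_translation_start", 36),
        ("direct", "dashboard_translation_start", "editor_segment_add", 34),
    ],
    columns=["event_source", "event_type", "next_event_type", "percentage"],
)

# Wide format: one column per entry source, one row per transition.
wide_rates = (
    long_rates
    .pivot(
        index=["event_type", "next_event_type"],
        columns="event_source",
        values="percentage",
    )
    .reset_index()
)
wide_rates.columns.name = None  # drop the leftover 'event_source' axis label
print(wide_rates)
```

A frame in this shape has event_type and next_event_type available as grouping and row-name columns, with the remaining columns holding the per-source percentages.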
User Flows: Other Events
Code
# user flows and interactions with events apart from the main events (open, start, edit)
other_event_transitions = (
    plot_funnel(
        events,
        events_scope=events.event_type.unique().tolist(),
        return_transition_data=True
    )
    .query("""event_type != @main_events""")
    .sort_values(['event_type', 'percentage'], ascending=[True, False])
    .drop('total_by_source', axis=1)
)

# prefix each next event with an arrow for display
other_event_transitions['next_event_type'] = (
    other_event_transitions['next_event_type']
    .replace({i: f'➔ {i}' for i in events.event_type.unique().tolist() + ['session end']})
)

# append the total number of events to each group label
other_event_transitions['event_type'] = (
    other_event_transitions['event_type']
    .replace({i: f"""{i} ({events[events['event_type'] == i].shape[0]} events)""" for i in events.event_type.unique().tolist()})
)

other_event_transitions_tbl = (
    gt
    .GT(
        other_event_transitions,
        rowname_col='next_event_type',
        groupname_col='event_type'
    )
    .fmt_percent('percentage', scale_values=False, decimals=1)
    .cols_label(
        count='# Events',
        percentage='Percentage'
    )
    .tab_header('Transitions Between Other Event Types', 'apart from dashboard_open, dashboard_translation_start, editor_segment_add')
)

other_event_transitions_tbl
Transitions Between Other Event Types
apart from dashboard_open, dashboard_translation_start, editor_segment_add
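| Event type | Next event | # Events | Percentage |
|---|---|---|---|
| dashboard_discard_suggestion (677 events) | ➔ dashboard_discard_suggestion | 540 | 79.8% |
| dashboard_discard_suggestion (677 events) | ➔ dashboard_translation_start | 70 | 10.3% |
| dashboard_discard_suggestion (677 events) | ➔ session end | 34 | 5.0% |
| dashboard_discard_suggestion (677 events) | ➔ dashboard_open | 16 | 2.4% |
| dashboard_discard_suggestion (677 events) | ➔ dashboard_search | 6 | 0.9% |
| dashboard_discard_suggestion (677 events) | ➔ dashboard_translation_continue | 6 | 0.9% |
| dashboard_discard_suggestion (677 events) | ➔ dashboard_refresh_suggestions | 3 | 0.4% |
| dashboard_discard_suggestion (677 events) | ➔ dashboard_translation_discard | 1 | 0.1% |
| dashboard_discard_suggestion (677 events) | ➔ editor_segment_add | 1 | 0.1% |
| dashboard_refresh_suggestions (110 events) | ➔ dashboard_refresh_suggestions | 37 | 33.6% |
| dashboard_refresh_suggestions (110 events) | ➔ dashboard_translation_start | 23 | 20.9% |
| dashboard_refresh_suggestions (110 events) | ➔ session end | 18 | 16.4% |
| dashboard_refresh_suggestions (110 events) | ➔ dashboard_open | 17 | 15.5% |
| dashboard_refresh_suggestions (110 events) | ➔ dashboard_search | 7 | 6.4% |
| dashboard_refresh_suggestions (110 events) | ➔ dashboard_discard_suggestion | 4 | 3.6% |
| dashboard_refresh_suggestions (110 events) | ➔ dashboard_translation_continue | 4 | 3.6% |
| dashboard_search (958 events) | ➔ dashboard_translation_start | 791 | 82.6% |
| dashboard_search (958 events) | ➔ dashboard_open | 116 | 12.1% |
| dashboard_search (958 events) | ➔ session end | 33 | 3.4% |
| dashboard_search (958 events) | ➔ dashboard_translation_continue | 14 | 1.5% |
| dashboard_search (958 events) | ➔ dashboard_search | 3 | 0.3% |
| dashboard_search (958 events) | ➔ editor_segment_add | 1 | 0.1% |
| dashboard_translation_continue (440 events) | ➔ dashboard_open | 294 | 66.8% |
| dashboard_translation_continue (440 events) | ➔ editor_segment_add | 70 | 15.9% |
| dashboard_translation_continue (440 events) | ➔ session end | 64 | 14.5% |
| dashboard_translation_continue (440 events) | ➔ dashboard_translation_continue | 5 | 1.1% |
| dashboard_translation_continue (440 events) | ➔ dashboard_translation_discard | 3 | 0.7% |
| dashboard_translation_continue (440 events) | ➔ dashboard_translation_start | 3 | 0.7% |
| dashboard_translation_continue (440 events) | ➔ dashboard_search | 1 | 0.2% |
| dashboard_translation_discard (132 events) | ➔ dashboard_translation_discard | 76 | 57.6% |
| dashboard_translation_discard (132 events) | ➔ dashboard_search | 18 | 13.6% |
| dashboard_translation_discard (132 events) | ➔ session end | 14 | 10.6% |
| dashboard_translation_discard (132 events) | ➔ dashboard_open | 12 | 9.1% |
| dashboard_translation_discard (132 events) | ➔ dashboard_translation_continue | 8 | 6.1% |
| dashboard_translation_discard (132 events) | ➔ dashboard_translation_start | 3 | 2.3% |
| dashboard_translation_discard (132 events) | ➔ dashboard_discard_suggestion | 1 | 0.8% |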
Summary
In cases where users discarded a suggested translation (677 occurrences), they went on to discard the next suggestion shown in 80% of the cases and proceeded to the translation start screen in 10%.
In cases where users requested that the list of suggestions be regenerated (110 occurrences), they refreshed the suggestions again in 34% of the cases and proceeded to the translation start screen in 21%.
In cases where users initiated a search (958 occurrences), they proceeded to the translation start screen in 83% of the cases and returned to the dashboard in 12%.
In cases where users selected an in-progress translation (440 occurrences), they returned to the dashboard in 67% of the cases and made an edit to the translation in 16%.
In cases where users discarded an in-progress translation (132 occurrences), they discarded additional in-progress translations in 58% of the cases and initiated a search in 14%.
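The percentages in this summary can be checked directly against the counts in the transitions table. As a small worked example, the sketch below recomputes the shares for the events that followed dashboard_search, using the counts reported in the table above. The variable names are illustrative only.

```python
import pandas as pd

# Counts of the event that followed dashboard_search, copied from the table above.
search_next = pd.Series({
    "dashboard_translation_start": 791,
    "dashboard_open": 116,
    "session end": 33,
    "dashboard_translation_continue": 14,
    "dashboard_search": 3,
    "editor_segment_add": 1,
})

total = search_next.sum()  # should match the 958 dashboard_search events
share = (100 * search_next / total).round(1)

print(total)
print(share)
```

The same check can be repeated for the other event types by swapping in their counts.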