Throughout the analysis, the focus has been on the factors that affect the deletion outcome of articles created through the Content Translation tool, with the goal of informing improvements to the machine translation limits system. The current system uses the percentages of unmodified (machine-translated) content and human-modified content to throttle/block translations. However, both the exploratory data analysis and the regression analysis showed that articles that were not deleted had a higher percentage of machine-translated content, and that an increase in machine-translated content is associated with a decreased probability of deletion. This may be counterintuitive and contrary to the current understanding of how these percentages affect the quality of a translated article. As machine translation algorithms become more accurate over time, it may get even harder to estimate a good threshold. In the past, communities have reported that the thresholds are sometimes so restrictive that even good-quality translations are blocked. While it is not practically sensible to allow 100% machine-translated content, it is worth thinking about evolving the system beyond the machine-translated and human-modified percentages.
A key insight from the analysis is how standard quality impacts the deletion rate. An article can be considered to be of standard quality if it is at least 8kB long, has at least one category, has at least seven sections, is illustrated with at least one image, and has at least four references and two intra-wiki links. While it is not practical for all translations to meet standard quality, translators can be encouraged to bring a translation as close to it as possible. We also observed that an increase in the target article's size is associated with a decrease in deletion probability. So even if not all translations are 8kB long, users can be encouraged to expand them. In addition to the machine translation and human modification percentages, the standard quality criteria can also be used to think about thresholds and checks: for example, throttle/block if there is not even one intra-wiki link or reference. Similar to Edit check, the criteria can also be used to provide actionable feedback while a user is translating an article.
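As a rough illustration of how the standard quality criteria could serve both as a check and as a source of actionable feedback, the sketch below encodes them as minimum values. It is not how the limits system is implemented. The names `ArticleStats`, `unmet_criteria`, `is_standard_quality` and `should_block` are hypothetical, 8kB is assumed to mean 8 × 1024 bytes, "two intra-wiki links" is read as "at least two", and the example block rule is read as "no reference or no intra-wiki link".

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ArticleStats:
    """Hypothetical container for the measurements of a translated article."""
    size_bytes: int
    categories: int
    sections: int
    images: int
    references: int
    intra_wiki_links: int


# Minimums from the standard quality definition in the text.
# Assumptions: 8kB = 8 * 1024 bytes; "two intra-wiki links" = at least two.
STANDARD_QUALITY_MINIMUMS = {
    "size_bytes": 8 * 1024,
    "categories": 1,
    "sections": 7,
    "images": 1,
    "references": 4,
    "intra_wiki_links": 2,
}


def unmet_criteria(stats: ArticleStats) -> dict[str, tuple[int, int]]:
    """Map each unmet criterion to (actual value, required minimum)."""
    return {
        name: (getattr(stats, name), minimum)
        for name, minimum in STANDARD_QUALITY_MINIMUMS.items()
        if getattr(stats, name) < minimum
    }


def is_standard_quality(stats: ArticleStats) -> bool:
    """True only when every criterion is met."""
    return not unmet_criteria(stats)


def should_block(stats: ArticleStats) -> bool:
    """Example rule from the text, read here as: no reference OR no intra-wiki link."""
    return stats.references == 0 or stats.intra_wiki_links == 0
```

The dictionary returned by `unmet_criteria` could drive feedback such as "3 of 7 sections" while the user is still translating, whereas `should_block` stands for a much weaker hard limit than full standard quality.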
In addition, an increase in the time elapsed since a user's previous edit is associated with a decrease in the probability of deletion. One of the goals was to understand how campaigns/contests impact the deletion rate, since users in such events tend to create more content without much attention to quality. We therefore explored how the number of translations created during the preceding time frame affects the deletion outcome. Different intervals were examined, and for all of them an increase in the number of articles created during the preceding time frame is associated with an increase in deletion probability; the number of articles created during the preceding 15 days has the highest impact. Most of the impact of the shorter time frames is not significant.
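The two activity measures discussed above can be derived from timestamps alone. The following is a minimal sketch, not the code used in the analysis. The function names are hypothetical, and it assumes the timestamps belong to a single user and that the window is half-open, that is, it includes translations from exactly `days` days ago but excludes the current one.

```python
from __future__ import annotations

from bisect import bisect_left
from datetime import datetime, timedelta


def translations_in_preceding_window(
    timestamps: list[datetime], current: datetime, days: int = 15
) -> int:
    """Count translations created in [current - days, current)."""
    ordered = sorted(timestamps)
    window_start = current - timedelta(days=days)
    # bisect_left(x) is the number of timestamps strictly earlier than x.
    return bisect_left(ordered, current) - bisect_left(ordered, window_start)


def time_since_previous_edit(
    edit_timestamps: list[datetime], current: datetime
) -> timedelta | None:
    """Time elapsed since the latest earlier edit, or None if there is none."""
    earlier = [t for t in edit_timestamps if t < current]
    return current - max(earlier) if earlier else None
```

For example, with a hypothetical history, the count for the default 15-day window and the gap since the last edit can be computed as follows:

```python
history = [datetime(2024, 2, 1), datetime(2024, 3, 6),
           datetime(2024, 3, 10), datetime(2024, 3, 19)]
now = datetime(2024, 3, 20)

recent = translations_in_preceding_window(history, now)  # 15-day window
gap = time_since_previous_edit(history, now)
```

Keeping `days` as a parameter makes it easy to compare several intervals, as was done in the analysis.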
To conclude, for the limits system to effectively counter potentially low-quality translations (and provide feedback on them), factors other than the machine translation and human modification percentages can be considered, most importantly the standard quality criteria. A follow-up analysis, if needed, can look into how each of the standard quality criteria impacts the deletion outcome.