Recent (since 2017) WMT14 English-French baseline records
1. GNMT
https://arxiv.org/pdf/1609.08144.pdf
Corpus processing: a shared source and target vocabulary of 32K wordpieces
For the wordpiece models, we train 3 different models with vocabulary sizes of 8K, 16K, and 32K. Table 4 summarizes our results on the WMT En→Fr dataset. In this table, we also compare against other strong baselines without model ensembling. As can be seen from the table, “WPM-32K”, a wordpiece model with a shared source and target vocabulary of 32K wordpieces, performs well on this dataset and achieves the best quality as well as the fastest inference speed.
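To make the idea of a single wordpiece vocabulary shared by source and target concrete, here is a minimal sketch. It is not GNMT's actual wordpiece model (the paper learns its vocabulary from data). It assumes a tiny hand-written vocabulary, a greedy longest-match segmentation rule, and a `_` prefix to mark the start of a word. The names `SHARED_VOCAB`, `segment`, and `tokenize` are hypothetical and chosen only for illustration.

```python
# Toy vocabulary used for BOTH English and French (hypothetical; a real one has ~32K entries).
SHARED_VOCAB = {"_the", "_la", "_trans", "lation", "_tradu", "ction"}


def segment(word, vocab):
    """Split one word into pieces by greedy longest-match-first lookup."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        # Shrink the candidate piece until it is found in the vocabulary.
        while end > start and word[start:end] not in vocab:
            end -= 1
        if end == start:
            # No piece matches at this position: treat the whole word as unknown.
            return ["<unk>"]
        pieces.append(word[start:end])
        start = end
    return pieces


def tokenize(sentence, vocab=SHARED_VOCAB):
    """Tokenize a whitespace-separated sentence with the shared vocabulary."""
    pieces = []
    for word in sentence.split():
        # Assumption: "_" marks the beginning of each word.
        pieces.extend(segment("_" + word, vocab))
    return pieces


if __name__ == "__main__":
    print(tokenize("the translation"))  # English input
    print(tokenize("la traduction"))    # French input, same vocabulary
```

Because both languages are segmented against the same table, the encoder and decoder can draw on one piece inventory, which is the property the "WPM-32K" configuration quoted above refers to.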
On WMT En→Fr, the training set contains 36M sentence pairs. In both cases, we use newstest2014 as the test sets to compare against previous work. The combination of newstest2012 and

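This article reviews the progress of WMT14 English-French translation baseline systems since 2017, including GNMT's 32K wordpieces model, the Transformer base and big models, RNMT+, ConvS2S, and Fairseq. The models use different vocabulary processing, such as wordpieces and BPE, and the reported results show that Fairseq reached a high score of 43.2 on WMT'14.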



