August 29, 2019


THUNLP-MT/MT-Reading-List

A machine translation reading list maintained by Tsinghua Natural Language Processing Group

repo name THUNLP-MT/MT-Reading-List
repo link https://github.com/THUNLP-MT/MT-Reading-List
language TeX
size (curr.) 806 kB
stars (curr.) 1598
created 2018-12-03
license BSD 3-Clause “New” or “Revised” License

Machine Translation Reading List

This is a machine translation reading list maintained by the Tsinghua Natural Language Processing Group.

The past three decades have witnessed the rapid development of machine translation, especially data-driven approaches such as statistical machine translation (SMT) and neural machine translation (NMT). Because NMT is dominant at present, we give priority to collecting important, up-to-date NMT papers; the Edinburgh/JHU MT research survey wiki has good coverage of older papers, with a brief description of each sub-topic of MT. Our list is still incomplete and the categorization may be imperfect. We will keep adding papers and improving the list. Any suggestions are welcome!

WMT is the most important annual international competition on machine translation. We collect the results of the news translation task since WMT 2016 (the First Conference on Machine Translation) and summarize the techniques used in the top-performing systems. Currently, we focus on four directions: ZH-EN, EN-ZH, DE-EN, and EN-DE. The summarized techniques might be incomplete; your suggestions are welcome!

WMT 2018

  • The winner of ZH-EN: Tencent

    • System report: Mingxuan Wang, Li Gong, Wenhuan Zhu, Jun Xie, and Chao Bian. 2018. Tencent Neural Machine Translation Systems for WMT18. In Proceedings of the Third Conference on Machine Translation: Shared Task Papers.
    • Techniques: RNMT + Transformer + BPE (sketched after this list) + Reranking of ensemble outputs with 48 features (including T2T R2L, T2T L2R, RNN L2R, RNN R2L, etc.) + Back-Translation + Joint training with English-to-Chinese systems + Fine-tuning with selected data + Knowledge distillation
  • The winner of EN-ZH: GTCOM

    • System report: Chao Bei, Hao Zong, Yiming Wang, Baoyong Fan, Shiqi Li, and Conghu Yuan. 2018. An Empirical Study of Machine Translation for the Shared Task of WMT18. In Proceedings of the Third Conference on Machine Translation: Shared Task Papers.
    • Techniques: Transformer + Back-Translation (sketched at the end of this section) + Data filtering by rules, language models, and translation models + BPE + Greedy ensemble decoding + Fine-tuning with back-translated newstest2017
  • The winner of DE-EN: RWTH Aachen University

    • System report: Julian Schamper, Jan Rosendahl, Parnia Bahar, Yunsu Kim, Arne Nix, and Hermann Ney. 2018. The RWTH Aachen University Supervised Machine Translation Systems for WMT 2018. In Proceedings of the Third Conference on Machine Translation: Shared Task Papers.
    • Techniques: Ensemble of the 3 strongest Transformer models + Data selection + BPE + Fine-tuning + Careful choice of important hyperparameters (batch size and model dimension)
  • The winner of EN-DE: Microsoft

    • System report: Marcin Junczys-Dowmunt. 2018. Microsoft's Submission to the WMT2018 News Translation Task: How I Learned to Stop Worrying and Love the Data. In Proceedings of the Third Conference on Machine Translation: Shared Task Papers.
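
BPE appears in every winning WMT 2018 system above. The snippet below is a minimal sketch of how BPE subword merges are learned from a word-frequency vocabulary, in the spirit of Sennrich et al.'s original algorithm; the toy vocabulary and the number of merges are illustrative, not any team's actual configuration.

```python
# Minimal BPE merge learning: repeatedly merge the most frequent
# adjacent symbol pair. Toy data only; not any WMT system's code.
import collections
import re

def get_pair_stats(vocab):
    """Count frequencies of adjacent symbol pairs across the vocabulary."""
    pairs = collections.Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for i in range(len(symbols) - 1):
            pairs[symbols[i], symbols[i + 1]] += freq
    return pairs

def merge_pair(pair, vocab):
    """Merge every occurrence of `pair` into a single new symbol."""
    bigram = re.escape(' '.join(pair))
    pattern = re.compile(r'(?<!\S)' + bigram + r'(?!\S)')
    return {pattern.sub(''.join(pair), word): freq
            for word, freq in vocab.items()}

# Words are space-separated symbol sequences with an end-of-word marker.
vocab = {'l o w </w>': 5, 'l o w e r </w>': 2,
         'n e w e s t </w>': 6, 'w i d e s t </w>': 3}
for _ in range(10):  # the number of merges is the only hyperparameter
    stats = get_pair_stats(vocab)
    if not stats:
        break
    best = max(stats, key=stats.get)
    vocab = merge_pair(best, vocab)
    print(best)
```

Each printed pair is one learned merge operation; applying the merges in order segments unseen text into subword units, which is what lets these systems handle open vocabularies.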

WMT 2017

  • The winner of ZH-EN: Sogou

    • System report: Yuguang Wang, Shanbo Cheng, Liyang Jiang, Jiajun Yang, Wei Chen, Muze Li, Lin Shi, Yanfeng Wang, and Hongtao Yang. 2017. Sogou Neural Machine Translation Systems for WMT17. In Proceedings of the Second Conference on Machine Translation: Shared Task Papers.
    • Techniques: Encoder-Decoder with Attention + BPE + Reranking (R2L, T2S, n-gram language models; see the reranking sketch after this list) + Tagging Model + Named Entity Translation + Ensemble
  • The winner of EN-ZH, DE-EN and EN-DE: University of Edinburgh

    • System report: Rico Sennrich, Alexandra Birch, Anna Currey, Ulrich Germann, Barry Haddow, Kenneth Heafield, Antonio Valerio Miceli Barone, and Philip Williams. 2017. The University of Edinburgh’s Neural MT Systems for WMT17. In Proceedings of the Second Conference on Machine Translation: Shared Task Papers.
    • Techniques: Encoder-Decoder with Attention + Deep Model + Layer Normalization + Weight Tying + Back-Translation + BPE + Reranking (L2R, R2L) + Ensemble
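
Several of the winners above, including Sogou and Edinburgh, rerank n-best candidate translations by combining scores from multiple models (left-to-right and right-to-left decoders, language models, and so on). The sketch below shows the underlying idea of weighted feature combination; the feature names, values, and weights are hypothetical, and real systems tune the weights on a development set.

```python
# A hedged sketch of n-best reranking with a weighted linear
# combination of feature scores; all names and numbers are illustrative.

def rerank(nbest, weights):
    """nbest: list of (hypothesis, feature_dict); return the best hypothesis."""
    def score(features):
        return sum(weights[name] * value for name, value in features.items())
    return max(nbest, key=lambda cand: score(cand[1]))

# Toy 2-best list with log-probability features from hypothetical models:
# a left-to-right decoder, a right-to-left decoder, and a language model.
nbest = [
    ('hypothesis a', {'l2r_logprob': -1.2, 'r2l_logprob': -1.9, 'lm_logprob': -2.0}),
    ('hypothesis b', {'l2r_logprob': -1.4, 'r2l_logprob': -1.1, 'lm_logprob': -1.8}),
]
weights = {'l2r_logprob': 1.0, 'r2l_logprob': 0.5, 'lm_logprob': 0.3}
print(rerank(nbest, weights)[0])
```

The point of reranking is that features unavailable during left-to-right beam search, such as a right-to-left model's score of the complete hypothesis, can still influence the final choice.
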
WMT 2016

  • The winner of DE-EN: University of Regensburg

    • System report: Failed to find it
    • Techniques: Failed to find it
  • The winner of EN-DE: University of Edinburgh

    • System report: Rico Sennrich, Barry Haddow, and Alexandra Birch. 2016. Edinburgh Neural Machine Translation Systems for WMT 16. In Proceedings of the First Conference on Machine Translation: Shared Task Papers.
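
Back-translation, used by Tencent, GTCOM, and Edinburgh in the lists above, enlarges the training data by translating monolingual target-language text into the source language with a reverse model, then pairing the synthetic source with the authentic target. The sketch below illustrates the data flow only; `_DummyReverseModel` and its `translate` method are hypothetical stand-ins, not a real NMT API.

```python
# A minimal sketch of back-translation data augmentation.
# The reverse model here is a hypothetical stand-in for a trained
# target-to-source NMT system.

class _DummyReverseModel:
    """Stand-in for a trained target-to-source model (hypothetical API)."""
    def translate(self, sentence):
        return '<synthetic source for: %s>' % sentence

def back_translate(monolingual_target, reverse_model):
    """Pair each monolingual target sentence with a synthetic source."""
    pairs = []
    for tgt in monolingual_target:
        src = reverse_model.translate(tgt)  # machine-generated source side
        pairs.append((src, tgt))            # authentic target side
    return pairs

# The synthetic pairs are then mixed with genuine bitext, and the
# forward (source-to-target) model is trained on the union.
print(back_translate(['Das ist ein Test .'], _DummyReverseModel()))
```

The key design point is that the target side of the synthetic pairs is authentic human text, so the forward model learns to produce fluent output even though the source side is machine-generated.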
