Improving the Transformer Translation Model with Document-Level Context
Abstract - Although the Transformer translation model (Vaswani et al., 2017) has achieved state-of-the-art performance in a variety of translation tasks, how to use document-level context to deal with discourse phenomena problematic for Transformer still remains a challenge. In this work, we extend the Transformer model with a new context encoder to represent document-level context, which is then incorporated into the original encoder and decoder. As large-scale document-level parallel corpora are usually not available, we introduce a two-step training method to take full advantage of abundant sentence-level parallel corpora and limited document-level parallel corpora. Experiments on the NIST Chinese-English datasets and the IWSLT French-English datasets show that our approach improves over Transformer significantly.
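As a rough illustration of the kind of architecture the abstract describes, the sketch below builds a separate context encoder and fuses its output into the sentence encodings through attention and a learned gate. This is a minimal sketch in PyTorch, not the paper's actual implementation; the module names (`ContextAwareEncoder`, `ctx_attn`, `gate`) and all hyperparameters are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ContextAwareEncoder(nn.Module):
    """Sketch: a sentence encoder that attends to a document-level
    context encoder and merges the two streams via a learned gate.
    All names and sizes here are assumptions, not the paper's code."""

    def __init__(self, d_model=512, nhead=8, num_layers=6, ctx_layers=2):
        super().__init__()
        # Standard sentence-level Transformer encoder.
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        # Separate (smaller) encoder over surrounding context sentences.
        ctx_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.ctx_encoder = nn.TransformerEncoder(ctx_layer, ctx_layers)
        # Attention from source states to document-level context states.
        self.ctx_attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        # Gate deciding, per position, how much context to mix in.
        self.gate = nn.Linear(2 * d_model, d_model)

    def forward(self, src, ctx):
        # src: (batch, src_len, d_model); ctx: (batch, ctx_len, d_model)
        h = self.encoder(src)            # sentence representations
        c = self.ctx_encoder(ctx)        # context representations
        ctx_out, _ = self.ctx_attn(h, c, c)  # attend over context
        g = torch.sigmoid(self.gate(torch.cat([h, ctx_out], dim=-1)))
        return g * h + (1 - g) * ctx_out     # gated fusion

# Usage with random embeddings (embedding layers omitted for brevity):
enc = ContextAwareEncoder()
out = enc(torch.randn(2, 10, 512), torch.randn(2, 30, 512))
print(out.shape)  # torch.Size([2, 10, 512])
```

Under this reading, the two-step training the abstract mentions would correspond to first training the sentence-level `encoder` (and the rest of the Transformer) on abundant sentence-level data, then training only the context modules (`ctx_encoder`, `ctx_attn`, `gate`) on the limited document-level data while the sentence-level parameters stay fixed.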