ConvNet Seq2seq for Machine Translation

A few days ago, Facebook published a new research paper on using a sequence-to-sequence (seq2seq) model for machine translation. What is special about this seq2seq model is that it uses convolutional neural networks (ConvNet, or CNN) instead of recurrent neural networks (RNN).

The original seq2seq model, published by Google, was implemented with Long Short-Term Memory (LSTM) networks. (see their paper) It is an encoder-decoder model: the encoder folds an input sequence of tokens into a state, and the decoder generates the output sequence from that state. The same authors later constructed a neural conversational model (see their paper), as mentioned in a previous blog post. Daewoo Chong, from Booz Allen Hamilton, presented its implementation using TensorFlow at the DC Data Education Meetup on April 13, 2017. Johns Hopkins also published a spelling-correction algorithm implemented with seq2seq. (see their paper) The real advantage of an RNN over a CNN is that there is no fixed limit on the length of the input or output sequences.
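The encoder-decoder control flow can be sketched in a few lines. This is a hedged toy, not the actual LSTM equations: `encode` and `decode` are stand-in functions invented for illustration, and the French vocabulary is a made-up example.

```python
# Toy sketch of the seq2seq idea: the encoder folds the whole input
# sequence into a single fixed-size state, and the decoder emits output
# tokens one at a time, feeding each emitted token back into the state,
# until it produces an end-of-sequence marker or hits a length cap.

EOS = "<eos>"

def encode(tokens, state=0):
    # Stand-in for an LSTM encoder: any function that folds the input
    # tokens into one state suffices to illustrate the structure.
    for tok in tokens:
        state = (state * 31 + hash(tok)) % 10_007
    return state

def decode(state, vocab, max_len=10):
    # Stand-in for an LSTM decoder: choose the next token from the
    # current state, update the state with the emitted token, and stop
    # at EOS or after max_len steps.
    out = []
    for _ in range(max_len):
        tok = vocab[state % len(vocab)]
        if tok == EOS:
            break
        out.append(tok)
        state = (state * 31 + hash(tok)) % 10_007
    return out

vocab = ["je", "suis", "étudiant", EOS]
translation = decode(encode(["i", "am", "a", "student"]), vocab)
```

The point of the structure is the one made above: nothing in the decoding loop fixes the output length in advance, which is the flexibility an RNN seq2seq model offers.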

A CNN, by contrast, requires fixed-size inputs, but this constraint is part of the point: it bounds the size of the context the model sees, and it allows the computation at different positions to run in parallel, which greatly speeds up training. RNNs, which process tokens sequentially, are notoriously slow to train. Facebook uses this CNN seq2seq model for their machine translation system. For more details, take a look at their paper and their Github repository.
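The bounded-context point can be made concrete. A convolution of kernel width k looks at k neighbouring tokens, and stacking L such layers grows the receptive field linearly, to 1 + L(k − 1) tokens; every position within a layer is independent of the others, which is what permits parallel training. A minimal sketch (scalar "embeddings" to keep it short; the function names are my own, not from fairseq):

```python
# How far one output position of a stacked ConvNet encoder can "see":
# each conv layer of width k adds (k - 1) tokens of context.

def receptive_field(kernel_width, num_layers):
    """Number of input tokens visible to one output position."""
    return 1 + num_layers * (kernel_width - 1)

def conv1d(xs, kernel):
    # A plain 1-D convolution over a sequence. Each output position
    # depends only on kernel-width neighbouring inputs, and all output
    # positions can be computed independently (hence in parallel).
    k = len(kernel)
    return [sum(w * x for w, x in zip(kernel, xs[i:i + k]))
            for i in range(len(xs) - k + 1)]

print(receptive_field(3, 5))             # -> 11 tokens of context
print(conv1d([1, 2, 3, 4], [1, 0, -1]))  # -> [-2, -2]
```

So where an RNN's context is unbounded but sequential, a deep ConvNet trades a fixed (if large) context window for parallelism.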


  • Jonas Gehring, Michael Auli, David Grangier, Denis Yarats, Yann N. Dauphin, “Convolutional Sequence to Sequence Learning.” (2017) [PDF]
  • Ilya Sutskever, Oriol Vinyals, Quoc V. Le, “Sequence to Sequence Learning with Neural Networks,” arXiv:1409.3215 (2014). [arXiv]
  • Oriol Vinyals, Quoc V. Le, “A Neural Conversational Model,” arXiv:1506.05869 (2015). [arXiv]
  • Kwan-Yuet Ho, “Chatbots,” Everything About Data Analytics, WordPress (2015). [WordPress]
  • “Training a Chatbot with a Recurrent Neural Network,” DC Data Education (April 13, 2017). [Meetup]
  • Keisuke Sakaguchi, Kevin Duh, Matt Post, Benjamin Van Durme, “Robsut Wrod Reocginiton via semi-Character Recurrent Neural Network,” arXiv:1608.02214 (2016). [arXiv]
  • Github repository: facebookresearch/fairseq. [Github]
  • “Facebook Proposes a New CNN Machine Translation Model: More Accurate than Google's and Nine Times Faster” (in Chinese), 机器之心 (Synced) (2017). [Zhihu]
