Tensor2Tensor, known as T2T for short, is a library of pre-configured deep learning models and datasets. The Google Brain team developed it to make deep learning research faster and more accessible. It is built entirely on TensorFlow and aims to significantly improve both performance and usability. Models can be trained on CPU, a single GPU, multiple GPUs or a TPU, either locally or in the cloud, with minimal or no configuration and no device-specific code. The library supports well-established models and datasets across several modalities, including images, video, text and audio. Tensor2Tensor is particularly strong in Neural Machine Translation (NMT), for which it ships a large collection of pre-trained, pre-configured models and datasets.
Neural Machine Translation has a long history and continues to evolve through a variety of emerging approaches. It first achieved notable success with recurrent neural networks built from LSTM cells. However, because the input sequence to a recurrent network must be encoded into a fixed-length vector, these models produced poor translations of long sentences. The issue was partially overcome by ensembles or stacks of gated convolutional networks and recurrent networks. The Tensor2Tensor-based Transformer architecture, built from stacked self-attention layers, then became the new state-of-the-art model in Neural Machine Translation, with drastically reduced training cost and a markedly improved BLEU score. The architecture was introduced by Ashish Vaswani, Samy Bengio, Eugene Brevdo, Francois Chollet, Aidan N. Gomez, Stephan Gouws, Llion Jones, Łukasz Kaiser, Niki Parmar, Ryan Sepassi, Noam Shazeer and Jakob Uszkoreit of Google Brain, and Nal Kalchbrenner of DeepMind.
Unlike RNN models, the Tensor2Tensor-based Transformer has no fixed-size bottleneck: thanks to the self-attention mechanism, every time step has direct access to the entire history of the input sequence. Self-attention is a powerful tool for modelling sequential data; it enables fast training while preserving long-range relationships, even when translating long sequences. The Transformer Neural Machine Translation model consists of two parts, an encoder and a decoder, each built from stacks of multi-head self-attention layers and fully connected feed-forward layers.
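To make the mechanism concrete, the snippet below is a small, self-contained NumPy sketch of the scaled dot-product attention computed inside each attention head. It is a didactic re-implementation for illustration only, not code taken from the Tensor2Tensor library.

# Didactic sketch of scaled dot-product attention (not T2T library code).
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q, K, V: arrays of shape (sequence_length, depth)."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                    # pairwise query-key similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over the keys
    return weights @ V                                 # weighted sum of the values

# Toy example: a sequence of 3 tokens with depth 4; self-attention uses Q = K = V.
rng = np.random.default_rng(0)
x = rng.standard_normal((3, 4))
out = scaled_dot_product_attention(x, x, x)
print(out.shape)                                       # (3, 4)

Because every output position is a weighted sum over all input positions, each time step can attend to the full sequence directly, which is exactly the property that removes the fixed-length bottleneck of recurrent encoders.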
Methodology of Tensor2Tensor
Tensor2Tensor comprises five key components for the training run. They are:
- Datasets
- Device Configuration
- Hyperparameters
- Model
- Estimator and Experiment
Datasets are encapsulated in an input pipeline through the ‘Problem’ class; these classes are responsible for supplying preprocessed data for training and evaluation. Device configuration specifies the type of processor (CPU, GPU or TPU), the number of devices, the synchronization mode and the devices’ locations. Hyperparameters instantiate the model and the training procedure; they are specified by name so that experiments can be reproduced or shared. The model ties together the architecture, datasets, device configuration and hyperparameters to produce the required output, defining the losses, evaluation metrics and optimisation. Estimator and Experiment are the classes that handle the training loop, checkpointing, logging and evaluation. With this predefined, consistent approach, Tensor2Tensor achieves strong performance across multiple modalities.
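As a concrete illustration, these components are all exposed through T2T's registry in Python. The following minimal sketch (assuming the tensor2tensor pip package is installed) simply lists and looks up registered problems, models and hyperparameter sets; the module and function names follow the public T2T API in recent releases and may differ slightly between versions.

# Minimal sketch of T2T's registry-based design (assumes `pip install tensor2tensor`).
from tensor2tensor import problems            # registered Problem classes (datasets)
from tensor2tensor.utils import registry      # central registry of models and hparams
from tensor2tensor.utils import trainer_lib   # helpers used by t2t-trainer

# Datasets: each Problem encapsulates download, preprocessing and the input pipeline.
print(problems.available()[:5])                         # a few registered problem names
ende_problem = problems.problem("translate_ende_wmt32k")

# Models and hyperparameter sets are looked up by name from the same registry.
model_cls = registry.model("transformer")
hparams = trainer_lib.create_hparams("transformer_base_single_gpu")
print(model_cls.__name__, hparams.hidden_size)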
Python Implementation
Tensor2Tensor is installed using the command
!pip install tensor2tensor
The Tensor2Tensor-based Transformer can simply be called and run to perform Neural Machine Translation with a predefined setup using the following commands. The code auto-configures itself based on the available settings, such as the device type and the number of devices. The commands below fetch the data, train and evaluate the Transformer model, and test it by translating a few lines of text from a predefined file. Note that training may take hours to days depending on the user’s configuration.
%%bash
# See what problems, models, and hyperparameter sets are available.
# You can easily swap between them (and add new ones).
t2t-trainer --registry_help

PROBLEM=translate_ende_wmt32k
MODEL=transformer
HPARAMS=transformer_base_single_gpu

DATA_DIR=$HOME/t2t_data
TMP_DIR=/tmp/t2t_datagen
TRAIN_DIR=$HOME/t2t_train/$PROBLEM/$MODEL-$HPARAMS

mkdir -p $DATA_DIR $TMP_DIR $TRAIN_DIR
The following commands fetch the data for the English-to-German translation task and build the input data pipeline.
%%bash
# Generate data
t2t-datagen \
  --data_dir=$DATA_DIR \
  --tmp_dir=$TMP_DIR \
  --problem=$PROBLEM
The following commands train the model on the defined dataset, evaluate it internally, and then decode a couple of sample sentences with beam search.
%%bash
# Train
# If you run out of memory, add --hparams='batch_size=1024'.
t2t-trainer \
  --data_dir=$DATA_DIR \
  --problem=$PROBLEM \
  --model=$MODEL \
  --hparams_set=$HPARAMS \
  --output_dir=$TRAIN_DIR

# Decode
DECODE_FILE=$DATA_DIR/decode_this.txt
echo "Hello world" >> $DECODE_FILE
echo "Goodbye world" >> $DECODE_FILE
echo -e 'Hallo Welt\nAuf Wiedersehen Welt' > ref-translation.de

BEAM_SIZE=4
ALPHA=0.6

t2t-decoder \
  --data_dir=$DATA_DIR \
  --problem=$PROBLEM \
  --model=$MODEL \
  --hparams_set=$HPARAMS \
  --output_dir=$TRAIN_DIR \
  --decode_hparams="beam_size=$BEAM_SIZE,alpha=$ALPHA" \
  --decode_from_file=$DECODE_FILE \
  --decode_to_file=translation.en
The following command lets the user spot-check the translation quality on unseen text.
%%bash
# See the translations
cat translation.en
Finally, the BLEU score can be calculated to evaluate the model against the standard benchmark metric.
%%bash
# Evaluate the BLEU score
t2t-bleu --translation=translation.en --reference=ref-translation.de
As an alternative to Colab, Tensor2Tensor models can easily be run on cloud-based FloydHub workspaces, which come with Tensor2Tensor preinstalled and support the pre-configured, pre-trained models out of the box.
Performance evaluation of Tensor2Tensor Transformer
The Tensor2Tensor-based Transformer performs strongly on both the syntactic and semantic aspects of Neural Machine Translation. It is far more computationally efficient than recurrent neural networks, requiring less training time and memory. Tensor2Tensor also makes self-attention language models more interpretable by visualizing the attention distribution; a simple sketch of such a visualization is given at the end of this section. The architecture is evaluated on the WMT 2014 translation tasks.
On the WMT 2014 English-to-French translation task, the Tensor2Tensor based Transformer model achieves a state-of-the-art BLEU score of 41.8, outperforming all of the previously published single models, at less than 1/4 the training cost of the previous state-of-the-art model.
On the WMT 2014 English-to-German translation task, the Tensor2Tensor based Transformer model achieves a state-of-the-art BLEU score of 28.4, outperforming all of the previously published single models and ensembles, at a fraction of the training cost of the previous state-of-the-art model.
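As a rough illustration of what an attention visualization looks like, the sketch below renders an attention-weight matrix as a heatmap with matplotlib. The token lists and the attn_weights array here are hypothetical stand-ins rather than the output of a trained T2T model.

# Hypothetical attention heatmap; attn_weights is a stand-in, not real T2T output.
import numpy as np
import matplotlib.pyplot as plt

src_tokens = ["Hello", "world", "<EOS>"]   # hypothetical source tokens
tgt_tokens = ["Hallo", "Welt", "<EOS>"]    # hypothetical target tokens

rng = np.random.default_rng(0)
attn_weights = rng.random((len(tgt_tokens), len(src_tokens)))
attn_weights /= attn_weights.sum(axis=1, keepdims=True)   # rows sum to 1, like softmax

fig, ax = plt.subplots()
im = ax.imshow(attn_weights, cmap="viridis")
ax.set_xticks(range(len(src_tokens)))
ax.set_xticklabels(src_tokens)
ax.set_yticks(range(len(tgt_tokens)))
ax.set_yticklabels(tgt_tokens)
ax.set_xlabel("Source (encoder) tokens")
ax.set_ylabel("Target (decoder) tokens")
fig.colorbar(im, ax=ax, label="Attention weight")
plt.show()

Darker and brighter cells show how strongly each target token attends to each source token, which is the kind of alignment pattern such visualizations are typically used to inspect.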