How does an Encoder-Decoder work and why use it in Machine Learning?
The Encoder-Decoder is a neural network architecture introduced in 2014 and used in many projects. It is a cornerstone of translation software.
It can be found in the neural network behind Google Translate.
It is used for NLP tasks such as text processing, but also for Computer Vision! This is why it is an essential type of neural network to understand.
What is an Encoder-Decoder?
An Encoder-Decoder is often described as a neural network.
More precisely, it is a Machine Learning model composed of two neural networks.
These two neural networks usually have a similar structure. The first one is used in the usual direction, while the second one works in reverse.
Let me explain:
The first neural network takes a sentence as input and outputs a sequence of numbers.
The second network takes this sequence of numbers as input and, this time, outputs a sentence!
In effect, the two networks perform mirror-image operations: one maps words to numbers, the other maps numbers back to words.
So we have a sentence, a sequence of words, which is then encoded into a sequence of numbers, then decoded into another sequence of words, the translated sentence.
As you can see, the first neural network is an encoder and the second neural network is a decoder.
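To make the two steps concrete, here is a minimal sketch in plain Python. It is only an illustration, not how a production translator is built: the vocabularies, the size of the vector and the weights are all assumptions, and the weights are random rather than trained, so the words it outputs are meaningless. A real decoder would also feed each predicted word back in as input, which is left out here. What matters is the shape of the pipeline: words in, one vector in the middle, words out.

```python
import math
import random

HIDDEN_SIZE = 4  # assumed size of the vector passed from encoder to decoder

rng = random.Random(0)  # fixed seed so the sketch is reproducible

def random_matrix(rows, cols):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

# Hypothetical toy vocabularies.
SOURCE_VOCAB = ["je", "suis", "content"]
TARGET_VOCAB = ["i", "am", "happy", "<end>"]

# Untrained, random parameters (a real model would learn these from data).
embed = random_matrix(len(SOURCE_VOCAB), HIDDEN_SIZE)
w_enc = random_matrix(HIDDEN_SIZE, HIDDEN_SIZE)
w_dec = random_matrix(HIDDEN_SIZE, HIDDEN_SIZE)
w_out = random_matrix(len(TARGET_VOCAB), HIDDEN_SIZE)

def matvec(matrix, vector):
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def encode(words):
    """Read the source words one by one and return a single vector."""
    state = [0.0] * HIDDEN_SIZE
    for word in words:
        x = embed[SOURCE_VOCAB.index(word)]
        mixed = matvec(w_enc, state)
        state = [math.tanh(a + b) for a, b in zip(mixed, x)]
    return state

def decode(context, max_len=5):
    """Start from the encoder's vector and emit the best-scoring word at each step."""
    state = context
    words = []
    for _ in range(max_len):
        state = [math.tanh(v) for v in matvec(w_dec, state)]
        scores = matvec(w_out, state)
        word = TARGET_VOCAB[scores.index(max(scores))]
        if word == "<end>":
            break
        words.append(word)
    return words

context = encode(["je", "suis", "content"])  # sentence -> sequence of numbers
translation = decode(context)                # sequence of numbers -> sentence
```

Note that the decoder never sees the French words: everything it knows about the sentence comes from `context`.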
But why is the Encoder-Decoder effective for translation?
When we use an encoder, the meaning of the sentence is stored and represented by a vector.
We can’t directly convert a sentence in French to its English translation because we would lose the context.
Indeed, if we translated directly, we would in fact translate word by word, without caring about the overall meaning of the sentence.
For example, if we have the French sentence “prendre une expression au pied de la lettre” and we translate it word by word, we get: “take an expression at the foot of the letter”, which means nothing in English.
On the other hand, with an Encoder-Decoder, we would get closer to the correct translation: “take an expression literally”.
This last result is very close to what we wish to obtain!
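Word-by-word translation is easy to reproduce with a simple lookup table. The sketch below uses a small hypothetical dictionary, written by hand for this one French idiom, “prendre une expression au pied de la lettre”; it is not taken from any real translation system.

```python
# Hypothetical hand-written French -> English dictionary for this example only.
WORD_DICTIONARY = {
    "prendre": "take",
    "une": "an",
    "expression": "expression",
    "au": "at the",
    "pied": "foot",
    "de": "of",
    "la": "the",
    "lettre": "letter",
}

def translate_word_by_word(sentence):
    """Translate each word in isolation, ignoring the rest of the sentence."""
    return " ".join(WORD_DICTIONARY[word] for word in sentence.split())

french = "prendre une expression au pied de la lettre"
literal = translate_word_by_word(french)
print(literal)  # take an expression at the foot of the letter
```

Every individual word is translated correctly, yet the idiom is lost, because no step ever looks at the sentence as a whole.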
It is thanks to this vector produced by the Encoder that we can keep the meaning of the sentence.
We can say that the Encoder-Decoder first translates the original sentence into an internal, machine-readable representation, then turns that representation into the final result. It is this intermediate step that makes the technique so useful.
Encoder-Decoders are widely used in the academic world, but they have a flaw.
The vector that the Encoder produces has a fixed size. It works well for short sentences but… as soon as the sentence gets too long, the vector cannot store all the necessary information.
The final translation of a long sentence will therefore be less accurate.
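The bottleneck is easy to see in a small sketch. The encoder below is a toy with assumed values: the vector size of 4 is deliberately tiny, and `word_vector` is a hypothetical stand-in for learned word embeddings. Whatever the length of the input, the output always has the same number of values.

```python
import math
import random

CONTEXT_SIZE = 4  # assumed, deliberately tiny

def word_vector(word):
    # Hypothetical stand-in for a learned embedding:
    # pseudo-random values derived deterministically from the word itself.
    rng = random.Random(word)
    return [rng.uniform(-1, 1) for _ in range(CONTEXT_SIZE)]

def encode(words):
    """Fold every word into one running state of fixed size."""
    state = [0.0] * CONTEXT_SIZE
    for word in words:
        state = [math.tanh(0.5 * s + x) for s, x in zip(state, word_vector(word))]
    return state

short_sentence = "the cat sleeps".split()
long_sentence = (
    "the cat that lives next door sleeps on the warm roof "
    "every afternoon while the children play in the garden"
).split()

short_context = encode(short_sentence)
long_context = encode(long_sentence)

# Both sentences end up in a vector of the same size.
print(len(short_sentence), "words ->", len(short_context), "numbers")
print(len(long_sentence), "words ->", len(long_context), "numbers")
```

The long sentence has to fit into exactly as many numbers as the short one, so each new word partly overwrites what was stored about the earlier ones.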
Fortunately for us, Encoder-Decoders are now coupled with mechanisms that help them cope with much longer sentences.
This is the subject of our next article on attention mechanisms!