A few weeks after announcing that the Google Translate service would be upgraded to use a machine learning system called Google Neural Machine Translation (GNMT), the engineers who run it announced that the new system has made great progress. In particular, it showed that it can translate between two languages without specific training for that pair, apparently by means of an interlingua it developed on its own.
Most web users have tried Google Translate to read web pages written in a language they don’t know. The results are usually not at the level of a human translator, and the company is trying to improve them with a machine learning system that, much like a human, improves its knowledge of a language over time through practice.
For those interested in the technical details, at the end of September 2016 the Google Brain team published a technical article titled “Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation”. In recent days, another article was published, titled “Google’s Neural Multilingual Machine Translation System: Enabling Zero-Shot Translation”, which describes the progress made by the GNMT system.
After the training phase, Google engineers verified not only that translation quality had improved but also that the GNMT system was able to go beyond its training. The expression “Zero-Shot Translation” used in the new technical article indicates the ability to translate between two languages for which the system received no specific training as a pair.
The GNMT system was trained to translate between a number of languages taken in pairs. For example, it learned to translate from Japanese to English and vice versa, and from Korean to English and vice versa. At that point, the engineers tried to make it translate from Korean to Japanese and vice versa without any specific training for that pair, and they judged the results reasonable.
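The setup described above can be illustrated with a small sketch. It lists the translation directions the system saw in training and works out which directions between the same languages would be zero-shot. The function name and the data structure are illustrative assumptions and are not part of Google's system.

```python
from itertools import permutations

# Translation directions the article says were used in training,
# written as (source language, target language).
TRAINED = {
    ("Japanese", "English"),
    ("English", "Japanese"),
    ("Korean", "English"),
    ("English", "Korean"),
}


def zero_shot_directions(trained):
    """Return directions between known languages that never appeared in training.

    Hypothetical helper written for this illustration only.
    """
    languages = {lang for pair in trained for lang in pair}
    all_directions = set(permutations(languages, 2))
    return sorted(all_directions - set(trained))


if __name__ == "__main__":
    for source, target in zero_shot_directions(TRAINED):
        print(f"{source} -> {target}")
```

With the four trained directions above, the only directions left over are Japanese to Korean and Korean to Japanese, which are exactly the ones the engineers tested.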
The most important aspect of this experiment is that the GNMT system didn’t go through English to translate between Korean and Japanese. Using a 3-dimensional representation of the network’s internal data, the engineers were able to observe the system as it translated and noted signs of an interlingua, a common representation in which sentences with the same meaning are represented in similar ways regardless of their language.
In the image, part (a) shows the overall geometry of the various translations made by the GNMT system, with different colors indicating different sentence meanings. Part (b) zooms in on a specific area, and part (c) shows the same area colored according to the source language.
Within the area shown in part (b) there’s a sentence with the same meaning in three different languages. According to Google engineers, this means that the GNMT system is encoding something about the semantics of the sentence instead of simply storing sentence-to-sentence translations. This was interpreted as a sign of the existence of an interlingua.
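As a toy illustration of what "represented in similar ways" could mean, the sketch below compares made-up sentence vectors using cosine similarity. The vectors, their three dimensions and the sentence labels are invented for this example and are not taken from GNMT. Cosine similarity is simply one common way to compare such vectors, not necessarily the measure the engineers used.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical internal representations, keyed by (language, meaning).
# The numbers are invented: same-meaning sentences are placed close together
# on purpose to mimic the clustering described in the article.
VECTORS = {
    ("English", "greeting"): [0.90, 0.10, 0.05],
    ("Japanese", "greeting"): [0.88, 0.12, 0.07],
    ("Korean", "greeting"): [0.91, 0.09, 0.04],
    ("English", "weather"): [0.10, 0.85, 0.30],
}

if __name__ == "__main__":
    reference = VECTORS[("English", "greeting")]
    for (language, meaning), vector in VECTORS.items():
        score = cosine_similarity(reference, vector)
        print(f"{language:<9} {meaning:<9} {score:.3f}")
```

In this invented data, the three "greeting" vectors score close to 1.0 against each other whatever the language, while the "weather" vector scores much lower even though it shares a language with the reference sentence.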
Google is adopting the GNMT system for its Translate service one language at a time. It will take time to support all 103 languages selectable in the service, and the engineers predict that there will still be errors. The advantage lies in the ability of a machine learning system to learn over time, so users can hope that in the future there will be fewer and fewer mistakes, some of which are ridiculous.