[Disclaimer: This is an automatic translation done by Google Translator from an article in Spanish just done to demonstrate some topics]
I remember with some bitterness the intense technical discussions I had with Scott a few years ago while defining the architecture of a competency management system. It was not that I was always right, but from time to time my arguments prevailed over his overly academic view of the problems we faced in the research group. Or at least they should have, because the man, feeling cornered, would lower his voice, speed up his delivery and switch to "Shakespeare" mode, rebutting my arguments in a language that was almost incomprehensible to me. So, after half an hour of not understanding half the terms he used and having to ask him to repeat himself again and again, I ended up giving in and his arguments prevailed.
I mention this only to stress that I recognize the tremendous importance of English today, and that I understand the enormous efforts we parents make so that our children acquire a good command of the language.
However, I fear that such efforts will have been in vain because, in just a few years, automatic simultaneous translation between any two languages will become a reality thanks to new Artificial Intelligence techniques.
To achieve effective simultaneous translation, three components must be orchestrated:
- Voice-to-Text: an application that listens to what we say and converts it into text that machines can process
- Translator: the system responsible for translating the text from one language to another
- Text-to-Speech: a final application that reads the translated text aloud
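The way these three components chain together can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class `SpeechTranslator`, the three stage signatures and the stub functions (`fake_listen`, `fake_translate`, `fake_speak`) are hypothetical names invented for this example, and the stubs stand in for whatever real recognition, translation and synthesis services would be used.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures; a real system would wrap cloud services here.
SpeechToText = Callable[[bytes, str], str]    # (audio, language) -> text
Translate = Callable[[str, str, str], str]    # (text, source, target) -> text
TextToSpeech = Callable[[str, str], bytes]    # (text, language) -> audio


@dataclass
class SpeechTranslator:
    """Chains the three stages: listen, translate, speak."""
    listen: SpeechToText
    translate: Translate
    speak: TextToSpeech

    def run(self, audio: bytes, source: str, target: str) -> bytes:
        text = self.listen(audio, source)
        translated = self.translate(text, source, target)
        return self.speak(translated, target)


# Stub stages so the sketch runs without any external service.
def fake_listen(audio: bytes, language: str) -> str:
    # Pretend the "audio" bytes already contain the transcript.
    return audio.decode("utf-8")


PHRASEBOOK = {("es", "en", "hola"): "hello"}


def fake_translate(text: str, source: str, target: str) -> str:
    # Look the phrase up; unknown phrases pass through unchanged.
    return PHRASEBOOK.get((source, target, text.lower()), text)


def fake_speak(text: str, language: str) -> bytes:
    # Pretend the tagged text is synthesized audio.
    return f"[{language}] {text}".encode("utf-8")


pipeline = SpeechTranslator(fake_listen, fake_translate, fake_speak)
result = pipeline.run("Hola".encode("utf-8"), "es", "en")
print(result.decode("utf-8"))  # prints "[en] hello"
```

Keeping each stage behind a plain function signature means any one of them can be swapped for a different provider without touching the other two.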
This is not science fiction. Microsoft announced some time ago that Skype Translator can overcome language barriers, offering simultaneous translation in 8 languages for voice and video calls and in up to 50 for instant messaging.
That is, with Skype Translator you can call a person in China, say anything in Spanish and instantly hear their answers in "perfect" Spanish as well.
I do not know if you have tried it, but the results, while not perfect, are quite satisfactory. And the best of simultaneous translation is still to come.
WhatsApp or Skype to dictate the messages we want to send.
The reliability of these systems keeps increasing year after year. What's more, in October 2016 Microsoft announced that it had reduced the word error rate of its speech recognition algorithms to 5.9%, on par with human transcribers. IBM is making similar efforts so that Watson can recognize the human voice without errors, Google is following the same line to improve Google Now, and Apple is doing the same so that one day Siri will be able to understand what we say.
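Error rates like this are usually reported as a word error rate (WER): the number of word substitutions, deletions and insertions needed to turn the recognizer's output into the reference transcript, divided by the number of words in the reference. The following is a minimal sketch of that calculation; the function name `word_error_rate` and the sample sentences are chosen for illustration only, and real evaluations also normalize punctuation and spelling before comparing.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # previous[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,         # deletion (reference word missing)
                current[j - 1] + 1,      # insertion (extra hypothesis word)
                previous[j - 1] + cost,  # substitution or exact match
            ))
        previous = current
    return previous[-1] / len(ref)


reference = "one language is never enough"
hypothesis = "one language is not enough"
print(word_error_rate(reference, hypothesis))  # 1 wrong word out of 5 -> 0.2
```

On this scale, an error rate of a few percent means that only a handful of words in every hundred are transcribed incorrectly.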
And, best of all, any developer can add speech recognition to their applications using existing cloud services. Google offers the Cloud Speech API, Microsoft has the Bing Speech API within its Cognitive Services, and Amazon does it through the Alexa Voice Service.
However, we cannot claim victory yet. Speech recognition systems struggle when there is a lot of ambient noise or when several people talk at once, as inevitably happens at any family gathering.
Once our voice has been converted to text, that text has to be translated into another language, certainly the most complex of the three tasks needed to achieve simultaneous translation.
However, GNMT (Google's Neural Machine Translation system) is already beginning to approach the accuracy of the best human translators, as shown in the chart accompanying this section.
To demonstrate the ability of this system, I have dared to publish this same article in English, translated directly by Google Translate without any changes (I hope English speakers can forgive me). Here you have the original article.
(*) Similar skills demonstrate Microsoft Translator although Gates have already announced that they will soon be able to translate multi-lingual group conversations in real time. That is to say that soon we will be able to maintain a simultaneous conversation with a Chinese, a Japanese and an American (excuse me, it seems the beginning of a joke). These advances will automatically be transferred to Skype and even Outlook where we can already automatically translate any emails we receive with this add-in
As you can imagine, IBM is not far behind, with Watson Language Translator offered free of charge through Bluemix.
To demonstrate it, here is the paragraph marked (*) translated into English by the Microsoft, Google and IBM translators:
MICROSOFT: "Similar skills shows Microsoft Translator although Gates has already announced that they will soon be able to translate conversations in several languages in real time. That is to say that we will soon have a simultaneous conversation with a Chinese, a Japanese and an American (forgive me, looks like the beginning of a joke). These advances will be transferred automatically to Skype, even Outlook where already we can automatically translate any email that we receive with this add-in"
GOOGLE: "Similar skills provide Microsoft Translator although Gates have already announced that they will soon be able to translate multi-lingual group conversations in real-time. That is to say that we will be able to maintain a simultaneous conversation with a Chinese, To Japanese and an American (excuse me, it seems the beginning of a joke). These advances will automatically be transferred to Skype and even Outlook where we can already automatically translate any emails we receive with this add-in "
IBM WATSON: "Similar Skills shows Microsoft Translator but Gates have already announced that they will soon be able to translate group conversations in multiple languages in real time. This means that we can maintain a simultaneous conversation with Chinese, Japanese and American (Forgive me, it seems the beginning of a joke). These advances are automatically moved to Skype and even Outlook where we can automatically translate any email that we receive this add-in "
You can judge who does it best but, above all, repeat this same exercise next year to see what happens.
Of the three systems involved in simultaneous translation, text-to-speech is apparently the easiest to implement. However, we have spent decades listening to computers talk with robotic voices, because speaking naturally requires mastering intonation and pauses. In this area Google seems to be one step ahead with WaveNet, a synthesizer (and something more) with a quality comparable to the human voice. Without going into technical details, here you can hear the same voice synthesized with traditional algorithms and with Google's new algorithm (source DeepMind ...)
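To make "intonation and pauses" concrete: many conventional synthesizers accept SSML, the W3C Speech Synthesis Markup Language, in which the author marks pauses and pitch by hand. The sketch below builds a small SSML document with Python's standard library. It is an illustration only, not how WaveNet works: the function `build_ssml` and its parameters are hypothetical, and which tags and values a given service accepts varies and should be checked in its documentation.

```python
import xml.etree.ElementTree as ET


def build_ssml(sentences: list[str], pause_ms: int = 400, pitch: str = "+5%") -> str:
    """Wrap sentences in SSML with a pause between them and a pitch hint."""
    speak = ET.Element("speak")
    # <prosody> adjusts intonation for everything it contains.
    prosody = ET.SubElement(speak, "prosody", {"pitch": pitch})
    for index, sentence in enumerate(sentences):
        s = ET.SubElement(prosody, "s")
        s.text = sentence
        # <break> inserts an explicit pause between sentences, not after the last.
        if index < len(sentences) - 1:
            ET.SubElement(prosody, "break", {"time": f"{pause_ms}ms"})
    return ET.tostring(speak, encoding="unicode")


ssml = build_ssml(["Hello.", "One language is never enough."])
print(ssml)
```

Hand-written markup like this is exactly the manual tuning that a synthesizer with natural prosody aims to make unnecessary.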
For its part, Amazon is developing Polly, a cloud service that includes 47 realistic voices in 11 languages. Here are some examples:
|Language|Female voice|Male voice|Sample text|
|English|Joanna|Joey|Hello. Do you speak a foreign language? One language is never enough.|
|Danish|Naja|Mads|Hej. Taler du et fremmed sprog? Et sprog er aldrig nok.|
|Brazilian Portuguese|Vitória|Ricardo|Oi. Você fala algum idioma estrangeiro? Somente um idioma nunca é bastante.|
|Spanish|Penélope|Miguel|Hola. ¿Hablas algún idioma extranjero? Un solo idioma no es suficiente.|
|Icelandic|Dóra|Karl|Halló, Hæ talar þú erlent tungumál? Eitt tungumál er aldrei nóg.|
Nor can we forget Microsoft, which also offers this type of service, again through the Bing Speech API (the same page has a demo to test the system).
Simultaneous translation systems are already a reality, as Skype has clearly demonstrated, but the best is yet to come.
Today we can already give a talk in English or Chinese armed only with Google Translate, letting our computer or mobile phone translate and deliver our speech. But we will soon see any phone conversation translated automatically and in real time into the language of the other party, in a completely natural voice and with hardly any mistakes.
And a little later, multilingual group conversations will be possible, leaving the old myth of the Tower of Babel obsolete.