Late last week, on a stage in Tianjin, China, Microsoft’s Rick Rashid demonstrated a new “speech recognition breakthrough for the spoken, translated word”. Initially, as Mr Rashid spoke, the translation appeared on a screen behind him. This almost instant translation was impressive: his sentences, spoken in English, appeared as properly written, grammatically correct Chinese sentences. Later in the video he demonstrates the computer speaking these translated sentences in Chinese, using his own natural voice characteristics.
Translating between languages isn’t straightforward because word-for-word translation, which is quite easy for a computer, often fails to convey the meaning of a sentence correctly. If you’ve used any of the online translators you will know about this. Sometimes the source for a breaking news article we find at HEXUS is initially only available in Chinese, Japanese or another language; the machine-translated article is fine to read through specifications-wise, but the writing around the specs is pretty useless.
In fact a bad translation can actually give you information that is the opposite of the truth. A little knowledge is a dangerous thing. Correct grammar is very important to sentence meaning; a great vocabulary is not enough. For example, in English a simple misplaced comma can make a world of difference:
- Woman, without her man, is nothing. | Woman, without her, man is nothing.
- Jim needs a camera, battery and case. | Jim needs a camera battery and case.
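The second pair above can be made concrete with a small sketch. The Python below is purely illustrative and assumes nothing about any real translation system: the helper `needed_items` is a hypothetical name, and it simply splits the text after "needs" on commas and the word "and" to show that one comma changes how many items a naive parser sees.

```python
import re

def needed_items(sentence: str) -> list[str]:
    """Return the items listed after 'needs' in a simple English sentence."""
    # Keep only the text after the verb and drop the final full stop.
    _, _, tail = sentence.partition(" needs ")
    tail = tail.rstrip(". ")
    # Split on commas and on the standalone word 'and'.
    parts = re.split(r",|\band\b", tail)
    # Trim whitespace and a leading article from each non-empty item.
    return [re.sub(r"^(a|an|the)\s+", "", p.strip()) for p in parts if p.strip()]

with_comma = needed_items("Jim needs a camera, battery and case.")
without_comma = needed_items("Jim needs a camera battery and case.")
print(with_comma)     # three separate items
print(without_comma)  # two items: the battery now belongs to the camera
```

With the comma Jim is shopping for three things; without it he only wants two, which is exactly the kind of difference a careless translation can lose.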
Microsoft’s new speech translation system doesn’t use the conventional waveform “pattern matching” approach, nor does it settle for the currently popular “hidden Markov modelling” method. Mr Rashid says the company went back to the drawing board and used “a technique called Deep Neural Networks, which is patterned after human brain behaviour.” It is a multi-stage process which translates the words and then reorders them to make sensible, natural Chinese sentences.
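The two stages described above (translate the words, then reorder them) can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the tiny `LEXICON`, the single reordering rule and the function names are all hypothetical, and a hand-written lookup stands in for the trained models of the real system, which this does not attempt to reproduce.

```python
# Toy lexicon: English phrase -> (Chinese word, rough role). Hypothetical data.
LEXICON = {
    "i": ("我", "subject"),
    "eat": ("吃", "verb"),
    "apples": ("苹果", "object"),
    "at home": ("在家", "place"),
}

def translate_words(phrases):
    """Stage 1: look up each English phrase, keeping the English order."""
    return [LEXICON[p.lower()] for p in phrases]

def reorder(tagged):
    """Stage 2: move place phrases in front of the verb, where Chinese typically puts them."""
    places = [t for t in tagged if t[1] == "place"]
    rest = [t for t in tagged if t[1] != "place"]
    verb_at = next(i for i, t in enumerate(rest) if t[1] == "verb")
    return rest[:verb_at] + places + rest[verb_at:]

def render(tagged):
    """Join the Chinese words into one sentence string."""
    return "".join(word for word, _ in tagged)

english = ["I", "eat", "apples", "at home"]
word_for_word = translate_words(english)
print(render(word_for_word))           # English word order, reads wrongly in Chinese
print(render(reorder(word_for_word)))  # place phrase moved before the verb
```

Even in this four-word example the word-for-word output has every word right and the sentence wrong, which is why the reordering stage matters.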
The demonstration went a step further by using Mr Rashid’s own voice characteristics to shape the Chinese computer voice. This is “a text to speech system that Microsoft researchers built using a few hours speech of a native Chinese speaker and properties of my own voice taken from about one hour of pre-recorded (English) data”. As you can hear from the video, the crowd at the presentation loved the demo. Skip to about 7 minutes, 30 seconds to see and hear the system working fully, with written text and natural voice.
The original Star Trek Universal Translator
Mr Rashid concludes his blog post by admitting his translator isn’t perfect but that it is “very promising” and “we may not have to wait until the 22nd century for a usable equivalent of Star Trek’s universal translator”. The hope is that within just a few more years the technology will be able to break down language barriers.