“Human parity” achieved
A study published last Monday, heralded as a historic achievement by Microsoft,
details a new speech recognition technology that’s able to transcribe
conversational speech as well as humans, or at least as well as
professional human transcriptionists (who do better than most humans).
The technology scored a word error rate (WER) of 5.9%, which was
lower than the 6.3% WER reported just last month. “[I]t’s the lowest
ever recorded against the industry standard Switchboard speech
recognition task,” Microsoft reports. The rate matches, or is even lower than, that of the professional human transcriptionists who transcribed the same conversation.
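For readers unfamiliar with the metric, WER is conventionally the number of word substitutions, deletions and insertions needed to turn the system's transcript into the reference transcript, divided by the number of words in the reference. The sketch below illustrates that arithmetic only. The function name and the sample sentences are invented for illustration, and this is not Microsoft's scoring pipeline, which would also normalize the text before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative helper (hypothetical name); splits on whitespace and
    does no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion (reference word missed)
                curr[j - 1] + 1,                  # insertion (extra hypothesis word)
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: one substitution ("sat" -> "sit") and one deletion ("the")
# against a six-word reference.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"WER: {wer:.1%}")
```

On this scale, a WER of 5.9% means roughly six word errors for every hundred words in the reference transcript.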
“We’ve reached human parity,” says Xuedong Huang, Microsoft’s chief
speech scientist. The new technology uses neural language models that
allow for more efficient generalization by grouping similar words
together.
The achievement comes decades after speech pattern recognition was
first studied in the 1970s. With Google’s DeepMind making waves in
speech and image recognition (and in generating human-like speech), the technology is Microsoft’s timely contribution to fast-paced artificial intelligence (AI) research and development.
The result was achieved using the Computational Network Toolkit (CNTK), Microsoft’s homegrown system for deep learning.
Next step: Understanding
The new technology is bound to improve the user experience of
Microsoft’s personal voice assistant for Windows and Xbox
One. “This will make Cortana more powerful, making a truly intelligent
assistant possible,” says an excited Harry Shum, the executive vice
president heading the Microsoft Artificial Intelligence and Research
group. Of course, it should also lead to better speech-to-text
transcription software.
Microsoft clarifies, however, that parity does not mean perfection.
The system did not recognize every word correctly, something
not even humans can do (nor can Siri or other existing voice
assistants).
Impressive as the result is, there remains room for improvement. The next
goal: making computers understand human conversation. “The next frontier
is to move from recognition to understanding,” says Geoffrey Zweig, manager of the Speech & Dialog research group.