Voice recognition technology has well and truly ‘talked’ its way into the consumer psyche. One of the biggest-selling products over the festive season last year was Amazon Echo, and sales of the device, which boasts an intelligent personal assistant called Alexa, show no signs of slowing down this Christmas.

Last year Google revealed that 20 percent of mobile search queries were initiated by voice, and it expects that figure to rise to 50 percent by the end of the decade. That growth is testament to the great strides made in the technology that underpins these devices, enabling them to understand and respond to your commands with ever-increasing accuracy and efficiency.

But what impact has this new smart technology had in the workplace? Many would argue that the biggest opportunity to bring new business benefits lies in the development of technology that can transcribe spoken language into text. Why? As the pace of communication continues to increase, aided by the proliferation of cloud-based tools and social media platforms, it has never been more important to have a clear record of who said what – not only to increase accountability and transparency but to increase productivity, cutting out laborious time spent transcribing calls and meetings.

However, delivering on that comes with challenges. Transcription tools have had a hard time in the past and there has been a long-held consensus that flawless voice recognition is an impossible challenge for computers alone to master. But things are changing.

Why is speech recognition seen as an impossible technology to master?

  1. Up until now, high-quality, accurate transcription on a computer has only been possible when those having the conversation were speaking slowly and clearly, with a well-positioned microphone in a quiet room. And even then, these transcriptions still missed the mark. It’s not hard to understand why: these programs were designed to register any noise coming through a microphone, which means not just the voices of those involved but also the background noise, such as the inevitable echoes, the scuffle of an object being moved about or someone sneezing in the next room. All of these elements have an impact on the accuracy of the final transcription. If you then add into the mix different regional and international accents, slang words and unknown acronyms, it’s easy to understand why automated voice transcription is such a tricky thing to pull off without human intervention.
  2. The volume of data created during a single conference call also presents a big challenge. When a call involves a number of people, the amount of unstructured data created is huge, and it is almost impossible to pick apart so that each statement can be attributed to the right speaker. Typically, you still need a human brain to decipher that data and make sense of it all.
  3. As anyone in IT knows, collecting data of any kind comes with privacy concerns, and voice recognition adds another layer to that problem. This kind of technology, as mentioned, is designed to respond to all noise, which includes potentially picking up sensitive conversations that might be taking place in the background. Clear parameters will need to be part of the process if voice recognition software is to build and maintain a long-term reputation as a valuable business tool.

Why is it important to keep pushing for perfection?

So, if accurate voice transcription is such a hard task, why bother developing this technology that many have said is ‘impossible’ to master? The reason lies in its huge potential.

Companies like Google, Amazon and Microsoft have understood this and have been investing heavily in the core transcription technology needed to power their ever-increasing portfolio of home products. Thanks to that investment, they have also been able to dramatically improve the accuracy of written transcriptions, with Microsoft claiming that its program can now match the accuracy of human transcribers, having reduced its word error rate from 5.9 percent last year to 5.1 percent in August this year, which is the average error rate expected of a professional human transcriber.
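
For readers wondering what a "word error rate" actually measures, here is a minimal sketch in Python. It uses the standard textbook definition: the number of word substitutions, deletions and insertions needed to turn the machine transcript into the correct one, divided by the number of words in the correct transcript. The function name and the sample sentences are our own illustration, and we are not suggesting this is the exact evaluation procedure Microsoft used.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length.

    `reference` is the correct (human-checked) transcript and `hypothesis`
    is the machine transcript. Both names are illustrative.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word inserted in hypothesis
                prev[j - 1] + cost,  # words match, or one is substituted
            ))
        prev = curr
    return prev[-1] / len(ref)


# One substituted word ("the" -> "a") and one dropped word ("by")
# against a six-word reference: 2 errors / 6 words.
wer = word_error_rate("please send the report by friday",
                      "please send a report friday")
print(f"{wer:.1%}")
```

On this measure, an error rate of 5.1 percent means roughly one word in twenty is wrong, missing or added when compared with the correct transcript.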

With the ability to automatically transcribe virtual meetings and conference calls, workers could also save themselves a huge amount of time. Candidate interviews could be recorded in full for people to refer back to, customer testimonials could be captured ready to be posted on your company website, meeting minutes could be automatically transcribed and agreements between parties could be traced back easily for increased accountability. It could also help managers keep track of increasingly distributed teams, and doctors could even transcribe interviews with patients in medical trials. The list of applications is endless.

The adaptability of this technology means that every company can tailor it to their needs. Mujo, a medical rehabilitation start-up, for example, has recently been using Yack.net, our collaboration platform with call transcription capabilities, to record board meetings and keep a written record of what was said and agreed. This has allowed all parties to keep track of tasks without someone having to transcribe those agreements and progress updates.

Which businesses are driving this technology forward?  

Although this technology isn’t perfect, it is clear that we should keep pushing forward. And the good news is that there are a few really good applications out there already:

  • In the smartphone world, a great example is Cassette, an iOS app that allows the user to record conversations using their iPhone as a recording device. The recorded speech is transcribed and can be replayed.
  • Designed specifically for the workplace, our Yack.net is a team collaboration tool that allows users to record calls with up to six participants. The calls are automatically transcribed and can be searched and replayed in sections, allowing users to pinpoint who said what.
  • Trint is a transcription tool designed for journalists. Pre-recorded interviews are uploaded and transcribed. The interface is optimised to enable the journalist to correct the automatically generated transcription, where required.

So, does voice transcription technology deserve its bad reputation?

In a way, yes. It’s a work in progress, though, and that progress has come on in leaps and bounds in the last two or three years. That development has been, and continues to be, pushed along rapidly thanks to investment from some of the biggest players in technology, who all believe that voice recognition has a bright future. From our own more humble perspective, we have worked tirelessly on enhancing our transcription engine and will continue to push its limits so that we can play our part in unlocking the huge untapped benefits of this technology.

Alan is the founder and CEO of Yack.net, the newly launched company behind a technology that combines collaboration and call transcription capabilities in one platform. The ground-breaking tool automatically records and transcribes calls into text, and is already being used successfully by a number of SME businesses to ‘capture their conversations’. Having graduated from Oxford University with a degree in Physics, Alan started his career working for Micro Focus before he decided to take the entrepreneurial jump in 2010 and founded Connected Digital, a software development company, responsible for creating the technology behind Yack.net.