Inside the SpinVox Brain

The power of human hardware rarely fails to impress...

Tags: speech recognition, spinvox

By Natasha Lomas

Published: 6 August 2009 11:34 GMT

How much human interaction powers SpinVox's voicemail-to-text conversion system? Natasha Lomas was invited to the company's HQ to see a demo of the system. Did it impress?

A trip to the HQ of SpinVox - the voicemail-to-text conversion company I wrote about last week - has given me a newfound respect for human hardware. By which I mean the ear, the brain and above all the brain's ability to grub and process a grain of meaning from the polluted and chaotic environments humans create.

Listening to a friend explain the implications of the subplot of Moon from across a Tube carriage tortured by the sound of screeching brakes and screaming children? No problem. Filtering out the omnipresent swoosh of lorries and vans on the walk to work to eavesdrop on the conversation of the man on his mobile behind you? It can be done.

Yep, the brain and its tools are impressive alright. But what about SpinVox's Brain and SpinVox's tools?

Along with several other journalists who have been following 'SpinGate' by publicly wondering how much human intervention is required in SpinVox's Voice Message Conversion System (aka The Brain), I was invited to the corporate headquarters in Marlow-on-Thames for a demo of the system - led by company CIO Rob Wheatley.

The reception desk at SpinVox HQ (Photo credit: Natasha Lomas/silicon.com)

It was also billed as a chance to ask some of the questions not cleared up by last week's flutter of press releases - for me the biggest lure. I was expecting the tech demo to be interesting and competent but, as it would obviously be operating in test conditions, a mere taster of a business that can surely only be understood in the daily grind and grit of real-world operation. After all, three journalists in a room can only make so much noise.

So what does SpinVox's technology look like? Although we were shown a diagram of the workflow process - with both its automated and human components - we were forbidden from taking photos or filming. Wheatley also gave us an impassioned plea to "please be sensitive" with what they were telling us - although we were not asked to sign an NDA. A somewhat contradictory message that.

So in the interests of a) brevity and b) sensitivity here's my rough translation of how SpinVox's system works:

A third message drove the system to distraction - admittedly it was a voicemail purposefully encoded in a Texan drawl so see-sawingly folksy it had most people in the room scratching their heads.

After cleaning up and rating the audio quality - and doing some fundamental checks such as 'what language is this?' and 'is there a message at all?' - the system uses the words it can pluck out of the mire to hazard a guess on the identity of the words it can't. Think 'Spears' coming after 'Britney'.
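To make the 'Spears after Britney' idea concrete, here is a minimal sketch in Python of guessing an unclear word from the word before it. It is purely illustrative and not SpinVox's method: the toy corpus, the function names and the simple word-pair counting are all assumptions made for the example.

```python
from collections import Counter, defaultdict

# Toy training text (an assumption for illustration); a real system
# would learn from a vastly larger body of messages.
CORPUS = (
    "britney spears released a new single . "
    "i saw britney spears on tv . "
    "call me back when you get this . "
    "call me back tonight ."
).split()


def build_bigrams(tokens):
    """Count how often each word follows each other word."""
    following = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        following[prev][nxt] += 1
    return following


def guess_next(following, prev_word):
    """Return the most frequent follower of prev_word, or None if unseen."""
    counts = following.get(prev_word)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


BIGRAMS = build_bigrams(CORPUS)
print(guess_next(BIGRAMS, "britney"))
```

The guess is only as good as the text the counts were built from, which is why a word the toy corpus has never seen yields no guess at all.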

Wheatley talked about the system building "a lattice" of probabilities of what might be being said - and this is where the terminology starts to sound a tad over-engineered to my ear. A 'lattice of probabilities' is surely kith and kin to the predictive text you get on your phone - i.e. sometimes kind of useful, but all too frequently annoyingly misguided about what you're actually trying to say, despite the fact you've stacked it with your favourite swearwords by adding them to the user dictionary.
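For the curious, a 'lattice of probabilities' can be pictured as a row of slots, each holding rival candidate words with a score for how well they match the audio, plus a score for how likely each word is to follow the last. The Python sketch below picks the best route through such a lattice. Every number, name and the scoring scheme itself are assumptions invented for illustration; nothing here was shown or confirmed by SpinVox.

```python
import math


def best_path(lattice, bigram_prob, floor=1e-4):
    """Find the most probable word sequence through the lattice.

    lattice: list of {word: acoustic_probability} dicts, one per slot.
    bigram_prob: {(previous_word, word): probability}; unseen pairs
    fall back to the small `floor` value.
    """
    # best[word] = (log score of the best route ending in word, that route)
    best = {w: (math.log(p), [w]) for w, p in lattice[0].items()}
    for slot in lattice[1:]:
        nxt = {}
        for word, acoustic in slot.items():
            candidates = []
            for prev, (score, path) in best.items():
                link = bigram_prob.get((prev, word), floor)
                total = score + math.log(link) + math.log(acoustic)
                candidates.append((total, path + [word]))
            nxt[word] = max(candidates, key=lambda c: c[0])
        best = nxt
    return max(best.values(), key=lambda c: c[0])[1]


# Hypothetical scores: the audio alone slightly favours 'spheres',
# but the word-pair probabilities pull the result back to 'spears'.
LATTICE = [
    {"britney": 0.9, "brittany": 0.1},
    {"spears": 0.4, "spheres": 0.6},
]
BIGRAM_PROB = {("britney", "spears"): 0.8, ("britney", "spheres"): 0.001}

print(best_path(LATTICE, BIGRAM_PROB))
```

The point of the structure is that a word which sounds slightly less likely on its own can still win if it fits its neighbours far better.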

(Does predictive text get better over time? For what it's worth I actually find my phone gets worse at helping me write text messages as more and more once-favoured words accumulate in the dictionary and then plonk themselves into phrases where they're no longer wanted. But I digress.)

Wheatley talked up the 'statistical analysis, acoustic modelling and user learning' that the system apparently uses to get better at predicting the next word each user might have said. And if humans had the vocabulary of sheep this might be an easy task but there's surely no escaping the fact the spoken word does anything but conform to type - even if CEO and co-founder Christina Domecq reckons many speakers can be described as 'average Joes'... (continued on page 2)
