With a new iPhone 5 on the Verizon LTE network, I’ve been experimenting with Siri for simple tasks and information requests, and I’m getting disappointingly mixed results. Until I learned how to get the best results from Siri, I found myself wasting time trying to correct her mistakes. But when she works properly, it’s delightful.
What I’ve learned is that my normal speaking voice and cadence cause Siri to make too many errors. To get the best out of Siri, I’ve had to learn to slow down, speak more simply, use keyword-style diction, and enunciate more clearly.
I try to avoid flowing sentences or words that sound as if they run together. As a result, my interactions with Siri sound somewhat robotic, but they get the job done…
As an example of cadence problems, when I asked Siri to launch the web page for Amazon’s wine store, Siri heard “swine,” and displayed a Google search results page with information about pigs.
When I ask Siri to find something stored on my phone (such as contact or calendar info), I use keywords that match how the information is labeled on the iPhone. So I get better results when I ask Siri to “call Dad Wilson’s home phone” rather than “call Dad.” Sadly, my takeaway on how best to use Siri is that it’s yet another example of humans having to learn from the technology, rather than vice versa.
That said, I’ve found dictating text messages via iMessage (on the iPhone or my new MacBook Pro) to be a lovely time-saver. The content of text messages is usually quite simple, so the lack of complexity drives better results with Siri.
Net net: there are occasional moments of delight, and clear time-savers and conveniences in a narrow domain, but the overall results are still underwhelming. Apple knows Siri is not yet ready for prime time, which is why it positions Siri as a beta product.
Fortunately, technology leaders like Apple and Microsoft are hard at work on ways to let us speak more naturally to, or through, our devices and still get the results we expect.
Microsoft: What’s Cooking in the Lab
Microsoft Research has begun demonstrating a more accurate, real-time speech recognition capability that also leverages machine translation technology. Microsoft claims this speech recognition approach makes roughly 30% fewer mistakes than other technologies on the market today. A 10-minute video shows Rick Rashid, head of Microsoft Research, demonstrating the technology. It’s worth watching to see how far they’ve come in the lab.
First you’ll see real-time transcription of his speech (like next-gen closed captioning) — amazingly good, but not error-free. Then you see the technology translating Rashid’s speech into Mandarin Chinese, first as written text and then as spoken language.
What’s amazing is that the spoken version of the machine translation attempts to mimic the speaker’s natural tones and cadences. Just imagine the improvements when this evolves from the lab into a commercial-quality product!
I think the Microsoft technology demonstration gives us a peek at what the future has in store for us: a world in which we can use our own voice to more accurately direct our devices to carry out specific tasks, and one in which we can communicate with other people across language barriers.