I upgraded my iPhone last week from a 3GS to a 4S. I did this mainly because I wanted a better camera phone for our upcoming vacation — the 3GS has a 3 megapixel camera; the camera on the 4S is 8 megapixels. I’ll be taking my Pentax DSLR of course, but I’m sure I’ll be snapping lots of camera phone pix as well.
The 4S has several improvements over my old 3GS, but the one I was most interested in playing with was Siri. Siri (as I’m sure you know) is the new iPhone “voice assistant”. It’s how people like Zooey Deschanel can tell if it’s raining outside or not.
If Siri is your first introduction to voice recognition software, you might not appreciate just how far such systems have come. I first played with Dragon Dictate in the mid-1990s. Voice recognition programs of that era had to “learn” your voice by having you run through a setup reading exercise that took about an hour. Even after being trained, few of the early programs could keep up with speech spoken at a normal conversational rate. Like early OCR software, the first generation of home voice recognition software wasn’t all that accurate. Dragon Dictate claimed an 80% accuracy rate, and that was under perfect conditions. Background noise and cheap microphones could greatly reduce the software’s accuracy.
When rolling, I can type somewhere around 100 words per minute (give or take). That means I could type more quickly, and often more accurately, than those early programs.
I didn’t dabble with voice recognition again until I began experimenting with in-car navigation in the early 2000s. Several of the popular mapping programs supported voice commands for simple navigation assistance, such as zooming in or setting destination points. Unfortunately, road noise combined with the distance between your mouth and your computer made for less than stellar results.
My first smart phone, the Palm Treo, had software that allowed for voice dialing. “CALL. HOME.” (pause) “Dialing Pizza Hut …” Argh.
So anyway, about Siri. It’s amazing how far the technology has come. For starters, Apple’s noise-cancelling technology does a pretty good job of removing background noise, which improves the accuracy of the speech recognition. It doesn’t work with the radio cranked to 10, but with it on 3, in addition to background road noise, the phone had no problem recognizing my commands.
Another improvement is Siri’s ability to decipher what you really want. In the past, voice commands had to be specific. Siri works out what you mean: “Call Susan O’Hara mobile” and “Call my wife’s cell phone” both do the same thing.
And that’s the beauty of the thing; suddenly, I can do things with my phone that I either didn’t know how to do or was too lazy to do. “Siri, wake me up in 20 minutes,” results in Siri setting an alarm 20 minutes from now. “Siri, where’s the nearest Taco Bell?” gives me that exact information. Suddenly, I don’t have to know HOW to do those things. The phone does them for me.
Back in the pre-Windows days, you couldn’t launch a document; you launched applications, and then opened documents. It seems like Windows 3.1 even worked like this, but I could be wrong. If you wanted to read a Word DOC file, you couldn’t just launch the DOC file; you launched Word, and then opened the DOC file within Word. I remember thinking how amazing Windows 95 was, that it would allow you to just click on a file and it would magically open the right application and load your file.
And that’s how Siri seems to me. I don’t have to know how to edit my contacts, or use the map program, or how to schedule alarms. I just tell Siri what I want to happen, and it does it. It’s a single point-of-contact for controlling my phone (at least, the apps that are supported).
For things that Siri can’t do on the phone, it uses the internet. You may have seen the iPhone commercial in which Samuel Jackson asks Siri, “How many cups are in a quart?” and she gives him the answer. I’ve already played around by asking her what movies are showing near me and how to make a screwdriver, and she delivered.
Perhaps the most interesting thing to me is that Siri is able to combine bits of information. “Remind me when I get home to reboot my server,” I said on Thursday. Siri pulled my home address from my contacts (I guess) and popped up a reminder when I pulled in the driveway.
What I’d like to see is this technology ported over into different gadgets. With PVR systems we can already do things like record television shows, play them back, and search the web, but imagine being able to do all of those things using plain language! “Siri, send me a text message when a new episode of Mythbusters airs.” “Siri, record the ‘Soup Nazi’ episode of Seinfeld the next time it’s on.” “Siri, always keep five recordings of Beakman’s World.”
“Siri, it’s time for me to get off the computer.”
I have never purchased an Apple product in my life. But as soon as the iPhone 5 is released, I’m on board. I hope there’s an option with a male voice, though. It will make it much more fun for me.
“Siri, call me an ambulance!”
“From now on, I’ll call you ‘An Ambulance.’ OK?”
I want that app to answer real world problems, though, before I get on board.
“Siri, why is that douche in front of me driving like an asshole?”
or better yet…
“Siri, please tell that douche in front of me that he’s driving like an asshole.”
Honestly, though, when it gets to that point I don’t know whether I’ll be happy or afraid of the tech.