COMPUTER SPEECH INTERFACE

About

This describes a near-future way of interacting with our smart phones in which we speak to our devices, they speak back to us, and they communicate with us through sounds.

The Need

We are interacting with our smart phones with increasing frequency. But right now, we largely have to press virtual buttons and look at the screen in order to interact with them. This is not ideal, especially while driving. And in general, it is inconvenient to type significant quantities of text and go through multiple layers in order to get done what we want.

What we need is the Star Trek method of interacting with our smart phones: speaking to them and having them speak back to us. There are some caveats, though.

Tonto

Back in 1994, I wrote a computer program which I called "Tonto". One asked it a question in natural language (by text entry). It threw out extraneous words and mapped synonyms to standard terms. So, as a physician, if I were to ask, "What are the different causes of stomach pain?" it would transform that to Abdominal, Causes, Pain. Any question boiling down to those three words would then connect to a written response -- "The causes of abdominal pain are..."

I then wrote another program which could perform any number of functions starting with a single word or phrase. This is basically the global search feature now found on smart phones. Then, I did a summer research project in medical school which included working with a very basic speech recognition system in which different phrases would cause different programs to be launched. AskJeeves.com had a search engine that accepted natural language, using a process I am guessing was like Tonto. Put these together, add contextualized computing, and you get Siri, which I believe represents a pivotal point in how we are going to interface with our devices.
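The Tonto approach described above (throw out extraneous words, map synonyms to standard terms, then look up a written response) can be illustrated with a minimal sketch. This is not the original 1994 program. The word lists, the function names normalize and answer, and the choice of Python are all assumptions made for illustration; only the stomach pain example comes from the text.

```python
import re

# Hypothetical vocabulary: the original Tonto word lists are not given in the text.
STOP_WORDS = {"what", "are", "the", "different", "of", "a", "an", "is", "for"}

# Hypothetical synonym table mapping everyday words to standard terms.
SYNONYMS = {
    "stomach": "abdominal",
    "belly": "abdominal",
    "abdomen": "abdominal",
    "cause": "causes",
    "reasons": "causes",
    "ache": "pain",
    "pains": "pain",
}

# Each set of standard terms connects to one written response.
RESPONSES = {
    frozenset({"abdominal", "causes", "pain"}): "The causes of abdominal pain are...",
}


def normalize(question):
    """Reduce a natural-language question to a set of standard terms."""
    words = re.findall(r"[a-z]+", question.lower())
    terms = set()
    for word in words:
        if word in STOP_WORDS:
            continue  # throw out extraneous words
        terms.add(SYNONYMS.get(word, word))  # map synonyms to standard terms
    return frozenset(terms)


def answer(question):
    """Return the written response for a question, or None if nothing matches."""
    return RESPONSES.get(normalize(question))


print(answer("What are the different causes of stomach pain?"))
print(answer("What causes belly ache?"))
```

Both questions reduce to the same three terms, so both reach the same response. Using an unordered set of terms means word order in the question does not matter, which is one simple way to get the "any question boiling down to those three words" behavior.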
Siri

Right now (October 2011) the iPhone 4S is being launched with several great features. The one that stands out to me is Siri, which started out as a free-standing app until Apple purchased it, and which Apple is improving considerably. The YouTube videos of what Siri can do are impressive, especially the corporate (on stage) presentation.

Imagining the Future

Imagine driving along and you get a phone call. Instead of having to handle the phone, you can verbally ask your cell phone who is calling. After being told, you can instruct your phone to accept the call and put it over the speakers, take a message, ask for a voice-to-text subject line (topic of conversation) or message transcription, or decline the call -- all without ever taking your hands off the wheel.

Imagine asking your device a question, and your computer assistant searches the internet and then reads back to you an explicitly developed message or a likely answer based upon its search results.

Imagine speaking to your device to place a food order, book a room, order a plane ticket, hail a taxi, get office hours for a company, get directions, respond to text messages, listen to your e-mail, or any number of things. You are practically limited only by your imagination.

The Downside

However, there could be a problem with all of this. Imagine all sorts of people talking to their phones and their phones talking back. Couldn't this get annoying? Yes, it could. How could this be reduced? For one, check out my discussion of contextualized computing, a closely related concept. In that paper I described how our smart phones could understand our context and respond accordingly. Are you traveling more than 15 mph? Then your smart phone should probably interact with you in a different way than if you are quietly sitting at home.

Sounds

What if you are in the board room of your company, there are a lot of other people in the room, and only your boss has been speaking for the last five minutes?
Having your smart phone talk to you would probably not be acceptable. In a situation like this, your smart phone should probably automatically take a voice-to-text transcription of an incoming call, have the person hold the line, and then make a subtle, unique click sound that notifies you of the incoming call. It could tell that you have picked up the phone and have looked at the message. You could then reply with a text message, which would then be transformed into a voice message for the person who is holding the line.

Each person could have a variety of sounds unique to them (sort of like ring tones) that tell them a number of different things beyond what our current sounds convey. For example, while on vacation, our smart phones could give us a sound when we are about to pass some area of interest, and our device could tell us interesting information about it. Again, the potential applications are huge.
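As a rough sketch of how a phone might pick an interaction style from context, the two situations discussed above (traveling faster than 15 mph, and sitting in a meeting where someone else is speaking) can be written as simple rules. Only the 15 mph figure comes from the text. The Context fields, the mode names, and the choose_mode function are hypothetical, and a real device would need far richer sensing than this.

```python
from dataclasses import dataclass


@dataclass
class Context:
    """Hypothetical snapshot of what the phone can sense about its surroundings."""
    speed_mph: float
    people_nearby: int
    someone_else_speaking: bool


def choose_mode(ctx):
    """Pick an interaction mode for an incoming call (illustrative rules only)."""
    if ctx.speed_mph > 15:
        # Probably driving: speak to the user and listen for spoken commands.
        return "voice"
    if ctx.people_nearby > 1 and ctx.someone_else_speaking:
        # Probably a meeting: transcribe the call, hold the line, play a subtle click.
        return "subtle_sound_and_text"
    # Quiet setting such as sitting at home: ordinary behavior.
    return "normal"


print(choose_mode(Context(speed_mph=40, people_nearby=0, someone_else_speaking=False)))
print(choose_mode(Context(speed_mph=0, people_nearby=8, someone_else_speaking=True)))
print(choose_mode(Context(speed_mph=0, people_nearby=0, someone_else_speaking=False)))
```

The rules are checked in order, so in this sketch driving takes priority over the meeting rule. That ordering is a design choice made here for simplicity and is not something the text specifies.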