We examine how voice chatbots are designed and how they should work.
Let’s say you are sitting in a restaurant, having a conversation with a friend. Your friend asks, “Did you watch the new Black Panther movie?” “Yes,” you reply. “How was it?” he asks next. You say, “I am sorry, I don’t understand.” No matter how many times he asks the question, you never answer it. We experience the same level of frustration with many voice user interface (VUI) systems today. Despite modern developments in speech recognition technology, the road to convincingly simulating human conversation is still a long one.
Let’s look at a real-life case from “Ok Google,” demonstrating two conversational turns (a turn is a single interaction between the user and the machine).
USER Ok, Google. When’s my next appointment?
GOOGLE You have a calendar entry tomorrow. The title is “Chinatown field trip.”
USER Ok Google. Can you please repeat that?
GOOGLE
Google has dropped its end of the conversation; it’s as if the first part never happened. The term conversation design is used more and more these days, but it is frequently used improperly. Many people assume that conversation design covers any communication with a machine through voice or text. In fact, many of these “conversations” are only a single turn, for example, asking someone where the nearest KFC is located.
Conversational design means having an interaction with a system that goes beyond one turn. People rarely have interactions that last only one turn, so you have to design a system that does more than that: think about and plan for what users might want to do next. Make sure that you don’t push the user to take another turn, but anticipate and allow it.
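One way to picture "anticipate, but don't push" in code is a dialogue handler that, after fulfilling a request, keeps the session open and registers the follow-up intents it expects, instead of closing the microphone. This is a minimal sketch with made-up intent names and response fields, not any real voice framework's API:

```python
# Hypothetical dialogue handler: after fulfilling an intent, anticipate
# likely next turns and keep the session open for them (illustrative only).

FOLLOW_UPS = {
    # Intents a user plausibly wants right after the current one.
    "set_alarm": ["cancel_alarm", "change_alarm_time"],
    "play_song": ["next_song", "add_to_playlist"],
}

def handle_turn(intent: str) -> dict:
    """Fulfil the intent, then allow (but don't force) another turn."""
    response = {"speech": f"Done: {intent}.", "end_session": True}
    if intent in FOLLOW_UPS:
        # Keep listening so a follow-up works, without prompting for one.
        response["end_session"] = False
        response["expected_intents"] = FOLLOW_UPS[intent]
    return response

print(handle_turn("set_alarm"))
```

The key design choice is that `expected_intents` primes the recognizer for what may come next, while the spoken response stays silent about it.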
It is also important to remember the recent conversation history. Chatting with a system that can’t recall anything beyond the previous interaction makes for an unpleasant experience. When designing a VUI, most people contemplate only tasks that happen once, such as answering a search query, setting up a calendar appointment, playing a song, or placing a phone call.
These tasks can often be done in one fell swoop, but the best VUI designs consider what will happen next. Let’s look at an example in which Google does an awesome job of remembering what happened in previous conversational turns:
USER Ok, Google. Who was the 16th President of the United States?
GOOGLE Abraham Lincoln was the 16th president of the United States.
USER How old was he when he died?
GOOGLE Abraham Lincoln died at the age of 56.
USER Where was he born?
GOOGLE Hodgenville, KY
USER What is the best restaurant there?
GOOGLE Here is Paula’s Hot Biscuit:
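Follow-ups like “How old was he?” and “What is the best restaurant there?” only work because the system keeps a short conversation history and resolves references against it. Here is a deliberately naive toy sketch of that idea; real assistants use full coreference resolution rather than string substitution:

```python
# Toy conversation memory: remember the most recent person and place
# mentioned, so pronouns in follow-up turns can be resolved against them.

class ConversationContext:
    def __init__(self):
        self.last_person = None
        self.last_place = None

    def remember(self, person=None, place=None):
        if person:
            self.last_person = person
        if place:
            self.last_place = place

    def resolve(self, utterance: str) -> str:
        # Naive substitution of the first pronoun occurrence only;
        # a production system would run a coreference model instead.
        if self.last_person:
            utterance = utterance.replace(" he ", f" {self.last_person} ", 1)
        if self.last_place:
            utterance = utterance.replace(" there", f" in {self.last_place}", 1)
        return utterance

ctx = ConversationContext()
ctx.remember(person="Abraham Lincoln")
print(ctx.resolve("How old was he when he died?"))
ctx.remember(place="Hodgenville, KY")
print(ctx.resolve("What is the best restaurant there?"))
```

The point is the state, not the string tricks: without `last_person` and `last_place` surviving across turns, every follow-up question becomes unanswerable.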
Setting User Expectations
Excellent conversational design isn’t just about designing excellent prompts. As Margaret Urban, interaction designer at Google, recommends: don’t ask a question if you can’t understand the answer.
She gives the example of a prompt that pops up after the user has finished writing an email: “Do you want to send it or edit it further?” One response you might not have anticipated is “yes” – so develop a response in your system to handle it. If you are getting a lot of “yes” responses, you may have to rephrase the prompt to make it clearer, such as “What would you like to do: send it or edit it?”
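In code, that advice amounts to routing even the “wrong” answer somewhere sensible instead of failing. A hypothetical handler for the send-or-edit prompt might look like this; the function and intent names are made up for illustration:

```python
def handle_send_or_edit(answer: str) -> str:
    """Route a reply to 'Do you want to send it or edit it further?'."""
    answer = answer.strip().lower()
    if "send" in answer:
        return "send_email"
    if "edit" in answer:
        return "edit_email"
    if answer in ("yes", "yeah", "sure"):
        # The user answered a question we didn't mean to ask.
        # Re-prompt with an explicit either/or instead of erroring out.
        return "reprompt:What would you like to do: send it or edit it?"
    return "reprompt:Sorry, do you want to send it or edit it?"

print(handle_send_or_edit("yes"))
```

If the “yes” branch fires often in your logs, that is the signal Urban describes: the prompt itself, not the user, needs rewording.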
Margaret emphasizes the importance of establishing user expectations early on. How does your app introduce voice? You can offer a “tour” to first-time users and offer educational points along the way. As she says:
When someone has completed a VUI interaction, it’s a bit of an endorphin boost—the user has a glow of completion and satisfaction. It’s a nice time to educate people—“Since you were great at that, how about trying this?”
Be careful about how you tell users that tasks were successful. Margaret notes that saying “Setting the alarm,” for example, implies to the user that the alarm has been set, whereas an engineer may argue that the task hasn’t necessarily completed yet and needs an additional prompt: “Alarm set successfully.”
The Amazon Echo has the following dialog when setting a timer:
USER Alexa, set a timer for 10 minutes.
ALEXA Setting a timer for 10 minutes.
Imagine the conversation with an additional confirmation:
USER Alexa, set a timer for 10 minutes.
ALEXA Setting a timer for 10 minutes.
ALEXA Okay, timer successfully set.
It’s unnecessary verbiage. If the timer did in fact fail to be set, it would be good to alert the user, but that’s the exception, not the rule.
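The “exception, not the rule” pattern is easy to express: speak one implicit confirmation on success, and reserve the explicit alert for failures. A sketch, with a stand-in `set_timer` backend invented for the example:

```python
def set_timer(minutes: int) -> bool:
    # Stand-in for a real timer service; assume it reports success/failure.
    return minutes > 0

def timer_response(minutes: int) -> str:
    if set_timer(minutes):
        # Implicit confirmation is enough; no second "successfully set" line.
        return f"Setting a timer for {minutes} minutes."
    # Failure is the exception, and the only case worth an explicit alert.
    return "Sorry, I couldn't set that timer. Please try again."

print(timer_response(10))
```

Note that the success path still confirms the parsed value (“10 minutes”), so the user can catch recognition errors without being burdened by a redundant status message.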
Margaret offers a good analogy about designing with breadth. Perhaps you’ve designed a system that allows people to set an alarm but doesn’t give them a way to cancel it. She likens this to giving someone a towel for a shower, but no soap.
Discoverability is another important element of design. How do your users know what they can say, and when? I discovered my Android camera app was voice-enabled purely by accident: while taking a picture one day, I naturally said “smile!” and the photo was taken. I quickly discovered I could also say “1.. 2.. 3!” and “say cheese!” and it would also take a photo. This is a great example of piggybacking off of users’ natural speech.
When asking the user for information, it’s often better to give examples than instructions. If you’re asking for a date of birth, for example, rather than saying “Please tell me your date of birth, with the month, day, and year,” use an example: “Please tell me your date of birth, such as July 22, 1972.” It’s much easier for users to substitute their own information into an example than to translate a generic instruction.
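A practical payoff of prompting with an example is that replies tend to arrive in a predictable shape. Assuming the user echoes the “July 22, 1972” pattern, even the standard library can parse it; this sketch uses Python’s `datetime.strptime` and falls back to `None` (i.e., a re-prompt) on anything else:

```python
from datetime import datetime

def parse_birth_date(spoken: str):
    """Parse a date spoken in the same shape as the example prompt,
    e.g. 'July 22, 1972'. Returns None when the shape doesn't match."""
    try:
        return datetime.strptime(spoken.strip(), "%B %d, %Y").date()
    except ValueError:
        return None  # In a real VUI, trigger a re-prompt with the example.

print(parse_birth_date("July 22, 1972"))
```

A production system would accept many more spoken variants, but the example-driven prompt shrinks the space of formats the parser must handle in the common case.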