Your Guide To How To Design Voice User Interfaces

Unless you’ve been living off the grid for a few years, you’ve probably noticed that voice devices like Amazon’s Echo and Echo Dot are turning voice interactions from science fiction into reality.

Many experts are predicting that voice will completely revolutionize the way we interact with computers in the next decade. Could they be right?

If you want to familiarize yourself with the basics, read on:

Why care about VUI design?
What we mean by voice user interfaces
The difference between voice-only interactions and ‘multimodal’ ones
How to design for voice interfaces vs graphical interfaces
The industries that will be transformed by voice
How to get involved in voice user interface design

First things first…

Why care about VUI design?

The number of skills available for the Amazon Alexa devices has grown exponentially. Tech giants like Google, IBM, Cisco and even Slack are all investing millions in voice technology.

Will voice technology eventually replace screens? That’s yet to be seen. But why is it suddenly taking off? You might say that the stars have aligned! Here’s how:

1. The technology. Artificial intelligence is reaching a whole new level (so much so that certain thought leaders are panicking about a robot takeover!). Thanks to AI and cloud computing, machines can now understand many variations of human speech almost instantly, and far more accurately than ever before.

2. The user. Speech is such an intuitive way of communicating. Of all of the ways to interact, speech probably has the least friction. Except for mind-reading, but we’re not there just yet.

3. Companies. And this will be a big driver. Voice presents a huge opportunity to build a positive rapport with users and provide an enjoyable user experience that keeps customers coming back. Already we’re seeing the importance of system personas having a likeable personality.

So why should you care?

Whether you’re working in UX design, or simply interested in user experience, voice is set to be a game-changer. Over the coming decade, there’ll be thousands of jobs opening for voice design specialists, and at the moment, very few people have these skills.

That’s why we collaborated with Amazon Alexa to create an 8-week specialization course in voice user interface design, launching in September 2017. Designers who take the course will have a head start in the field – not to mention loads of job flexibility and opportunities.

Learn more about VUI here: What does the future of voice technology hold? A look at VUIs

What we mean by voice user interfaces

There are a few different terms flying around – voice services, voice assistants, voice-first devices. The principle for all is the same.

With a typical user interface, e.g. a graphical one, we use a keyboard, mouse, or touchscreen. With voice user interfaces, the system understands voice commands, and responds either by speaking back, or by showing a visual response. Which leads me to an important distinction:

The difference between voice-only interactions and ‘multimodal’ ones

A nice example of a multimodal interface would be a voice-controlled TV. We’re eliminating the need to use our hands (yay!) but we’re ultimately seeing the results of our voice commands on a screen. This means more information can be conveyed to the user than on a voice-only device. With voice-only devices, it’s important to take cognitive overload into account and carefully manage the quantity and speed of information delivery.

Imagine you want a recipe for pancakes.

A voice-only device might simply take the first result and read it out to you at a reasonable pace: “To make approximately 15 pancakes, you will need 6 eggs…” and so on.

A multimodal device could display several results on your device’s screen, and you could say, “Open the third option” or “Open the BBC Good Food recipe.”

So while you’re controlling the device with voice, you’re seeing the results on a screen.
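
As a rough illustration of the pancake example, the sketch below shapes the same search results differently depending on whether the device has a screen. The `Response` class, the `build_response` function and the `has_screen` flag are hypothetical names made up for this illustration; they are not part of any real voice platform’s API.

```python
from dataclasses import dataclass, field


@dataclass
class Response:
    speech: str  # what the device says aloud
    screen_items: list[str] = field(default_factory=list)  # what it shows, if anything


def build_response(results: list[str], has_screen: bool) -> Response:
    """Shape search results for a voice-only or a multimodal device (illustrative only)."""
    if not results:
        return Response(speech="Sorry, I couldn't find anything.")
    if has_screen:
        # Multimodal: show several options and let the user choose by voice.
        return Response(
            speech=f"I found {len(results)} recipes. Which one should I open?",
            screen_items=results,
        )
    # Voice-only: avoid overloading the listener, so read out only the top result.
    return Response(speech=f"Here is the top result: {results[0]}.")


recipes = ["Easy pancakes", "BBC Good Food pancakes", "Fluffy American pancakes"]
voice_only = build_response(recipes, has_screen=False)
multimodal = build_response(recipes, has_screen=True)
```

The voice-only branch deliberately says less: the listener cannot skim a spoken list the way they can skim a screen.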

Multimodal interfaces could help drive huge advances in workplace wellbeing, e.g. avoiding repetitive strain injuries from typing or operating a mouse. However, imagine an open-plan office where everyone’s talking: “Open Google Drive, Open folder Blog Content 2017, Open Designing for Voice Guide, Open dictation app and begin typing: ‘Unless you’ve been living off the grid….’”

So, while it’s important for designers to consider how voice services can respond to users with voice only, it’s just as important to keep multimodal interfaces in mind.

We’re still not entirely sure which one will become dominant in future, but we know at the moment multimodal interactions are very popular on mobile phones. Already over half of US teens use voice search on a daily basis, according to a study conducted by Northstar Research.

How to design for voice vs screens

Now for the challenging part. How do we adapt human-computer interaction to voice?

As I mentioned before, one of the drivers of voice interactions will be companies. Tech giants are already realizing that they can reach consumers on a much more personal level, add personality to their brand voice, and boost their brand loyalty.

And companies will rely on talented UX designers to design interactions that are enjoyable and logical. Many of the same principles of UX design apply – persona research, rigorous testing with real users, continuous iteration, etc.

However, there are also some big differences to take into account from the outset. Let’s have a look at some of the deal-breakers.

1. Analyse typing vs talking styles

The way we search online is very succinct. Up until now, the assumption has been that when we interact with voice-controlled devices we will employ natural speech patterns, e.g. full sentences or questions.

Online we might search for ‘Elvis Presley date of birth’ whereas with voice we might ask a device, “Alexa, when was Elvis Presley born?”

But who knows how this will evolve? Just as we typed full questions into ‘Ask Jeeves’ before we learned to search in shorthand, we might end up talking to our devices in a similarly curt, direct manner (once we realize they have no feelings!).

Voice services will have to understand a range of different commands (not to mention accents, tones, intonation etc!)

2. Find non-disruptive ways to show system status

Voice interactions will in many ways have more in common with human conversation than human-computer interaction as we know it.

How do people show each other that they’re alert during a conversation?

There are several established norms we use to show we’re switched on – nodding our heads, smiling, repeating things or asking relevant questions…

Similarly, an Amazon Alexa device such as the Echo Dot will show it’s switched on and paying attention using flashing lights, subtle sound effects, and of course by responding with something contextually accurate – most of the time.

3. Make your system personality your unique selling point

Over the past year, voice-operated consumer devices have shot into the limelight and one thing is clear: users love to figure out if their artificially intelligent voice services have a sense of humor.

The more charming voice services are, the more users will be compelled to use them – and that doesn’t mean simply meeting a user’s needs. Yes – the race is on to develop the smartest voice services, but companies are also battling it out to create the most likeable voice personalities.

“Alexa, will you marry me?”

In the first half of 2017 alone, Amazon Alexa received over 250,000 marriage proposals, to which she has given many responses including, ‘I like our relationship as it is now’ or ‘let’s just be friends.’

4. Let users go directly to what they need

When users interact with your services, they probably won’t follow a click-through funnel like they might on a website.

You should enable users to go directly to what they need, and personalization will play a big role in this.

For example, a user who wants to order from Amazon.com will not say, “Alexa, visit the Amazon.com homepage, then go to my account, then view my history, then find coffee, then place the order again.” They’ll simply go straight to the final step: “Alexa, reorder coffee.”
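
To make the idea concrete, here is a minimal sketch of how a single “reorder coffee” request could be resolved from a user’s purchase history, rather than walking through a funnel. Everything in it is an assumption for illustration: the `ORDER_HISTORY` data, the product names and both functions are invented, and a real service would look up the user’s actual account.

```python
from typing import Optional

# Hypothetical purchase history, most recent order last.
ORDER_HISTORY = [
    {"product": "Dark roast coffee beans 1kg", "keywords": {"coffee"}},
    {"product": "AA batteries 12-pack", "keywords": {"batteries"}},
    {"product": "Medium roast ground coffee 500g", "keywords": {"coffee"}},
]


def find_last_order(keyword: str, history: list[dict]) -> Optional[str]:
    """Return the most recently ordered product matching the spoken keyword."""
    for order in reversed(history):
        if keyword.lower() in order["keywords"]:
            return order["product"]
    return None


def handle_reorder(keyword: str, history: list[dict]) -> str:
    """Turn 'reorder <keyword>' into a single confirmation prompt."""
    product = find_last_order(keyword, history)
    if product is None:
        return f"I couldn't find {keyword} in your past orders."
    return f"Your last order was {product}. Should I order it again?"


print(handle_reorder("coffee", ORDER_HISTORY))
```

The personalization does the work of the missing steps: the history lookup replaces “go to my account, view my history, find coffee”, and one confirmation question stands in for the checkout page.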

5. Make sure users can use recognition rather than recall

As voice expert Daniel Westerlund mentions in this post on cognitive load, voice interaction should come naturally.

A voice device user does not want to memorize thousands of different commands (recall). They want to say whatever comes to mind naturally and have the device understand it, so that they automatically know how to interact with it (recognition).

For this reason, designers will have to ensure that machines understand many variations of speech – different ways of saying the same thing.

Luckily, machine learning and AI have made a huge difference here already.
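
The sketch below shows, in a deliberately simplified way, what “different ways of saying the same thing” means in practice: several phrasings, typed or spoken, all resolve to one intent. It uses the Elvis Presley example from earlier. The intent name `GetBirthDate`, the sample phrasings and the `resolve` function are assumptions invented for this illustration, and the pattern matching is only a stand-in; real voice services rely on trained language-understanding models rather than a hand-written lookup.

```python
import re
from typing import Optional

# Hypothetical sample phrasings for one intent.
SAMPLE_UTTERANCES = {
    "GetBirthDate": [
        "when was {person} born",
        "what is {person}'s birthday",
        "{person} date of birth",
    ],
}


def _to_pattern(sample: str) -> re.Pattern:
    """Convert a sample like 'when was {person} born' into a regex."""
    escaped = re.escape(sample).replace(r"\{person\}", r"(?P<person>.+)")
    return re.compile(f"^{escaped}$", re.IGNORECASE)


PATTERNS = [
    (intent, _to_pattern(sample))
    for intent, samples in SAMPLE_UTTERANCES.items()
    for sample in samples
]


def resolve(utterance: str) -> Optional[tuple[str, str]]:
    """Map a typed or spoken phrase to (intent, person), or None if unknown."""
    # Drop the wake word and trailing punctuation before matching.
    text = re.sub(r"^alexa[,\s]+", "", utterance.strip(), flags=re.IGNORECASE)
    text = text.rstrip("?.! ")
    for intent, pattern in PATTERNS:
        match = pattern.match(text)
        if match:
            return intent, match.group("person")
    return None


for phrase in ["Alexa, when was Elvis Presley born?", "Elvis Presley date of birth"]:
    print(phrase, "->", resolve(phrase))
```

Both the conversational question and the search-style shorthand land on the same intent, which is the behavior the user experiences as recognition rather than recall.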

6. Think carefully about use cases

Voice is great – it’s quick, it’s convenient, it’s natural and it’s potentially very friendly too. However, it’s not always appropriate.

Imagine entering passwords or PINs through voice – in a public place this would be considered the pinnacle of stupidity. People could steal your personal data simply by listening in.

While you’d hope users are smart enough to know what’s appropriate, it’s good for designers to keep this in mind.

Which industries will voice impact the most?

Now that we know more about designing for voice, let’s delve in further and look at the areas where voice interactions can have the biggest impact – both improving user experience and helping companies to cement customer loyalty with seamless interactions.

1. The automobile industry

It’s not necessarily immediately apparent, but when you think about it, a car is possibly the single best use case for voice interaction.

Here’s why:

You shouldn’t operate other devices with your hands while driving
You shouldn’t take your eyes off the road and look at graphical interfaces while driving
You may spend long periods uninterrupted driving (and struggle to multitask while doing so, despite your best efforts)
You have complete privacy – no-one outside the car will overhear you (just think of how often people sing at the top of their voice when driving)
Consumers are already used to voice assistants in cars, in the form of talking sat navs.

Expanding the services available and enabling voice commands will be a natural progression, e.g. ‘Read me my emails,’ ‘Find a motel nearby with good reviews,’ or ‘Is there an ATM on the way?’

I’m sure we all know someone who’s tried to Google things like this at the wheel, and improved voice will not only mean better safety but also better user experience.

Car manufacturers who incorporate the sexiest in-car voice assistants will have a clear advantage when marketing their latest models.

2. Wearable electronics

Fitness devices like the Fitbit have taken the electronics industry by storm since the first Fitbit was launched in 2009. The company now generates around $2 billion in annual revenue. However, many wearable electronics rely on an operating system and a smartphone to actually access information.

In future, wearables that interact entirely through voice would cut out this intermediary and enable devices to be even smaller and perform a wide range of functions.

3. Customer service

Who knows – maybe call centers will become obsolete, as machines become highly competent and can respond to a wide range of requests.

In the near future at least, it’s likely that straightforward interactions will be completed by artificially-intelligent bots – freeing up human time to deal with more complex cases.

However, designers will definitely have their work cut out creating voice personalities that are delightful to interact with. Especially in customer service roles where customers may be irate, e.g. if a package hasn’t arrived on time.

Imagine an artificially intelligent customer service agent with the ability to rate someone’s anger on a scale of 1 to 10 and determine their consolatory offer accordingly.
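
As a toy version of that idea, the sketch below maps an anger rating to an offer. The thresholds and the offers themselves are entirely hypothetical, and the hard part (estimating the score from someone’s voice) is assumed to have happened already.

```python
def consolatory_offer(anger_score: int) -> str:
    """Map an anger rating from 1 (calm) to 10 (furious) to a hypothetical offer."""
    if not 1 <= anger_score <= 10:
        raise ValueError("anger_score must be between 1 and 10")
    if anger_score <= 3:
        return "apology"
    if anger_score <= 7:
        return "apology plus free express delivery"
    # Very angry customers are passed to a person rather than kept with the bot.
    return "apology plus refund, handed over to a human agent"


for score in (2, 5, 9):
    print(score, "->", consolatory_offer(score))
```

Handing the angriest customers to a human is a design choice in this sketch, in line with the point above that bots free up human time for the more complex cases.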

It will be really important that these bots are also able to understand a wide range of inputs, as speaking to a robot that doesn’t understand would only annoy an irate customer further! Speaking to an absolute legend on the other hand…

4. Devices for the visually impaired

The rapid advances in voice UI that benefit the whole population will also carry huge improvements in voice services for those who already rely on them – people with visual impairments.

Previously, things like screen-readers have enabled visually-impaired people to use the web, but the experience has limitations and some websites are more accessible than others. Consider how you would operate a touchscreen device without being able to see it.

Slowly, the wealth of information online will be available through more meaningful voice interactions that have been custom-designed for voice.

5. Real-time interpretation

Real-time interpretation has long been a fascinatingly impressive feat of the human brain, one that technology has yet to match.

The role of language interpreters is an important one, especially in areas like international politics or business – where it’s important that world leaders or business people can understand each other, including nuanced messages, and negotiate in real-time.

Now, because of the advanced ability of machines to understand human speech, digital real-time translators are becoming a feasible alternative to interpreters. It will be interesting to see if these pop up at the United Nations or if having human interpreters at international events becomes an age-old tradition despite high-tech alternatives. Who knows – maybe the interpreters are the ones with all the tact, keeping the peace.

Do you want to get involved in voice user interface design?

As mentioned at the start of the post, a huge driving force for advances in voice will be those companies who have a lot to gain from developing the leading voice bots.

However, the key to success will be UX designers. Rather than taking a technology-first approach, companies should be user-focused: assessing the use cases, seeing whether it makes sense in context for users to interact with their product or service using voice, and checking that voice is actually an improvement on the options that were available before. Has anyone ever complained about having to press a button in an elevator?

You can see some of the big industries where voice makes sense in our list above, but this is just the tip of the iceberg – as the technology evolves, the recommended use cases will multiply too.

For companies to succeed, they will need to hire lots of enthusiastic designers with a knack for solving problems! Does this sound like you?

Want to learn how to design Alexa skills and do all the other amazing things that voice designers do? Here’s how to get started:

Have a look at our course Voice User Interface Design and become the go-to specialist for voice.
Contact companies who are looking for voice designers in your area (e.g. try searching for ‘voice’ on Indeed.com). As there are very few specialists in this field, many companies are open to hiring UX designers without a background in voice for voice design positions.
If you’re an entrepreneur, look out for funds and startup accelerators that are investing in voice now!
