In the world of user experience, we use a fancy term: cognitive load.
This basically means how much mental effort we need to employ to successfully complete a task or respond to a request.
When we talk about cognitive load and UX, our main concern is making sure we don’t overwhelm the user with information or options, and that we’re using the most efficient path to reach a specific user goal.
We want to avoid cognitive overload, i.e. ‘brain does not compute.’
When building graphical user interfaces (GUIs), good designers try to reduce cognitive load by making user paths short and intuitive. They adhere to standard behavior so that the user relies on recognition (I can guess what to do here) rather than recall (I have to remember how I did this before).
Things are a bit different when it comes to voice. Voice user interfaces (VUIs) pose a different set of challenges with respect to cognitive load and many of the existing methods that apply to traditional user interface design are no longer applicable.
So, let’s take a look at how voice designers can be aware of cognitive load and avoid overwhelming the user.
First: Understand peaks of attention
In a normal app or website, the user interacts with the GUI continuously; input and feedback pass between the system and the user at a fairly steady rhythm, so the cognitive load stays fairly consistent too.
Even if the user is not actually performing a task (such as clicking a button), they are reading text, navigating the interface mentally, and so on. Thus, the level of attention required to operate the interface remains more or less consistent throughout.
The pattern of attention when it comes to VUIs, however, is different. Users make the initial command unprompted, using the activation phrase, and a response from the system is returned immediately.
The transient nature of voice means that the user will have to be fully alert at this moment to receive and understand the system response.
They cannot ignore it as easily as they might reflexively click away a dialog box on a screen without fully reading it (something we’ve all done on many occasions), nor can they take their own time to read, re-read, and understand the response.
Instead, the user is forced to be fully attentive to the VUI during the interaction, and does not need to be attentive at all when no direct interaction is taking place.
We can think of these cognitive load patterns as peaks of attention – and designers should bear them in mind when designing VUIs.
How to handle cognitive load highs and lows
For VUIs, the timeframe within which the interaction takes place is compressed, and the required cognitive load is concentrated into it. During the interaction, attention is high because the user cannot control the speed of information flow. Consequently, cognitive load is also high, which can lead to a poor experience and increased user errors.
As these peaks have a high cognitive load, they should be kept short to stop the user from getting overwhelmed. This means limiting the length and information density of responses.
Similarly, having many peaks in succession, even if they are all related to the same task, may also cause a problem for the user. Designers should think carefully about how they request information from the user, and be aware of the Peaks of Attention that they are creating and the cognitive load associated with them.
With this concept of attention peaks in mind, designers should consider carefully both the duration and information density of responses.
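One rough way to keep an eye on duration is to estimate how long a response takes to speak before it ships. The sketch below is only an illustration: the speaking rate of 150 words per minute and the 8-second budget are assumptions rather than established limits, and the function names are hypothetical.

```python
# Hypothetical values: tune them for your own voice and audience.
WORDS_PER_MINUTE = 150   # assumed speaking rate of the synthesized voice
MAX_SECONDS = 8.0        # assumed upper bound for one peak of attention


def spoken_seconds(response: str, words_per_minute: int = WORDS_PER_MINUTE) -> float:
    """Estimate how long a response takes to speak, based on its word count."""
    word_count = len(response.split())
    return word_count * 60.0 / words_per_minute


def is_too_long(response: str, max_seconds: float = MAX_SECONDS) -> bool:
    """Flag responses whose estimated duration exceeds the budget."""
    return spoken_seconds(response) > max_seconds
```

Word count is only a proxy for duration and says nothing about information density, so a check like this can flag candidates for review but cannot replace listening to the response.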
Very often, we try to take design patterns that work well in GUIs and simply repurpose them for VUIs. This is usually not the best solution. Take, for instance, a list of different possible options that the user can select. Most VUIs simply read the entire list, regardless of how many options it contains.
Here’s an example:
User: “Give me a recipe for Margaritas”
System: “Please choose which type of Margarita you would like a recipe for: Frozen Margaritas, Strawberry Margaritas, Vegan Margaritas, Watermelon Margaritas, Alcohol-Free Margaritas…”
It is not difficult to see that this is an unsuitable design pattern for VUIs. The Peak of Attention in this case would be too long and too dense. By the time the system has recited a list of 10 or 15 varieties of margaritas, it is unlikely that the user will be able to recall the first or second option, as their attention was focused on absorbing the information later in the list.
A good rule of thumb would be to limit lists to a maximum of 3 to 5 options per interaction. If the list is longer, then it may be useful to split the list by ending it and asking something like “would you like to hear more possibilities?” This would lower the information density, but would require additional peaks if the user has not heard a possibility that solves their problem. In short, not a great solution. The cognitive load is too high.
Clearly, this pattern is far from ideal in its auditory form, and is a great example of how speech should really be combined with other interfaces to create a better experience.
One thing that GUIs can do very well is to display long lists of information and allow users to browse them.
So rather than trying to squeeze an unsuitable design pattern into a VUI, this is an instance where designers should recognize the strength of combining VUI with GUI and allowing them to do what each one does best.
For instance, the request for Margarita recipes could be made using voice, and the initial response for top results could be returned with voice, but the rest of the results could be shown on a companion screen.
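As a minimal sketch of that split, assuming a device that has both a speaker and a companion screen: the first few options are read aloud and the remainder is handed to the display. The function name, the limit of three spoken options, and the wording of the prompt are all hypothetical.

```python
from typing import List, Tuple

MAX_SPOKEN_OPTIONS = 3  # assumed limit, taken from the low end of the 3-to-5 rule of thumb


def split_for_voice_and_screen(
    options: List[str], max_spoken: int = MAX_SPOKEN_OPTIONS
) -> Tuple[str, List[str]]:
    """Return a short spoken prompt and the options left for the screen."""
    spoken = options[:max_spoken]
    on_screen = options[max_spoken:]
    if not spoken:
        return "I couldn't find any results.", []
    if len(spoken) == 1:
        listed = spoken[0]
    else:
        # Join as "A, B and C" so the spoken list has a clear ending.
        listed = ", ".join(spoken[:-1]) + " and " + spoken[-1]
    prompt = f"I found {listed}."
    if on_screen:
        prompt += f" There are {len(on_screen)} more on your screen."
    return prompt, on_screen
```

With the five margarita varieties from the earlier example, the spoken prompt names the first three and the remaining two go to the screen, so the peak of attention stays short without hiding any results.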
UX designers have to start thinking across interface paradigms and use the potential of each one to create the best overall experience.
Use Grice’s Maxim of Quantity
Another principle that can be derived from this idea of Peaks of Attention is for the VUI to get to the point quickly – and then get out of the way.
Superfluous information in VUIs is much more expensive, in terms of cognitive load, than it is in GUIs. Therefore designers have to be much more ruthless about what they exclude from an interaction. Every conversational turn should delight the user, rather than frustrating them.
However, the danger here is also to go too far in the other direction by making responses so sparse that they no longer give sufficient information or they sound cold and mechanical to the user.
A good way to find the balance is to look at Grice’s Maxims, a set of rules that make communication as effective as possible.
The 4 maxims are: the maxim of quantity, the maxim of quality, the maxim of relation, and the maxim of manner.
All of these maxims together form the Cooperative Principle and are important when designing for voice, but when we talk about cognitive load, the most important is the maxim of quantity which states that you should give as much information as is needed and no more – while being as informative as possible.
For instance, consider these responses to the question “What time is it?”:
Example 1
System: “It’s 10:51 AM”
Example 2
System: “It’s 10:51 and 35 seconds AM, Central European Time, on the 10th August 2017”
Example 3
System: “It’s morning”
Obviously, only Example 1 is adhering to Grice’s Maxim of quantity and thereby striking the right balance between useful information and cognitive load. While it is beyond the scope of this article to explore all of Grice’s Maxims and the Cooperative Principle in detail, it is something that VUI designers should familiarize themselves with.
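To make the contrast concrete, here is a small sketch of a response builder that produces the Example 1 style of answer: hour, minute, and AM/PM, with the seconds, time zone, and date deliberately left out. The function name is hypothetical and the 12-hour format is an assumption about the target locale.

```python
from datetime import datetime


def time_answer(now: datetime) -> str:
    """Answer "What time is it?" with the hour and minute only."""
    hour = now.hour % 12 or 12                 # 12-hour clock: 0 and 12 both read as 12
    suffix = "AM" if now.hour < 12 else "PM"
    # Seconds, time zone and date are omitted on purpose: nobody asked for them.
    return f"It's {hour}:{now.minute:02d} {suffix}"
```

The design choice is in what is dropped rather than in the formatting itself: the extra detail from Example 2 is all available on the `datetime` object, but none of it answers the question that was asked.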
Conclusion
Cognitive load patterns are different for VUIs and we cannot just copy the best practices that we have for GUIs and apply them like-for-like.
By thinking about the user’s peaks of attention, and by following Grice’s Maxims, voice experience designers can reduce cognitive load and take the first steps in creating a delightful experience for users.