By Amir Rozenberg

Audio Set to Optimize User Interfaces and Drive Business Transformation

New Artificial Intelligence (AI)-driven user interfaces are the most recent trend in digital transformation. For end users, voice-driven chatbots (for example) offer a streamlined way to accomplish a task in a fun, engaging, informative manner "on my time and my device". Over 50% of users would rather interact with a brand over messaging than over the phone. Not surprisingly, serious companies are leveraging these interfaces to provide new, innovative, customized, and streamlined services to their audience, while expanding what they learn about the end user.



With Optimized User Engagement Flows, Expectations Are Growing

With these streamlined ways to engage users, which users prefer, expectations rise, both for the accuracy and correctness of the transaction and for responsiveness. In theory, the technology is there, backed by strong cloud services, artificial intelligence algorithms, and strong voice recognition engines. But one should recognize that this is a tremendous expansion of the user interface, one that should be handled carefully and with plenty of testing, as some are learning and as shown in this humorous clip:



More Than Just Chatbots

Audio interfaces are not limited to voice-oriented user interfaces. Many other functions require attention from a quality perspective. For example:

  • Media streaming from services such as Pandora, Spotify, Audible, radio station applications etc.
  • For telcos: audio quality testing of voice/VOIP calls, continuous monitoring of user access to E911 and similar services, visual voicemail etc.
  • and more

Test Automation To The Rescue

Many brands, in various stages of enhancing their applications to offer voice-based interaction, are facing the challenge of increased coverage (test cases and platforms) in an ever-accelerating release cycle: there is simply not enough time to cover more test cases. This is where test automation comes in: the ability to develop a dictionary matching the possible scenarios of a user interacting with the virtual assistant. To accomplish automated testing, one would require several functions:

  1. The ability to convert a text string to audio
  2. Inject audio onto the virtual assistant on the device
  3. Validate the response: both for correctness as well as responsiveness
  4. Record and transact with the response: Possibly record the outbound audio and feed it into a speech recognition algorithm, or a song matching one etc.
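
As a rough illustration of the third function, the following Python sketch checks a transcribed response for both correctness and responsiveness. The names `normalize`, `validate_response`, `min_similarity`, and `max_latency_s`, as well as the threshold values, are assumptions made for this example and are not part of any product API. The fuzzy match relies on `difflib` from the Python standard library, which tolerates small transcription differences.

```python
import difflib
import re


def normalize(text):
    """Lowercase and strip punctuation so transcripts compare fairly."""
    return re.sub(r"[^a-z0-9 ]+", "", text.lower()).strip()


def validate_response(expected, transcript, latency_s,
                      min_similarity=0.8, max_latency_s=3.0):
    """Return (ok, similarity) for one assistant response.

    The default thresholds are illustrative assumptions,
    not recommended values.
    """
    similarity = difflib.SequenceMatcher(
        None, normalize(expected), normalize(transcript)
    ).ratio()
    # Correctness: the transcript is close enough to the expected text.
    # Responsiveness: the reply arrived within the allowed time.
    ok = similarity >= min_similarity and latency_s <= max_latency_s
    return ok, similarity
```

A call such as `validate_response("It is sunny today.", "it is sunny today", 1.2)` passes, while the same transcript with a latency of 5 seconds fails on responsiveness alone.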

Putting It Together

Let's take an example of a chatbot scenario and describe how to build a scalable and robust test suite to provide the needed coverage through test automation. 

Here are the steps one would take to fulfill the test requirements:

  1. Create in your script a dictionary of user requests and response validations. For the most part, these would be a set of strings (see comment at the end)
  2. Use the text-to-speech function and inject the resulting audio file into the device
  3. For validation, there are a number of options, as typical interfaces would have both audio and textual (on-screen) response
    1. Conduct native object or visual validation on the response to validate correctness and responsiveness of the outcome
    2. Record the audio and use the speech-to-text to validate the vocal response
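
The steps above can be sketched as a small data-driven loop. This is a minimal sketch under stated assumptions: `text_to_speech`, `inject_audio`, `record_audio`, and `speech_to_text` are hypothetical callables standing in for whatever the test platform provides, and the scenario strings are invented for illustration.

```python
import time

# Step 1: hypothetical dictionary of user requests mapped to a
# phrase that is expected to appear in the assistant's reply.
SCENARIOS = {
    "What is my account balance?": "your balance is",
    "Book a table for two": "table for two",
}


def run_scenarios(scenarios, text_to_speech, inject_audio,
                  record_audio, speech_to_text):
    """Run each scenario and collect (request, passed, latency_s).

    The four callables are placeholders; their names and
    signatures are assumptions, not a real platform API.
    """
    results = []
    for request, expected in scenarios.items():
        audio = text_to_speech(request)      # step 2: text to audio
        start = time.monotonic()
        inject_audio(audio)                  # step 2: inject into the device
        reply_audio = record_audio()         # step 3.2: record the reply
        # Elapsed time covers injection plus recording, so it is only
        # a coarse proxy for responsiveness.
        latency_s = time.monotonic() - start
        transcript = speech_to_text(reply_audio)
        passed = expected.lower() in transcript.lower()
        results.append((request, passed, latency_s))
    return results
```

Passing the platform functions in as arguments keeps the loop independent of any one vendor, and lets the suite itself be exercised with simple stubs before it runs against real devices.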

Let's cover some variations on the basic scenario:

  • Voice variations: every chatbot needs to consider accents, stuttering, "Um" and "ah" etc. For that purpose, one can record audio files and use audio inject in order to introduce those into the script
  • Audio quality (for media applications): Another function provided is audio quality scoring, where the audio is recorded and fed into an automated algorithm. Two options are offered, with or without a reference. The former could be used to compare a prerecorded song (for example) against one that is being played live. A particularly interesting scenario to test is validating streaming quality in the presence of sub-optimal network conditions via Wind Tunnel. The latter (reference-free audio quality) can be applied to live streams, for example, where one cannot anticipate the song that will be played.
  • Voice call quality and Visual voicemail: create a phone call between two devices, inject audio and measure the quality on the other end. For visual voicemail, hang up the call and validate the text that shows in the voicemail.
  • IVR and 911 monitoring: create a phone call and interact with/validate the remote end via audio inject and audio record/speech-to-text
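
To make the "with reference" idea concrete, here is a deliberately simple Python sketch that compares a recorded signal against a reference using a signal-to-noise ratio. This is a stand-in chosen for illustration, not the algorithm any platform uses; real audio quality scoring typically relies on perceptual metrics. The function name `snr_db` is hypothetical, and the sketch assumes both signals are lists of samples that are already time-aligned and share a sample rate.

```python
import math


def snr_db(reference, recorded):
    """Signal-to-noise ratio in dB between two equal-length sample lists."""
    if len(reference) != len(recorded):
        raise ValueError("signals must be the same length")
    signal = sum(r * r for r in reference)
    if signal == 0:
        raise ValueError("reference signal is silent")
    # Treat any difference from the reference as noise.
    noise = sum((r - x) ** 2 for r, x in zip(reference, recorded))
    if noise == 0:
        return math.inf  # recorded audio matches the reference exactly
    return 10 * math.log10(signal / noise)
```

A test could then assert that the score of a recorded stream stays above a chosen threshold, with the threshold itself tuned per application.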

 

Let's Go!

We believe the market adoption of rich and streamlined user interfaces is going to be tremendous. Testing, particularly automated testing, is a mandatory element that needs to be planned as part of creating and deploying such services. We hope that Perfecto makes it possible to accomplish this task.

 
