The Audio Voice 397: What to Consider for the Development of Products Using Voice as a Primary Human Interface?

Weekly Newsletter for the Audio Industry and Audio Product Developers

Product Design | Audio Electronics | Acoustics | DIY | Audio Innovations

Industry & Product News

Upgrade Your True Wireless Earbud Design. Why Buy Four Chips if You Only Need One?

Azoteq, known for its 4-in-1 ICs, recently expanded its offering to the TWS market with the release of the IQS323. The IQS323 offers capacitive and inductive sensing as user interface and in-ear-detection, while the IQS7222A IC is a multifunctional capacitive, inductive, and Hall-effect sensor with additional temperature sensing for reference tracking. The IQS7222A is offered in a 1.62 x 1.62mm WLCSP and allows you to add a touch-and-swipe interface, wear detection, docking detection, and your choice of an inductive waterproof button or the inductive force-sensing features. Learn More Here

New COMSOL Multiphysics Version 6.1 Reinforces Transducer Design for Audio Products

COMSOL announced the release of the latest version of its modeling and simulation software, COMSOL Multiphysics version 6.1, including expanded capabilities in Fluid and Mechanical Simulations. The Acoustics Module in version 6.1 includes several important new possibilities for electroacoustic simulations, including acoustic streaming, and lumped speaker boundary conditions for thermoviscous acoustics. Read More

Boosting Audio Processing with High-Performance Machine Learning - An Audio Product Education Institute Webinar, November 9

The Audio Product Education Institute (APEI), an initiative of the Audio Engineering Society (AES), is promoting a webinar on the implementation of Machine Learning solutions using DSP Concepts' Audio Weaver in the new and exciting Alif Ensemble MCUs, running ARM's ML-accelerated Ethos-U55 architecture. This session, presented by APEI's AI and ML education pillar, will offer a high-level overview of the exciting new audio processing possibilities leveraging these powerful new scalable microcontrollers, uniting highly integrated embedded processors with AI acceleration. Read More

Solid State Logic Launches SSL CONNEX Multipurpose USB Microphone

Solid State Logic (SSL) has made a rather surprising announcement and entered the microphone space, with a new USB microphone design that can be used for conferencing, live streaming, and even music recording. Described by the famous British company as a versatile solution for everyday digital life, the SSL CONNEX multipurpose USB microphone supports four tailored user modes and immersive recording capabilities, and also introduces never-before-seen features. Read More

M&K Sound Debuts Slimline D Series Home Loudspeakers

Danish company Miller & Kreisel announced a new, high-performance range of slim, lifestyle-friendly loudspeakers designed for today's living spaces. The new M&K Sound D Series launches with the new D85 and D95 models, designed to sound superior in any stereo home audio configuration, and also bridge the gap between the brand's dedicated Home Cinema speakers and the entry-level M Series. Read More

Audio-Technica Debuts ATH-WB2022 High-End Wireless Headphones

In celebration of its 60th anniversary as one of the world's foremost manufacturers of high-fidelity audio products, Audio-Technica announced the introduction of its ATH-WB2022 wireless wooden headphones. The limited-edition ATH-WB2022, crafted from layered woods, is touted as the finest Bluetooth headphone ever created by Audio-Technica. The ATH-WB2022 supports up to 24-bit/96kHz high-resolution music listening, and features the world’s first completely balanced audio output system for wireless headphones. Read More

Audio Developers Come Together for ADC22, November 14-16, 2022

The Audio Developer Conference (ADC) is an annual developer community event celebrating all audio technologies from music applications and game audio to audio processing and embedded systems. The producers of the 8th Annual ADC22 announced that the in-person portion of their hybrid event has sold out due to high demand from the audio developer community. Yet, tickets are still available to attend this one-of-a-kind online experience on the Gather.town virtual venue platform from November 14-16, 2022 at www.audio.dev. Read More

IsoAcoustics Introduces V120 Mount Isolation and Aiming Solution for Immersive Audio

Following a first presentation at the NAMM Show in June 2022, IsoAcoustics has announced the official release of the new V120 Mount for wall and ceiling-mounted speakers in professional and home project studios. This is a much-needed and extremely useful solution for immersive audio installations in studios or any type of listening room, based on IsoAcoustics' established, patented isolation technology, combined with a range of mounting and aiming accessories. Read More

Guest Editorial

Ido Gus (CEVA)

Voice Control for Low Power Microcontrollers

What to Consider for Product Development

In this article, we will discuss the why and how of voice control deployment on low-power, resource-constrained microcontroller units (MCUs) and its translation into real-world applications.

But first, let’s define a couple of core concepts, including Human Computer Interface (HCI), Voice User Interface, and Voice Control:

- Human Computer Interface (HCI) is a well-defined concept that can be described as the point of communication between a human user and a computer. The communication channel classification can be based on many of the human senses: vision, hearing, touch, and so on.

- Voice User Interface (VUI) makes it possible for humans to communicate with machines using voice. Machines may employ some form of speech recognition to translate human speech to commands and queries.

- Voice Control is an implementation of a VUI, allowing a human to use simple, concise commands to operate a device or appliance.

VUIs have been around for years and have become very popular recently thanks to devices such as Amazon Echo, Google Home, Apple HomePod, and their associated voice assistants, which are also deployed on smartphones, TVs, cars, and other devices. Most of these devices rely on complex, cloud-based speech recognition engines. These engines handle complex human speech, allowing users to use natural language for interaction with machines.

However, these abilities come with a multifold price tag, starting with compromised user privacy, as user queries are uploaded to the cloud for processing and are stored there for various lengths of time (from hours to months, depending on the service supplier). Also, the device must have a connection to the cloud to operate, and processing in the cloud is often energy consuming and slower. The connectivity requirement, in turn, makes the device’s BOM costs soar, as relatively complex connectivity hardware must be integrated into the device, often resulting in major design modifications.

The price tags of full-fledged, cloud-based voice assistants can be alleviated, for many use cases, by deploying a small, task-optimized voice control engine on a battery-operated, resource-constrained, offline, MCU-enabled device. Voice control powered by a small dedicated VUI engine can be realized on a simple MCU-based hardware module serving as a drop-in replacement for existing controls (knobs, buttons, touch screens, etc.).

Naturally, there are limitations to the capabilities of such a solution, but as we will shortly see, for many tasks and use cases, these limitations are outweighed by the benefits. The major limitation of voice control implementations for MCUs is that they are often characterized by limited vocabulary support – only a small set of words can be recognized, words which the user must remember to operate the device properly. In other words, the user cannot use natural language, and instead must make their requests using the supported words and commands. For example, “play the next song” might not be recognized by a system configured to detect the command “next song” or even just “next”.

This limitation has a plus side – simplicity. Using short, concise commands greatly reduces the risk of the device “misunderstanding” the command due to ambient noise or other interruptions. This becomes very evident when considering the tasks voice control on an MCU is designed to handle.

Let’s review some use-cases.

Major Appliances

Many major appliances that have button/knob/touch interfaces are also operated with dirty or wet hands (ovens, cooktops, washing machines, dishwashers). Voice control deployed on an MCU-powered hardware module can prove to be very useful in keeping the appliance clean and easily operable (have you ever tried to operate a touch interface with wet fingers?). From a manufacturing standpoint, voice control deployed on a mass-produced, MCU-powered hardware module can serve as a drop-in replacement for existing buttons, knobs, and touch interfaces with minimal integration costs.

Robot Vacuum Cleaners

Robot Vacuum Cleaners (RVCs) can operate independently or via remote controls (which always get lost…). An MCU Voice Control module supporting just a few commands (“clean kitchen”, “stop”, “go charge”) can significantly improve the user experience, with a small impact on BOM and costs, while performing better than a cloud-based voice assistant, which often has difficulties with noisy environments and short commands.

Public Kiosks and Vending Machines

With Covid-19, hygiene became a major concern, especially in the public domain. An MCU Voice Control module can provide an effective, low-cost option to upgrade existing machinery catering to public health. Supported commands can be displayed/printed on the device to alleviate the lack of support for natural language while lowering error rates.

Wearables, Hearables, and other Tiny Devices (TWS and Hearing Aids)

This device class is characterized by a limited power supply (small batteries, rendering continuous cloud connection impractical), limited compute resources (rendering large vocabulary speech recognition engines impractical) and limited surface space (rendering buttons and tap interfaces inconvenient) – which makes MCU-powered voice control an ideal solution.

IR Remote Control with Voice Control (for TVs, Home Entertainment, and HVAC systems)

Remote control is the preferred interface for operating TVs, home entertainment systems, A/C systems, ceiling fans, and any device that is out of reach. Adding on-device VUI to remote controls allows better personalization (e.g., with speaker verification, smart TV apps such as Netflix can be made to start up with the user’s profile) and can also solve the “looking for the remote” hassle. After-market universal voice-controlled remote controls can offer an easy upgrade for older systems.

What Makes Up a Good Voice Control Solution?

An MCU-powered voice control solution must address some key challenges to be considered an efficient, effective, and reliable alternative to existing interfaces (knobs, buttons, touch):

Quality of Service – the probability that the voice control engine will “understand” (detect correctly) the uttered command or word. Two types of errors exist – False Accept and False Reject. User sensitivity to each type of error may vary with use case and the voice control engine must be tuned accordingly. In general, users would expect a True Acceptance Rate of 95% or higher, and no more than 1 False Accept per 24 hours. In other words, VUI performance should be such that a user would not bother reaching for the remote or button.
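To make the two Quality of Service targets concrete, here is a minimal sketch of how they might be computed from a test campaign. The function names and the sample counts are hypothetical and chosen only for illustration; the 95% and 1-per-24-hours thresholds are the ones quoted above.

```python
def true_acceptance_rate(accepted: int, spoken: int) -> float:
    """Fraction of genuinely spoken commands that the engine detected."""
    if spoken <= 0:
        raise ValueError("spoken must be positive")
    return accepted / spoken


def false_accepts_per_24h(false_accepts: int, hours_of_audio: float) -> float:
    """False triggers normalized to a 24-hour listening period."""
    if hours_of_audio <= 0:
        raise ValueError("hours_of_audio must be positive")
    return false_accepts * 24.0 / hours_of_audio


def meets_targets(accepted, spoken, false_accepts, hours_of_audio,
                  min_tar=0.95, max_fa_per_24h=1.0) -> bool:
    """Check both targets from the text: TAR >= 95%, <= 1 False Accept per 24 h."""
    tar_ok = true_acceptance_rate(accepted, spoken) >= min_tar
    fa_ok = false_accepts_per_24h(false_accepts, hours_of_audio) <= max_fa_per_24h
    return tar_ok and fa_ok


# Hypothetical campaign: 1,000 spoken commands and 72 hours of
# command-free background audio used to count false triggers.
print(f"TAR: {true_acceptance_rate(962, 1000):.1%}")
print(f"False Accepts per 24 h: {false_accepts_per_24h(2, 72.0):.2f}")
print("Meets targets:", meets_targets(962, 1000, 2, 72.0))
```

Because the two error types trade off against each other, a real evaluation would typically sweep the detection threshold and report both numbers at each setting.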

Noise Robustness – the ability to provide high-quality detection in noisy environments, which is where all of the use cases reviewed earlier operate (some are themselves the source of the noise). A good VUI implementation is expected to show perceivable performance degradation only at SNR levels lower than 5 dB.
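One common way to test (or train for) a given SNR level is to scale a noise recording so that the speech-to-noise ratio hits the target, then add it to the speech. The sketch below illustrates the arithmetic at the 5 dB point mentioned above. It is an assumption-laden illustration, not a description of any particular engine: the sine tone and Gaussian noise are stand-ins for real recordings, and the helper names are hypothetical.

```python
import math
import random


def rms(samples):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def snr_db(signal, noise):
    """Signal-to-noise ratio in decibels, computed from RMS levels."""
    return 20.0 * math.log10(rms(signal) / rms(noise))


def mix_at_snr(signal, noise, target_snr_db):
    """Scale `noise` so the mix has the requested SNR, then add it to `signal`.

    Returns (mixed, scaled_noise) so the achieved SNR can be checked.
    """
    if len(signal) != len(noise):
        raise ValueError("signal and noise must have the same length")
    if rms(signal) == 0 or rms(noise) == 0:
        raise ValueError("signal and noise must not be silent")
    gain = rms(signal) / (rms(noise) * 10.0 ** (target_snr_db / 20.0))
    scaled_noise = [gain * n for n in noise]
    mixed = [s + n for s, n in zip(signal, scaled_noise)]
    return mixed, scaled_noise


# Stand-ins for real recordings (assumption): a 440 Hz tone as "speech"
# and seeded Gaussian noise as the "environment", 1 second at 16 kHz.
SAMPLE_RATE = 16_000
rng = random.Random(0)
speech = [0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE)
          for n in range(SAMPLE_RATE)]
noise = [rng.gauss(0.0, 1.0) for _ in range(SAMPLE_RATE)]

mixed, scaled_noise = mix_at_snr(speech, noise, target_snr_db=5.0)
print(f"achieved SNR: {snr_db(speech, scaled_noise):.2f} dB")
```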

Power and Compute Requirements – these are critical in determining if the candidate implementation is suitable for the use case. For battery-operated implementations, power consumption should be in the milliwatt range. Such a VUI implementation should be able to run on a Cortex-M0+ or similar MCU, consuming less than 50 MCPS and 80 KB of memory.
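A quick back-of-the-envelope calculation shows how tight these budgets are. The sketch below takes the 50 MCPS (million cycles per second) and 80 KB figures from the text; the 10 ms frame hop, the 1 KB = 1,024 bytes convention, and the assumption that half of the memory is available for model weights are illustrative guesses, not figures from the article.

```python
MCPS_BUDGET = 50        # million cycles per second (from the text)
MEMORY_BUDGET_KB = 80   # total memory budget (from the text)


def cycles_per_frame(mcps: int, frame_hop_ms: int) -> int:
    """Cycles available to process one audio frame."""
    return mcps * 1_000_000 * frame_hop_ms // 1000


def max_parameters(memory_kb: int, bytes_per_weight: int,
                   weight_share: float) -> int:
    """Number of weights that fit in the share of memory reserved for them."""
    return int(memory_kb * 1024 * weight_share) // bytes_per_weight


# Assumed 10 ms hop between feature frames.
print("cycles per 10 ms frame:", cycles_per_frame(MCPS_BUDGET, 10))

# Assumed: half of the memory holds weights, the rest holds code,
# audio buffers, and activations.
print("int8 weights that fit:   ", max_parameters(MEMORY_BUDGET_KB, 1, 0.5))
print("float32 weights that fit:", max_parameters(MEMORY_BUDGET_KB, 4, 0.5))
```

Under these assumptions, the whole pipeline (signal processing, feature extraction, and inference) gets half a million cycles per frame, and the model is limited to a few tens of thousands of 8-bit weights.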

Security – an MCU voice control solution may be expected/required to respond selectively to commands issued by specific entities. This can be realized by speaker verification technology that can be integrated into the system.
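The article does not say how speaker verification is implemented. One widely used approach, sketched here purely as an assumption, compares a fixed-length "voiceprint" vector stored at enrollment with one computed from the incoming command, and accepts the command only if their cosine similarity exceeds a threshold. The vectors, the threshold, and the function names below are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("vectors must be non-zero")
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def is_enrolled_speaker(enrolled, candidate, threshold=0.7):
    """Accept the command only if the voiceprints are similar enough.

    The threshold is illustrative; in practice it would be tuned on data.
    """
    return cosine_similarity(enrolled, candidate) >= threshold


# Toy 3-dimensional voiceprints (real ones would be much longer and
# produced by a speaker model, which is outside this sketch).
enrolled_user = [0.9, 0.1, 0.4]
same_user_later = [0.8, 0.2, 0.5]
different_user = [-0.3, 0.9, 0.1]

print(is_enrolled_speaker(enrolled_user, same_user_later))
print(is_enrolled_speaker(enrolled_user, different_user))
```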

VUI for MCUs Implementation Challenges

Building a competitive VUI engine is a game of balancing multiple (and often opposing) constraints:

- Quality of service (True Acceptance Rate vs False Accepts per Hour)

- Robustness to noise

- Robustness to reverberation

- Extremely limited compute and memory resources

- Robustness to accents

- Data acquisition costs

In deep learning research, a common way to boost model performance involves increasing model complexity and the amount of training data. Such techniques are not applicable in the “real world” where the goal is building a model (VUI engine in this case) targeting MCUs that have very limited resources (model complexity must be kept to a bare minimum) in an economical fashion (data acquisition resources are limited).

The pressure set by the different constraints means that different model-size reduction techniques, and advanced data engineering methods aimed at making the most of limited data acquisition resources, need to be analyzed. Techniques such as post-training quantization and quantization-aware training, structured and unstructured pruning, low-rank approximation, sparsity, and knowledge distillation can be deployed. While these techniques can reduce compute and memory footprints, achieving good model performance still requires evaluating:

- Multiple audio signal processing techniques

- Multiple feature extraction techniques

- Different model architectures from CNNs to RNNs and transformers

- A wide array of audio data engineering methods from effective and efficient data collection procedures to data augmentations and noise mixing parameters
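As an illustration of the simplest of the size-reduction techniques named above, the following sketch applies symmetric, per-tensor post-training quantization to a handful of weights, storing each as an 8-bit integer plus one shared scale factor. This is a generic textbook scheme shown under stated assumptions, not the method of any specific product; the weight values are made up.

```python
def quantize_int8(weights):
    """Map float weights to integers in [-127, 127] with one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the integer representation."""
    return [q * scale for q in quantized]


# Made-up weights for illustration.
weights = [0.42, -1.27, 0.05, 0.9]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)

print("quantized:", quantized)
print("largest rounding error:",
      max(abs(w - r) for w, r in zip(weights, restored)))
```

Each weight now occupies one byte instead of four, at the cost of a rounding error of at most half the scale factor per weight. Quantization-aware training goes a step further by exposing the model to this rounding during training.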

Finally, when satisfactory model architecture, data acquisition, and training recipes are realized, a number of implementation challenges still need to be overcome:

- Code portability and maintainability

- High performance and high accuracy fixed point arithmetic

- Multi-platform optimizations

- API simplicity and usability
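The fixed-point item in the list above deserves a short illustration, since many small MCUs (the Cortex-M0+ among them) have no floating-point unit. The sketch below models Q15 arithmetic, a common 16-bit format representing values in [-1, 1), with rounding and saturation. It is written in Python only to show the arithmetic; a production implementation would be in C or assembly, and the helper names here are hypothetical.

```python
Q15_ONE = 1 << 15                      # 32768 represents 1.0
INT16_MIN, INT16_MAX = -32768, 32767


def saturate16(x: int) -> int:
    """Clamp to the 16-bit signed range instead of wrapping around."""
    return max(INT16_MIN, min(INT16_MAX, x))


def float_to_q15(x: float) -> int:
    """Convert a float in [-1, 1) to Q15, saturating out-of-range input."""
    return saturate16(round(x * Q15_ONE))


def q15_to_float(q: int) -> float:
    """Convert a Q15 integer back to a float."""
    return q / Q15_ONE


def q15_mul(a: int, b: int) -> int:
    """Multiply two Q15 values: Q30 product, round, shift back, saturate."""
    product = a * b                          # Q30 intermediate
    rounded = (product + (1 << 14)) >> 15    # add half an LSB, then shift
    return saturate16(rounded)


half = float_to_q15(0.5)
quarter = float_to_q15(0.25)
print(q15_to_float(q15_mul(half, quarter)))   # 0.5 * 0.25

# -1.0 * -1.0 would be +1.0, which Q15 cannot represent: it saturates.
print(q15_mul(INT16_MIN, INT16_MIN))
```

The saturation step is what keeps an overflow from flipping the sign of a result, which in an audio pipeline would show up as a loud click or a corrupted feature.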

CEVA WhisPro is a neural network-based speech recognition technology targeting the development of products using voice as a primary human interface. WhisPro extends CEVA's intelligent sound IP portfolio, offering developers a holistic solution for cloud-based or edge voice-controlled devices.

Conclusion

An effective VUI engine such as CEVA’s WhisPro voice control technology forms a key part of our ability to use voice as a primary human interface for intelligent cloud-based services and edge devices. Speech recognition models need to have a high recognition rate. Inherent AI technology should support a range of commands for a wide variety of use cases and languages, without compromising on power or compute requirements. Last, to stop unauthorized use of a voice-activated device, security features such as speaker verification are a must.

For further information about CEVA’s voice control solutions, visit www.ceva-dsp.com

About the Author

Ido Gus serves as CEVA’s Deep Learning Senior Team Leader at the Sensor and Audio Business Unit. He brings over 15 years of experience, spanning software development, algorithm optimization, deep learning algorithm research, and project management, and he specializes in the application of deep learning algorithms to audio and sound processing. Ido holds a B.Sc. in Information Systems Engineering from Ben Gurion University of the Negev, and an MBA from the Hebrew University of Jerusalem. He is passionate about leading cutting-edge deep learning projects from research to optimized implementation on edge devices.

Fresh From the Bench

Sumiko Phono Cartridges and RS78 78 rpm Stylus

By Gary Galo

Longstanding audioXpress author and expert Gary Galo reviews the latest Sumiko Olympia and Wellfleet phono cartridges, and RS78 78 rpm stylus. The new RS78 is Sumiko's first 78 rpm solution; it meets the needs of many vintage recording enthusiasts and can be installed on many of the brand's cartridges. In this review, Gary discusses the merits of these versatile and high-quality cartridges while exploring his personal collection of 78 rpm recordings. He details the predictable playback challenges, as only someone with deep knowledge and experience collecting 78s and working in archives would be able to offer. This article was originally published in audioXpress, October 2022. Read the Full Article Now Available Here

Voice Coil Patent Review

Low Profile Loudspeaker Device

By James Croft

James Croft explores patents granted to Mattias and Timothy Scheek, on behalf of Mayht Holding B.V. Our readers will recognize the name from the April 2022 announcement of the acquisition by Sonos of Mayht Holding B.V., for "$100 million in existing cash on hand." Having followed their work through a number of patent publications, Mayht's technology triggered Croft's curiosity, for its unconventional structural designs, appearing to represent fresh thinking. Croft offers some context for the company's bold claims and shares his thoughts. As Croft writes, "As I continued to follow the patent trail, combined with reading the promotional disclosures, the claims that the company was making were quite bold. Not only were the claims suggesting incremental advancements in one or two parameters, some claims were orders of magnitude improvements over the current state of the art in nearly every significant category." Read the first of his reviews. This article was originally published in Voice Coil, July 2022. Read the Full Article Now Available Here


audioXpress | Voice Coil | LIS

Advancing the Evolution of Audio Technology

audioXpress features great articles, projects, tips, and techniques for the best in quality audio. It connects manufacturers and distributors with audio engineers and enthusiasts eager for innovative solutions in sound, acoustics, and electronics.

Voice Coil, the periodical for the loudspeaker industry, delivers product reviews, company profiles, industry news, and design tips straight to professional audio engineers and manufacturers who have the authority to make powerful purchasing decisions.

The Loudspeaker Industry Sourcebook is the most comprehensive collection of listings on loudspeaker material in the industry. Purchasers and decision makers refer to the guide for an entire year when making selections on drivers, finished systems, adhesives, domes, crossovers, voice coils, and everything in between.