The Audio Voice 372: Good News. Automatic Audio-to-Text Transcription Engines Are Evolving Every Year!

Weekly Newsletter for the Audio Industry and Audio Product Developers

Product Design | Audio Electronics | Acoustics | DIY | Audio Innovations

Industry & Product News

Hypex Announces New and Improved NCOREx Class-D Amplifier Technology

Hypex Electronics announced a new and improved iteration of its popular NCOREx Class-D amplifier technology and unveiled the NCx500 OEM, the first new NCOREx module to be launched. According to the Dutch company, the new technology will afford at least 2x better performance compared to original NCORE products — the reason for the x designation, as in eXceptional! Hypex will introduce the new NCOREx technology at the Integrated Systems Europe (ISE) and High End shows in May 2022, paving the way for future OEM products. Read More

Watch EveryWord™ Far-field Voice Capture Outperform the Leading OEM Solution

Scott McNeese, Director of Voice Technology at ArkX Labs, demonstrates the superior performance of EveryWord™ Ultra Far-field Voice Capture Technology featuring Cirrus Logic and Sensory speech recognition technologies.

Watch as our EveryWord™ solution, using 3D reverberation, captures voice commands from 3x the distance (>9 meters) of standard far-field technology, around corners, and best of all, in noisy and reverberant environments without having to lower playback volume from competing sources. Read More

DSP Concepts and StreamUnlimited Announce Reference Design for Advanced Voice Control Development

DSP Concepts and StreamUnlimited are working together to support audio product development with a new reference design that offers advanced voice control features. DSP Concepts offers its Audio Weaver and TalkTo software development tools for StreamUnlimited's established range of hardware modules, including the latest Stream1955 next generation System-on-Module (SoM) based on the latest NXP Quad Core SoC. Together with the StreamSDK development environment, the platform offers support for all major voice assistants. Read More

AKM Developed New Flagship Velvet Sound AK4499EXEQ Stereo D/A Converter

The new flagship DAC Velvet Sound | Veritas AK4499EX and the previously announced premium DAC AK4493S are two of the latest products developed by Asahi Kasei Microdevices Corp. (AKM) to be unveiled for the first time at the High End Show 2022. AKM will exhibit at the International Parts+Supply (IPS) section of the show in Munich, Germany, showcasing all the new-generation products, as the Japanese company gradually resumes production and shipments. Read More

From Microspeakers to Room Acoustics: COMSOL Promotes Three Acoustics Simulation Online Events

COMSOL continues its global virtual event series with COMSOL Day: Acoustics, to be held in three different time zones, with each event featuring a one-day program focusing on acoustics simulation using COMSOL Multiphysics software. Sessions will be hosted online on April 28, May 5, and May 19 by organizers in the US, Europe, and China, respectively. Three online COMSOL Day events will showcase the benefits of multiphysics simulation to industries developing audio and acoustics products. Read More

Halter Technical Unveils Microsone Discreet Audio Monitoring System at NAB Show 2022

Halter Technical, a Los Angeles, CA-based creator of headphones and headsets purposely built for the broadcast and video production industry, announced a new wireless design that takes lessons from the most recent Bluetooth TWS earbuds and hearing aids to create what the company calls the world’s smallest rechargeable miniature wireless in-ear monitor (IEM). The Microsone Discreet Audio Monitoring System introduces a new level of flexibility and control on set. Read More

Optical MEMS Microphone Technology from sensiBel Receives Funding Needed to Enter Production

sensiBel is a Norwegian company developing a new generation of MEMS microphones using patented optical technology that it believes will be able to reach 80dB SNR and rival conventional studio microphone capsules, with all the benefits of a fully encapsulated solution. The company has now announced a successful funding round of 15 million Euros, led by Germany’s TRUMPF Venture and including The European Council’s EIC Fund, Skagerak Capital, Investinor, and SINTEF Venture IV. sensiBel will use the funding to ramp up production. Read More

Bang & Olufsen Introduces Beoplay EX True Wireless ANC Earphones

Bang & Olufsen announced the launch of Beoplay EX, the latest addition to its true wireless personal audio portfolio. Selling points include superior sound reproduction, and adaptive ANC in a fully waterproof design, making the Beoplay EX "the most versatile true wireless earphone model from Bang & Olufsen to date." The Danish company also says that the new model was intended to simplify personal sound, by offering an ideal combination of performance and style that would make the Beoplay EX earbuds appealing in the current competitive landscape. Read More

Vincent Audio Updates PowerLine Series with SV-228 Hybrid Integrated Amplifier

Vincent Audio is a designer and manufacturer of high-fidelity stereo and multichannel components, headquartered in Germany with manufacturing in Asia, Germany, or a combination of the two. This strategy allows the company to offer unique features and designs at affordable prices. Expanding Vincent Audio’s popular PowerLine series, the SV-228 is a new hybrid integrated amplifier that includes VU meters embedded in the front panel and a cool loudness switch, and offers an integrated DAC and Bluetooth 5 input. Read More

Editor's Desk

J. Martins

(Editor-in-Chief)

Speech-to-Text Advances

State of Automatic Speech Recognition Annual Report

As I wrote a few weeks ago, it is clear that trade shows will resume in full force this year. At this stage, even if a new virus variant pops up (and it will), it's already too late to stop most of us, who are eager to resume our professional activity eye-to-eye and person-to-person, to board a plane, and to start "walking the halls."

The next dates in the trade show calendar start this week, with Audio Expo North America (AXPONA), happening April 22-24 at the Renaissance Schaumburg Hotel & Convention Center, Schaumburg, IL, followed shortly by the NAB Show 2022 at the Las Vegas Convention Center (LVCC), Las Vegas, NV (April 24-27), and the Prolight+Sound 2022 show in Frankfurt, Germany (April 26-29). Not surprisingly, press announcements anticipating those shows are starting to arrive. As always, the broadcast industry wins by a large margin in quantity and quality of announcements.

As I have written many times, for some reason, the hi-fi and high-end audio companies continue to ignore those essential press releases and think they don't need to hire PR professionals to prepare for the shows. It's shocking. I get piles of well-written 800-word pieces profusely illustrated about 19" equipment racks that sell for $200 USD, and no information whatsoever about a $20,000 speaker design that is supposed to be the best thing since the invention of the wheel - about which Oliver Masciarotte will be able to write a mere paragraph of subjective impressions from AXPONA.

But enough about that. It’s their loss. There are plenty of exciting news stories to keep us busy in this very newsletter - and I just wish we had a full team of writers to publish everything of interest to our audience that comes to our attention.

3Play Media conducts annual research to determine how the top automatic speech recognition (ASR) engines perform in regard to captioning and transcription. The results of this research are then published in the “State of Automatic Speech Recognition” report, now available.

Sometimes, I get pieces that are a bit more than a press release, and I finally realize that the content is not appropriate for our regular Industry or Product News sections, and is more suitable for a full article that will require substantial work. But the essence of the story is effectively full of valuable knowledge that deserves immediate attention.

This week, in the ramp up for NAB 2022, I received not one but multiple items detailing a recent study published by 3Play Media, a company from Boston, MA, that provides patented solutions for closed captioning, transcription, live captioning, audio description, and subtitling. I had a close look at the research, because it provided valuable insights about the use of machine learning (ML) and automatic speech recognition (ASR) technologies, which are currently crossing over to mass market consumer applications, such as transcription of Zoom calls. To be clear, 3Play Media provides an online platform for these services to large organizations as a professional service, where the use of these cutting-edge technologies is normally validated with human review. But they also provide real-time captioning, when human review is not possible.

3Play Media is therefore in a position to evaluate the state of the art in automatic audio-to-text transcription engines, including multi-language support, for the most demanding applications (media and entertainment, corporate, higher education, government, etc.) and it publishes an annual report about the progress in this field. I thought the results of that study – “3Play Media's State of Automatic Speech Recognition (ASR) Report” - are worth sharing.

More about the 2022 State of Automatic Speech Recognition annual research.

The good news that immediately caught my attention is that 3Play Media found automatic speech recognition technology has advanced over the last 12 months - even if human intervention is still necessary for accuracy in captioning use cases for their level of service. Overall, all available engines improved over the year. This means that, as I had predicted, progress in perfecting these (already impressive) speech-to-text engines is real, thanks to the intense research that is taking place and the availability of better-trained ML models in more languages.

The annual study looks at the general state of speech-to-text technology and evaluates how nine major speech recognition engines perform at the task of captioning and transcription. According to the study, the accuracy of the technology has measurably improved since the company’s last report, published in January of 2021. They tested all nine engines using a large dataset representative of 3Play Media’s diverse customer base and evaluated accuracy against two measurements: Word Error Rate (WER) and Formatted Error Rate (FER), which includes formatting errors such as grammar, speaker identification, and non-speech elements in addition to word errors.
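For readers who have not worked with the metric, WER is conventionally computed as the minimum number of word substitutions, deletions, and insertions needed to turn the engine's output into the reference transcript, divided by the number of words in the reference. The sketch below illustrates that conventional definition only. The function name `word_error_rate` and the simple lowercase-and-split normalization are assumptions made for illustration; the report summary does not describe how 3Play Media normalizes text, and FER (which also counts formatting errors) is not modeled here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Conventional WER: word-level edit distance / number of reference words.

    Normalization here (lowercase + whitespace split) is an illustrative
    assumption, not the procedure used in the 3Play Media study.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (classic dynamic programming).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    reference = "the quick brown fox jumps"
    hypothesis = "the quick brown box jumps"  # one substituted word
    print(f"WER: {word_error_rate(reference, hypothesis):.0%}")
```

Under this definition, accuracy is usually quoted as 1 minus WER, so an accuracy target of 99% corresponds to about one word error per 100 reference words.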

Word Error Rate (WER - widely used to determine speech recognition quality) results across tested APIs. The following APIs/engines were used for batch testing: 3Play (Speechmatics latest, enhanced model, with the addition of 3Play post-processing and mappings); SMX (Speechmatics’ latest, enhanced model, without 3Play post-processing); Google standard model; Google VM (optimized for video); Rev.ai (Rev’s V2 engine); Microsoft; IBM; Voicegain; and Deepgram.

In both WER and FER measurements, Speechmatics with 3Play modeling and post-processing led the results, followed by the Speechmatics engine used on its own, and Microsoft. Rev, Google VM, and Voicegain engines followed, each with respectable scores, which were close enough that these vendors are hard to differentiate. "Despite exciting improvement across the board, all engines performed well below the industry standard of 99% accuracy, confirming that ASR on its own still falls short of being “good enough” for compliance with closed captioning legal requirements," 3Play Media states.

“As the AI models driving ASR continue to evolve, many of the engines we evaluated have shown significant strides in their transcription accuracy over the last two years,” Chris Antunes, Co-CEO and Co-Founder, 3Play Media, said. “We run this report every year because we use ASR in our own transcription process, and we have a vested interest in making sure we’re utilizing the best engine on the market. Speechmatics remains a clear industry leader in both pre-recorded and live automated transcription, and applying 3Play’s mappings and post-processing resulted in an exciting improvement in word error rate of over 8%.”

Comparison of the overall WER from 2020 testing with 2021 tests. Last year’s results showed that several engines, particularly Microsoft, were beginning to catch up to Speechmatics. Speechmatics has since released an enhanced model which shows a 19% better WER, relatively, than its closest competitor, Microsoft, and thus, Speechmatics maintains its edge across all markets. In general, performance has improved by several absolute percentage points across the board since the last report, published in January 2021.
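Because the figures above mix relative percentages with absolute percentage points, a small sketch may help keep the two apart. The helper names and the WER values below are hypothetical, chosen only to make the arithmetic easy to follow; they are not results from the 3Play Media report.

```python
def absolute_gap_points(baseline_wer_pct: float, candidate_wer_pct: float) -> float:
    """Difference between two WERs, expressed in percentage points."""
    return baseline_wer_pct - candidate_wer_pct


def relative_gap(baseline_wer_pct: float, candidate_wer_pct: float) -> float:
    """Fraction by which the candidate's WER is lower than the baseline's."""
    if baseline_wer_pct <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer_pct - candidate_wer_pct) / baseline_wer_pct


if __name__ == "__main__":
    # Hypothetical values for illustration only (not from the report).
    baseline = 10.0   # WER of engine A, in percent
    candidate = 8.1   # WER of engine B, in percent
    print(f"Absolute gap: {absolute_gap_points(baseline, candidate):.1f} points")
    print(f"Relative gap: {relative_gap(baseline, candidate):.0%}")
```

With these made-up numbers, a gap of under two percentage points already amounts to a relative difference of roughly 19%, which is why the two ways of quoting progress should not be compared directly.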

The study showed a wide range in accuracy among the technology tested, with the highest and lowest performing engines differing by more than 15 percentage points. This suggests that different engines are optimizing for different goals, and some ASR engines will not perform well for transcription. Compared to other uses of speech-to-text technology like automated assistants that are able to train on a specific voice, transcription is a very difficult task, with variables such as diverse sentence structure and spontaneous speech, specialized terminology, and complex patterns including multiple speakers, accents, and background noise.

I wrote this piece largely using the information provided by 3Play Media in its press release. I cannot validate the research, but I know that our readers working in the field will look at it carefully, as it is in their interest to do so, and will find these results extremely interesting.

3Play Media offers access to the complete results of its research to those interested. I'm including some results in the tables, courtesy of 3Play Media. A complete and detailed article for audioXpress is in preparation.

3Play Media’s live automatic captioning service uses Speechmatics’ API to provide real-time captioning with no additional human editing.

Fresh From the Bench

Olive Pro Hearing Aids and Bluetooth Earbuds

By Brent Butterworth

The Olive Pro Hearing Aids and Bluetooth Earbuds were brought to market by Olive Union during 2021. These are 2-in-1 true wireless earbuds and smart (FDA-Registered class II) hearing aids, featuring a modern design that was intentionally conceived to look like futuristic Bluetooth earbuds to combat the social and economic barriers of hearing loss. When the company temporarily paused its sales efforts in the US, Brent talked with the company to find out what happened and how it planned to adjust and return to market. The company has recently announced a completely different product with even more ambitious goals. Time to read the complete review, essential for those interested in over-the-counter (OTC) hearing aids and this promising application field. This article was originally published in audioXpress, January 2022. Read the Full Article Now Available Here

Voice Coil Test Bench

SB Acoustics’ Satori MW13TX-4 5” TPCD Cone Midbass Driver

By Vance Dickason

In this Test Bench, Vance Dickason characterizes the latest 5" TPCD cone midbass driver from SB Acoustics, the Satori MW13TX-4. TPCD is the acronym for Thin-Ply Carbon Diaphragm, which is essentially the technology developed in Sweden by Oxeon to create a new-generation material with properties that are now well-proven for speaker cones and domes. This technology is marketed under the brand TeXtreme, which is now recognized in the loudspeaker industry, and its success led Oxeon to spin off its business and create a dedicated company, which is now Composite Sound. The Satori MW13TX-4 TPCD cone 5" woofer is the fourth TPCD diaphragm transducer to be explicated in Test Bench, and this one has a substantial feature set that begins with a proprietary six-spoke cast aluminum frame, comprised of narrow spokes, completely open below the spider mounting shelf for cooling an inverted, single piece bowl-shaped TPCD cone and NBR surround. The FEA-optimized neodymium magnet is coupled with a 1.2" voice coil wound with round copper clad aluminum wire (CCAW) on a non-conducting fiberglass former. This article was originally published in Voice Coil, January 2022. Read the Full Article Now Available Here

audioXpress | Voice Coil | LIS

Advancing the Evolution of Audio Technology

audioXpress features great articles, projects, tips, and techniques for the best in quality audio. It connects manufacturers and distributors with audio engineers and enthusiasts eager for innovative solutions in sound, acoustics, and electronics.

Voice Coil, the periodical for the loudspeaker industry, delivers product reviews, company profiles, industry news, and design tips straight to professional audio engineers and manufacturers who have the authority to make powerful purchasing decisions.

The Loudspeaker Industry Sourcebook is the most comprehensive collection of listings on loudspeaker material in the industry. Purchasers and decision makers refer to the guide for an entire year when making selections on drivers, finished systems, adhesives, domes, crossovers, voice coils, and everything in between.