INDUSTRY & PRODUCT NEWS

Listen Technologies Acquires Audio Everywhere Brand and Assistive Listening Products
Listen Technologies, a leading provider of assistive listening products for more than 18 years, has acquired the Audio Everywhere brand and products from ExXothermic, a developer of Wi-Fi streaming technology with four related patents. The technology, which Listen Technologies was already licensing, offers a low-cost assistive listening solution for smartphones that utilizes existing wireless networks for plug-and-play audio streaming.   Read More


XMOS Announces First True-Stereo AEC Linear Voice Processor and Far-Field Voice Kit for Smart Devices
XMOS announced the XVF3500 voice processor, which delivers two-channel full-duplex acoustic echo cancellation (AEC), along with the world's first stereo AEC far-field linear microphone array solution, the VocalFusion Stereo Evaluation Kit (XK-VF3500-L33). The XVF3500 voice processor is designed for developers working in the growing voice-enabled smart TV, soundbar, set-top box, and digital media adapter markets, all of which require stereo-AEC support for "across the room" voice-interface solutions.   Read More
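For readers less familiar with the concept, the idea behind AEC is that the device already knows the audio it is sending to its loudspeakers, so adaptive filters can estimate the echo path from each loudspeaker channel to the microphone and subtract that estimate from what the microphone captures, leaving the user's voice. The sketch below is a minimal, generic time-domain NLMS example of that principle; it is not the XVF3500's algorithm, and the filter length, step size, and signal names are illustrative assumptions only.

```python
# A minimal, illustrative sketch of the principle behind stereo acoustic echo
# cancellation (AEC): adaptive filters estimate the echo path from each
# loudspeaker (reference) channel to the microphone and subtract the estimate
# from the captured signal. This is NOT the XVF3500 algorithm, just a basic
# time-domain NLMS example with assumed parameters.
import numpy as np

def stereo_nlms_aec(mic, ref_l, ref_r, taps=256, mu=0.5, eps=1e-8):
    """Cancel the echo of two reference channels from a mono mic signal."""
    mic, ref_l, ref_r = (np.asarray(x, dtype=float) for x in (mic, ref_l, ref_r))
    w_l = np.zeros(taps)               # echo-path estimate for the left reference
    w_r = np.zeros(taps)               # echo-path estimate for the right reference
    out = np.zeros(len(mic))
    for n in range(taps, len(mic)):
        x_l = ref_l[n - taps:n][::-1]  # most recent left reference samples
        x_r = ref_r[n - taps:n][::-1]  # most recent right reference samples
        echo_est = w_l @ x_l + w_r @ x_r
        e = mic[n] - echo_est          # residual = near-end speech + noise
        out[n] = e
        norm = x_l @ x_l + x_r @ x_r + eps
        w_l += mu * e * x_l / norm     # NLMS update toward the true echo path
        w_r += mu * e * x_r / norm
    return out
```

In practice this kind of processing typically runs block-by-block in the frequency domain, with double-talk detection and special handling of the strong correlation between the two stereo references, which is part of what makes true stereo AEC a difficult problem.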


Apogee and Sennheiser Pitch AMBEO Smart Headset to a Wider Audience
Apogee and Sennheiser announced that their co-developed Sennheiser AMBEO Smart Headset is now available at Apple.com and in Apple Stores in the US, just in time for the holidays, and, for those channels, it is now available in black. An interesting note about the announcement is that, since the binaural headset launched in May 2017, Sennheiser has shifted its promotion slightly, highlighting its use more as a hearing enhancer and high-quality listening device.   Read More


StormAudio Separates from Auro Technologies and Forms Immersive Audio Technologies Group
Auro Technologies has agreed to sell its StormAudio High-End Immersive Audio Hardware Brand Division to Yves Trélohan, the company's Senior VP, with backing from Saffelberg Investments. The acquisition encompasses the StormAudio hardware brand, its intellectual property, infrastructure, and inventory, all now hosted in a dedicated new company, Immersive Audio Technologies Group. With the management buyout, StormAudio is free to work more closely with Dolby and DTS on technical and marketing initiatives.   Read More


Dirac Research to Introduce the Next Generation of Dirac Live Platform at CES 2018 
Dirac Research announced several major updates to its pioneering Dirac Live room correction platform, including a refreshed, modern user interface with a simplified setup procedure, improved performance through a new algorithm, and new multi-language support. CES 2018 attendees will be able to experience a pre-release demo of this platform. The updated software is expected to launch in the first quarter of 2018.   Read More


Attero Tech Now Shipping AES67 Networked Audio Products
Attero Tech is now shipping its recently announced line of AES67 networked audio endpoint products. Built from the ground up, these AES67 products enable Attero Tech's innovative audio connectivity technologies to interface with the AES67-enabled Q-SYS Platform from QSC. The new AES67-enabled products are also designed for interoperability with all Dante AES67-enabled technologies, providing maximum flexibility for systems leveraging AES67 as a bridge between modern audio networking platforms.   Read More


RØDE Microphones Unveils AI-1 USB Type-C Audio Recording Interface
RØDE Microphones has released its first audio interface, the AI-1. For a leading microphone manufacturer, this might seem like an odd initiative, but the Australian company turned the announcement into a powerful audio recording proposition. The RØDE AI-1 is a simple 24-bit/96 kHz USB bus-powered desktop interface, featuring a high-quality preamp with phantom power, one Neutrik XLR-1/4" combo analog input, two balanced outputs, and a headphone amplifier that enables direct monitoring.   Read More


Lattice Simplifies Audio Connectivity and Improves Performance with HDMI 2.1 Enhanced Audio Return Channel (eARC) Solutions
Shortly after the publication of the new HDMI 2.1 specification by the HDMI Forum, Lattice Semiconductor announced the release of the SiI9437 and the SiI9438, the first HDMI 2.1 Enhanced Audio Return Channel (eARC) audio receiver and transmitter devices. This enables manufacturers to start implementing solutions that benefit from the self-discovering, self-configuring, and fully automatic HDMI Audio Link for flawless home theater experiences.   Read More




João Martins, Editor-in-Chief




Editor's Desk


What's Missing in Smart Voice

With all the industry's excitement about voice interfaces, voice personal assistants, Artificial Intelligence (AI), and Machine Learning, I thought it was about time to address the elephant(s) in the room. All 7,099 of them!
 
Alexa, pourquoi tu me comprends pas? (Alexa, why don't you understand me?)
It's like no one wants to discuss language. There are more than 7.5 billion people in the world, speaking more than 7,000 languages. Of those, 389 languages are used by 94% of the world's population. Fewer than 400 million people are native English speakers: less than half the number who speak Mandarin Chinese, fewer than those who speak Spanish, and roughly as many as those who speak Hindi, Arabic, or Portuguese. And let's not even discuss dialects and regional varieties...
 
So, when we get excited about smart speakers, discuss the market potential for voice personal assistants, project the promise of voice recognition technologies, and forecast voice interfaces dominating all sorts of appliances and becoming the center of the smart home, we need to address language first.
Travis, a Dutch startup, raised more than $1.5 million in its 2017 crowdfunding campaign. The result, Travis the Translator, promoted as "the first personal voice translator to help bridge language barriers worldwide," is an independent device, not a mobile app, that uses translation software from multiple engines, including engines from Google and Microsoft.


Of course, we are just starting with voice technologies. Devices are getting better at far-field voice capture, digital signal processing is getting smarter at handling voice, and even critical elements, such as microphones, are improving and becoming amazingly efficient. But capturing voice is just the first step. Getting machines to understand human languages is an intricate effort. As Apple, Google, and Microsoft learned early on, it is not enough to develop the best voice recognition technology and implement the most powerful AI and natural language platforms in the cloud if users don't actually use them, even when implemented in smartphones and mobile platforms, which were supposed to be the primary devices for voice communication.
 
There's a reason why Amazon turned to smart speakers, and why it sells the Amazon Echo and promotes its Alexa ecosystem and range of products almost exclusively in English-speaking countries: for voice platforms to actually gain traction in the market, they must do what people expect them to do, and for that they need to understand the language.
 
Amazon has managed to lead the way in making voice assistant platforms ubiquitous in the home, while other hands-free uses in the car and outside the home certainly have huge potential. But again, English-speaking markets are leading the way, while similar technologies for the largest language market in the world, China, are just starting and are still not as efficient. Yes, voice activation exists in other languages, but there's a huge difference between shouting voice commands at a machine and holding a natural conversation in multiple languages, and the latter is critical to the success of voice interfaces.
 
With Artificial Intelligence, there is the potential that voice engines will one day be able to engage users in real conversations, even asking questions that anticipate their needs, or simply become popular as real assistants thanks to their social interaction: the stuff of popular science fiction, where "computers" ask us where we want to go or what we want to eat before we even think about it. But for that to happen, language recognition research needs to catch up. As we have seen so far, English language recognition is way ahead in all application areas because the research work started earlier, because the tech companies that conceived the products and applications were based in English-speaking countries, and because English is widely adopted as a universal "translation" language, among other reasons.
 
It is also because many fields of language recognition research were fragmented (language is naturally considered a matter of national sovereignty) and focused on different specific applications: voice-to-text conversion, image and character recognition, voice command recognition... and translation. No doubt, language translation is probably one of the most important application fields for voice technologies, and it's not surprising to see so many new consumer electronics companies trying to leverage existing platforms to pitch "miraculous" translation devices that will magically enable us to understand foreign languages. And yet, we are still far from that.
 
Voice and language recognition is being successfully applied in many areas, especially voice-to-text transcription, such as generating automatic subtitles for broadcasts and streaming services, but its accuracy remains far from perfect. For mainstream and critical applications, it is still less efficient than human processes, and it remains limited in the languages it supports.
 
To understand the problem, we just need to look at the limitations of text-to-text translation, which effectively depends on the same language engines. Using Google Translate or any similar service reveals that while English to French (or vice versa) already works reasonably well, translating Chinese to Portuguese will not show the same level of accuracy.
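One informal way to see this for yourself is a round-trip test: translate a sentence out of its source language and back again, then compare the result with the original. The sketch below outlines such a comparison; the translate() function is only a placeholder for whichever engine you have access to (it is not a real API call), and the example sentences and language codes are arbitrary.

```python
# A rough way to compare translation quality across language pairs is a
# round-trip test: translate a sentence out of its source language and back,
# then measure how much was lost. The translate() call below is a placeholder
# for whatever engine is available (Google, Microsoft, etc.); it is NOT a
# real API and is shown only to illustrate the comparison.
from difflib import SequenceMatcher

def translate(text: str, src: str, dst: str) -> str:
    """Placeholder: call your translation engine of choice here."""
    raise NotImplementedError

def round_trip_similarity(sentence: str, src: str, via: str) -> float:
    """Translate src -> via -> src and score how close the result stays to the original."""
    there = translate(sentence, src, via)
    back = translate(there, via, src)
    return SequenceMatcher(None, sentence.lower(), back.lower()).ratio()

# For example, compare an English<->French round trip against a Chinese<->Portuguese one:
# round_trip_similarity("Where is the nearest train station?", "en", "fr")
# round_trip_similarity("最近的火车站在哪里？", "zh", "pt")
```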

Bragi promotes its iTranslate integration with The Dash Pro truly wireless earbuds, allowing translation in real time. Language support is provided by the Austria-based company.

The great news about current voice personal assistant platforms being pushed by technology companies, instead of countries, is that they envision the full potential of those applications and are not bound by nationalistic perspectives. On the contrary, they depend on being language-agnostic to succeed. Applying language learning mechanisms on a global scale, with cloud-based processing, also allows a level of user contribution that was never previously available. Combined with new Artificial Intelligence technologies, all that input can be used to introduce more languages and regional variations.
 
Still, until now, it seemed that voice assistants were not "evolving" in that direction. In September of this year, in my Audio Voice editorial addressing voice technologies following the IBC 2017 show, I mentioned a company that was showing promising progress on this front, using machine learning to build and expand language support.
 
Just a few days ago, that same company, Speechmatics, announced that it had "built" 46 additional languages in just six weeks, enabling its AI-powered voice transcription service and real-time virtual appliance to support a surprising total of 72 unique languages.
 
In that same editorial, I mentioned how I witnessed Speechmatics' real-time voice-to-text transcription engine working in a noisy environment, and how the same engine was being used to generate real-time closed captions from live TV broadcasts in Dutch. That is why I declared Speechmatics' real-time transcription engine a breakthrough.
 
Now, it seems the company has achieved another remarkable breakthrough, with the completion of its "Project Omniglot." As Speechmatics states, its unique AI framework, the Automatic Linguist (AL), "paves the way for the wider goal of building every language in the world. Traditionally, building a new language pack takes months and is a laborious affair, meaning only the most widely spoken languages in the world remain the focus. The challenge was to see how many languages AL could build in just six weeks. It exceeded the team's expectations by learning a language a day. Due to the speed at which the languages were produced automatically, Speechmatics are offering for people to use them for free initially."

 
Speechmatics developed an Automatic Linguist (AL) AI framework that learns a new language from minimal data in a short period of time, and the company intends to build as many languages in the world as possible, if not all of them.

As Benedikt von Thüngen, CEO of Speechmatics, explains, "This initial phase of Project Omniglot has proven that the machine learning framework behind AL works extremely well. It can automatically learn the sounds (phonemes) of a language as well as the grammar and semantics in order to determine which sentences make sense. Speech-to-text technology is one of the most widely discussed topics right now and, as the world is becoming increasingly more connected, broad language coverage is also becoming essential. From broadcast subtitling and interview transcription to accessibility within the education sector, Speechmatics is hoping to open the door to a speech-enabled future in as many languages as possible, for more countries than ever before. We are already seeing a shift to a speech-enabled future where voice is the primary form of communication."
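To make the "which sentences make sense" idea concrete for readers who are new to it: one classic ingredient of any language pack is a statistical language model that assigns higher scores to plausible word sequences than to implausible ones. The toy bigram model below is only a generic illustration of that ingredient, with a made-up two-sentence corpus; it says nothing about how Speechmatics' Automatic Linguist is actually built.

```python
# A toy illustration of one ingredient in any language pack: a statistical
# language model that scores which word sequences "make sense." This is a
# deliberately simple bigram model with add-one smoothing; it is in no way
# Speechmatics' Automatic Linguist, just a sketch of the underlying idea.
from collections import Counter
import math

def train_bigram_lm(corpus):
    """Count unigrams and bigrams from a list of tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def sentence_logprob(sentence, unigrams, bigrams):
    """Score a candidate sentence; higher means it 'makes more sense'."""
    tokens = ["<s>"] + sentence + ["</s>"]
    vocab = len(unigrams)
    logp = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        # Add-one (Laplace) smoothing so unseen bigrams are unlikely, not impossible
        logp += math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab))
    return logp

corpus = [["where", "is", "the", "station"], ["the", "station", "is", "here"]]
uni, bi = train_bigram_lm(corpus)
print(sentence_logprob(["where", "is", "the", "station"], uni, bi))  # higher score
print(sentence_logprob(["station", "the", "where", "is"], uni, bi))  # lower score
```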
 
Will this new language learning framework be the start of something exciting, providing recognition improvements at an unprecedented pace? If so, it will be critical in helping audio companies imagine new concepts that truly bring the promise of smart speakers and smart home integration to reality. Only with multiple-language support can we achieve the massive scale needed to realize the industry's vision for voice.




Fresh from the Bench
OPPO Digital UDP-205 Review: A 4K Ultra HD Audiophile Blu-ray Disc Player
By Gary Galo
 
audioXpress November 2017 featured one of the most eagerly awaited reviews: the OPPO Digital UDP-205 4K Ultra HD Audiophile Blu-ray Disc Player. Gary Galo, who has reviewed OPPO Digital's previous generations of Blu-ray players for audioXpress, explains why this is the 4K Ultra HD Blu-ray Disc player audio enthusiasts were waiting for. The new UDP-205 features dual ES9038PRO Sabre DAC chips from ESS Technology, an improved HDMI audio clock, an asynchronous USB DAC with coaxial and optical inputs, a headphone amplifier, and a dedicated stereo output with XLR balanced connectors.

The UDP-205 "universal" player has much in common with its predecessor, the BDP-105, including support for nearly every standard optical disc format. The UDP-205 adds playback of 4K UHD Blu-ray discs to an already thorough array, including regular Blu-ray, Blu-ray 3D, DVD-Video, DVD-Audio, SACD, and CD. Media file support is also exhaustive, and includes AIFF, WAV, ALAC, APE, and FLAC, along with Direct Stream Digital (DSD) audio files in stereo or multichannel.

The UDP-205 ($1,299 US) was designed to be a complete media server and has connectivity similar to the BDP-105. The BDP-105 had two HDMI outputs that could be configured using the setup menu for split A/V operation, where HDMI 1 was the A/V output feeding the television and HDMI 2 was a dedicated high-resolution digital audio output. If the user required two displays, the two outputs could be configured for dual-display operation. In the UDP-205, the two HDMI outputs are permanently configured as main and audio-only. A Gigabit Ethernet LAN connector allows a wired network connection, and wireless home network access is handled by built-in 802.11ac Wi-Fi. Previous OPPO Digital players came with a USB wireless "dongle" that functioned as a transceiver for wireless network connectivity. With all wireless network hardware now built into the new player, you'll always be connected to your home network when the player is on.   Read the Full Article Now Available Here

Voice Coil Test Bench
D4400Ph Compression Driver from PRV Audio 
By Vance Dickason
 
In this Test Bench, Voice Coil characterizes the second of two new compression drivers from Brazilian pro sound OEM PRV Audio. Following the D4400Ti-Nd, Vance Dickason looks at the D4400Ph compression driver, a ferrite version. The D4400Ph is part of a series of 2" throat, 4" diameter voice coil compression drivers, which also includes the D4400Ti-Nd and the D4400Ti. With an 8.7" diameter and weighing 21.5 lb., the D4400Ph might be the largest and heaviest compression driver you have ever encountered. As in the previous characterization, the D4400Ph is complemented with PRV Audio's WGP22-50 2" bolt-on 60° × 60° waveguide. Features for the D4400Ph include a one-piece phenolic resin diaphragm and surround, with a throat diameter of 50 mm (2"). The phenolic resin diaphragm is coupled to a 101.6 mm (4") diameter voice coil wound with CCAW wire on a Kapton former. Other features include a cast-aluminum body with the PRV Audio logo, a field-replaceable diaphragm, 200 W RMS rated power handling (400 W with program material), a minimum crossover of 700 Hz (using a second-order network), a ferrite ring magnet motor, and color-coded push terminals. This article was originally published in Voice Coil, May 2017.   Read the Full Article Online
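As a back-of-the-envelope illustration of that 700 Hz minimum crossover specification, the short sketch below computes component values for a textbook second-order Butterworth passive high-pass network. The 8 Ω nominal load and the Butterworth alignment are assumptions made here purely for the example; the article itself specifies neither.

```python
# Textbook component values for a second-order Butterworth high-pass crossover:
# a series capacitor followed by a shunt inductor across the driver.
# The 8-ohm load and Butterworth alignment are illustrative assumptions only.
from math import pi, sqrt

def second_order_butterworth_highpass(f_c: float, r_load: float):
    """Return (C in farads, L in henries) for crossover frequency f_c and load r_load."""
    c = 1.0 / (2.0 * sqrt(2.0) * pi * f_c * r_load)   # series capacitor
    l = sqrt(2.0) * r_load / (2.0 * pi * f_c)         # shunt inductor
    return c, l

c, l = second_order_butterworth_highpass(700.0, 8.0)
print(f"C = {c * 1e6:.1f} uF, L = {l * 1e3:.2f} mH")  # roughly 20.1 uF and 2.57 mH
```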

AX December 2017: Digital Login
Audio Product Design | DIY Audio Projects | Audio Electronics | Audio Show Reports | Interviews | And More

VC December 2017: Digital Login
Industry News & Developments | Products & Services | Test Bench | Acoustic Patents | Industry Watch | And More

Don't Have a Subscription?