August 17, 2023

Picking out what really matters: NTT's ConceptBeam

NTT has recently unveiled a technology called ConceptBeam, which separates signals according to their semantic content. Its primary application is the extraction of a specific audio signal from a mixture of signals based on what is being said, a capability that no other technology has offered to date.

Traditionally, the techniques employed in source separation lean heavily on the physical properties inherent to the signals, such as the direction from which the sound originates or unique speaker characteristics. Some methods rely on voice pitch while others utilize signal independence. In essence, these techniques primarily dissect the physical components of the signals in question.

As an example, consider SpeakerBeam, another technological breakthrough from NTT. This tool extracts a signal by homing in on the unique voice characteristics of a predetermined speaker, thereby directing sensitivity towards that specific individual.

ConceptBeam diverges considerably from these traditional methods. Instead of focusing on the physical properties of the signal, it leverages NTT's research into concept acquisition, which entails extracting meaning from data. ConceptBeam can therefore be thought of as a 'concept filter,' designed to isolate a target voice based on the semantic content of what is being said.

For instance, in a conversation where the topics of discussion are as different as broccoli and motorcycles, if a picture of broccoli is provided as a hint, the system will sift through the noise to extract only the voice discussing broccoli.

Two technological capabilities underpin ConceptBeam. The first is the ability to express semantic content on a computer. The second is the ability to use that semantic representation to extract the target voice. A concept is represented as a vector, essentially a set of numbers that marks a position in a feature space. The space can be constructed with or without information known beforehand about which items are related. Given a vast amount of data, the system is able to build a space where related concepts are clustered closely together, irrespective of the nature of the information.
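To make the idea of a shared feature space more concrete, here is a minimal sketch in Python. It is not NTT's implementation: the three-dimensional vectors, the item names and the `nearest` helper are all hypothetical and hand-picked for illustration, whereas a real system would learn its vectors from data. The sketch only shows how closeness between concept vectors can be measured with cosine similarity.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical, hand-picked vectors standing in for learned representations.
# Items about the same concept are deliberately placed close together.
embeddings = {
    "broccoli_image": [0.9, 0.1, 0.0],
    "speech_about_vegetables": [0.8, 0.2, 0.1],
    "speech_about_motorcycles": [0.1, 0.1, 0.9],
}


def nearest(query_name, space):
    """Return the name of the item whose vector is most similar to the query."""
    query = space[query_name]
    scores = {
        name: cosine_similarity(query, vector)
        for name, vector in space.items()
        if name != query_name
    }
    return max(scores, key=scores.get)


closest = nearest("broccoli_image", embeddings)
print(closest)
```

In this toy space the picture of broccoli lands next to the speech about vegetables and far from the speech about motorcycles, which is the property the paragraph above describes.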

ConceptBeam blends this methodology with the signal filtering approach of SpeakerBeam. It discerns the speech intervals that correspond to a specified concept and then extracts the voice of the speaker during those intervals. To do so, it derives feature vectors both from the signal that specifies the concept (such as an image) and from the mixed speech, and then computes the similarity between these vectors. It can pick out individual speakers from a crowd as well as specific topics of discussion.
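The interval-selection step described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the method NTT uses: the two-dimensional feature vectors, the `threshold` value and the helper names `concept_mask` and `apply_mask` are all hypothetical, and the sketch merely keeps or silences whole intervals rather than separating overlapping voices.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def concept_mask(hint_vector, interval_vectors, threshold=0.5):
    """Mark each interval 1.0 if its features match the hint, else 0.0.

    The threshold is an arbitrary illustrative value.
    """
    return [
        1.0 if cosine_similarity(hint_vector, vector) >= threshold else 0.0
        for vector in interval_vectors
    ]


def apply_mask(mixed_intervals, mask):
    """Scale the samples of each interval by that interval's mask value."""
    return [
        [weight * sample for sample in samples]
        for samples, weight in zip(mixed_intervals, mask)
    ]


# Hypothetical feature vector derived from the hint (e.g. a broccoli picture).
hint = [1.0, 0.0]
# Hypothetical feature vectors, one per interval of the mixed speech.
interval_features = [[0.9, 0.1], [0.1, 0.9], [0.8, 0.3]]
# Toy audio samples for the same three intervals.
mixed = [[0.2, -0.1], [0.5, 0.4], [-0.3, 0.1]]

mask = concept_mask(hint, interval_features)
extracted = apply_mask(mixed, mask)
print(mask)
print(extracted)
```

Here the first and third intervals resemble the hint and are kept, while the second is silenced. Extracting a single voice from intervals in which several people talk at once requires a further filtering stage that this sketch does not attempt.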

In an experimental evaluation, ConceptBeam isolated target speech from mixed speech comprising multiple speakers talking about different themes. Its accuracy was better than that of traditional approaches, such as running speech recognition directly on the mixed speech or applying source separation to the mixed speech.

Moving forward, NTT envisages using this technology to sift through the plethora of data available, to extract and select pertinent information. The ultimate objective is to bring semantic processing into the realms of signal processing and pattern processing. This is expected to facilitate rapid and precise identification and utilization of sought-after information, proving beneficial to individuals and industries alike.

NTT--Innovating the Future of Communication

Picture: Daniel O'Connor

Daniel O'Connor joined the NTT Group in 1999 when he began work as the Public Relations Manager of NTT Europe. While in London, he liaised with the local press, created the company's intranet site, wrote technical copy for industry magazines and managed exhibition stands from initial design to finished displays.

Later seconded to the headquarters of NTT Communications in Tokyo, he contributed to the company winning its first-ever global telecoms awards and to the digitalisation of internal company information exchange.

Since 2015, Daniel has created content for the Group's Global Leadership Institute and the One NTT Network, and he is currently working with NTT R&D teams to grow public understanding of the cutting-edge research undertaken by the NTT Group.