June 17, 2024

NTT Corporation

World's first high-definition visualization of sound waves using high-speed camera and AI
High-sensitivity sound imaging combined with deep learning and optical measurement

News Highlights:

  1. We have developed sound visualization technology using high-speed cameras, laser light, and AI processing
  2. By removing noise using our proprietary deep learning model that considers the physical properties of sound, we can now capture the sound field in high-definition
  3. This result is expected to contribute to the design of sound devices, further understanding of phenomena related to sound, and the realization of the "digital twin of sound" in the future

Tokyo - June 17, 2024 - NTT Corporation (Headquarters: Chiyoda Ward, Tokyo; Representative Member of the Board and President: Akira Shimada; hereinafter "NTT") has achieved the world's first high-definition visualization of sound by combining a high-speed camera with a proprietary deep learning model that accounts for the physical properties of sound. The work builds on optical sound measurement technology, which senses sound using light (Note [1]). This makes it possible to observe sound waves traveling through the air as moving images (Figure 1). The result can serve as a new sensing method in sound research and development and is expected to contribute to noise evaluation, the development of new sound devices, and more efficient conventional technologies. In the future, it is expected to be used in "digital twin of sound" technology, which would completely digitize all sounds in a space.
 This result will be exhibited at the NTT Communication Science Laboratories Open House 2024, which will be held from June 24.

Figure 1 Framework for the Results

1. Background of the research

Sound is an integral part of our daily lives. We are surrounded by it: voice communication such as conversation, music and online meetings heard through speakers and headphones, various types of noise, and ultrasonic sensors.
 By listening to sounds, humans can obtain various information such as the tone, resonance, and direction of the sound. Sound is a pressure fluctuation in the air, and sound generated at a certain point travels through the air as waves, much like the ripples that spread at a constant speed when a stone is thrown into water. However, unlike a water surface, sound cannot be seen directly, and it propagates through space in a complex manner with reflection and diffraction, making it difficult to understand how sound is generated and transmitted. NTT has been researching and developing optical sound field imaging (*2), a technology that uses light to visualize sound fields (*1), in the belief that various issues involving sound can be solved by enabling people to "see" sound rather than hear it.
 Optical sound field imaging uses a special imaging device that converts invisible sound into the brightness of light, recording the shape of sound ripples at a given moment as an image, just like taking a photograph. Compared to microphone arrays, which are generally used to measure the spatial characteristics of sound, optical sound field imaging has about 100 times higher spatial resolution (Table 1). This allows you to literally "see" where and how sound waves travel. However, optical sound field imaging requires the detection of very small signal changes, so the relatively large influence of optical noise has made it difficult to visualize sound with high sensitivity and high definition.

Table 1 Comparison of Microphone Array and Optical Sound Field Imaging

2. Results of the research

By combining optical sound field imaging, which captures sound as moving images, with a proprietary deep learning model, we succeeded in significantly improving its accuracy. As a result, we have shown that weak sound waves, which could not be detected by conventional techniques, can be imaged with high resolution (Figure 2). High-resolution sound imaging is achieved by applying, to the noisy images taken by a high-speed camera, a neural network that extracts only the weak sound wave component with high sensitivity. Figure 2 shows images of a sound field taken every 60 microseconds by optical sound field imaging, in which sound waves propagate from left to right. The AI-processed result clearly captures how the sound waves travel through the air.
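To put the 60-microsecond frame interval in perspective, the short calculation below converts it to a frame rate and estimates how far a wavefront moves between consecutive frames. It is a back-of-the-envelope sketch: the speed of sound (343 m/s, roughly room temperature) is an assumption and does not come from the release.

```python
# Frame interval quoted in the text for Figure 2.
FRAME_INTERVAL_S = 60e-6

# Assumption (not from the source): speed of sound in air at about 20 degrees C.
SPEED_OF_SOUND_M_S = 343.0


def frame_rate_hz(frame_interval_s: float) -> float:
    """Frames per second implied by a fixed time between frames."""
    return 1.0 / frame_interval_s


def travel_per_frame_m(frame_interval_s: float, speed_m_s: float) -> float:
    """Distance a wavefront moves between two consecutive frames."""
    return speed_m_s * frame_interval_s


fps = frame_rate_hz(FRAME_INTERVAL_S)
step_mm = travel_per_frame_m(FRAME_INTERVAL_S, SPEED_OF_SOUND_M_S) * 1e3

print(f"frame rate: {fps:.0f} frames/s")                 # about 16,667 frames/s
print(f"wavefront travel per frame: {step_mm:.1f} mm")   # about 20.6 mm
```

Under these assumptions the wavefront advances about two centimeters per frame, which is why a camera running at tens of thousands of frames per second is needed to follow sound as it propagates.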

Figure 2 Sound field imaging results. Each image represents a sound field at a given moment, and the color corresponds to the loudness of the sound. Camera noise included in an image is removed by AI processing.

3. Key points of the technology

(1) Optical sound field imaging technology to capture sound as a moving image

Optical sound field imaging uses light to detect sound in the air (Figure 3). Sound travels through air as waves of compression and rarefaction. Due to a phenomenon called the acousto-optic effect (*3), when light passes through air containing sound, the speed of light changes slightly depending on the density of the gas. Sound is measured by sending a laser beam through the sound field and using optical techniques such as interferometry to detect, with high sensitivity, the subtle changes in the light caused by the sound. By capturing these light fluctuations with a high-speed camera at several thousand to several hundred thousand frames per second, the sound waves can be recorded as moving images.
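The size of the effect can be estimated with a standard textbook approximation, sketched below. It assumes adiabatic compression of air, so that the refractive index changes by (n0 - 1) / (gamma * p0) per pascal of sound pressure, and it assumes the sound pressure is uniform along the light path. Every constant here (refractive index, laser wavelength, path length) is an illustrative assumption and not a parameter of NTT's system.

```python
import math

# All constants are textbook-style assumptions, not values from the source.
N0 = 1.000279          # approximate refractive index of air at room conditions
GAMMA = 1.4            # specific heat ratio of air
P0_PA = 101_325.0      # static atmospheric pressure in pascals
WAVELENGTH_M = 532e-9  # hypothetical green laser wavelength
PATH_LENGTH_M = 0.1    # hypothetical length of the light path inside the sound field


def refractive_index_change(sound_pressure_pa: float) -> float:
    """Adiabatic approximation: dn = (n0 - 1) / (gamma * p0) * p."""
    return (N0 - 1.0) / (GAMMA * P0_PA) * sound_pressure_pa


def optical_phase_shift_rad(
    sound_pressure_pa: float,
    path_length_m: float = PATH_LENGTH_M,
    wavelength_m: float = WAVELENGTH_M,
) -> float:
    """Phase shift of light for a sound pressure that is uniform along the path."""
    wavenumber = 2.0 * math.pi / wavelength_m
    return wavenumber * refractive_index_change(sound_pressure_pa) * path_length_m


# 1 Pa corresponds to roughly 94 dB SPL, a fairly loud sound.
phase = optical_phase_shift_rad(1.0)
print(f"index change for 1 Pa: {refractive_index_change(1.0):.2e}")
print(f"phase shift for 1 Pa over {PATH_LENGTH_M} m: {phase * 1e3:.2f} mrad")
```

With these assumed values, even a loud sound shifts the optical phase by only a few milliradians, which illustrates why the measurement is so sensitive to optical noise.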

Figure 3 Overview of Optical Sound Field Imaging

(2) Removing noise with proprietary deep learning models

In optical sound field imaging, the changes in optical signals caused by sound are very small, so it has been difficult to visualize sound waves in high definition from captured images. Especially in highly sensitive measurements, optical noise from the laser beam and the image sensor significantly degraded the quality of sound visualization. In this work, we devised a new proprietary deep learning model that removes unwanted noise from moving images taken by high-speed cameras and visualizes only the sound waves, realizing high-definition optical sound field imaging (Figure 4). The model was trained on images artificially generated by operations based on the physical properties of sound. Furthermore, our proprietary algorithm, which processes moving images independently for each frequency, achieves noise removal accuracy that greatly exceeds that of conventional methods (Figure 5).
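The two ideas in this paragraph, synthetic data built from the physics of sound and frequency-by-frequency processing, can be illustrated with a small NumPy sketch. This is not NTT's deep learning model. In place of the neural network it uses a simple classical stand-in: for each temporal frequency f, it keeps only spatial wavenumbers up to about 2*pi*f/c, the range a propagating sound wave can occupy. The grid size, pixel pitch, noise level, speed of sound, and the `margin` parameter are all hypothetical; only the 60-microsecond frame interval comes from the text.

```python
import numpy as np

C = 343.0  # assumed speed of sound in air, m/s


def synthetic_plane_wave(n_frames, ny, nx, dt, dx, mode=(4, 3)):
    """Clean video of one plane wave obeying the dispersion relation omega = C * |k|.

    `mode` picks a wave vector that fits the grid a whole number of times
    (a convenience for this sketch, not a physical requirement).
    """
    kx = 2 * np.pi * mode[0] / (nx * dx)
    ky = 2 * np.pi * mode[1] / (ny * dx)
    omega = C * np.hypot(kx, ky)
    t = np.arange(n_frames)[:, None, None] * dt
    y = np.arange(ny)[None, :, None] * dx
    x = np.arange(nx)[None, None, :] * dx
    return np.cos(kx * x + ky * y - omega * t)


def denoise_per_frequency(video, dt, dx, margin=1.2):
    """Filter each temporal-frequency component of the video independently.

    For frequency f, spatial wavenumbers with |k| > margin * 2*pi*f / C are
    zeroed, because a propagating sound wave cannot produce them.
    """
    n_frames, ny, nx = video.shape
    spectrum = np.fft.rfft(video, axis=0)       # one complex image per frequency
    freqs = np.fft.rfftfreq(n_frames, d=dt)
    ky = 2 * np.pi * np.fft.fftfreq(ny, d=dx)[:, None]
    kx = 2 * np.pi * np.fft.fftfreq(nx, d=dx)[None, :]
    k_mag = np.hypot(kx, ky)
    for i, f in enumerate(freqs):
        mask = k_mag <= margin * 2 * np.pi * f / C
        spectrum[i] = np.fft.ifft2(np.fft.fft2(spectrum[i]) * mask)
    return np.fft.irfft(spectrum, n=n_frames, axis=0)


def rmse(a, b):
    """Root-mean-square error between two videos."""
    return float(np.sqrt(np.mean((a - b) ** 2)))


rng = np.random.default_rng(0)
DT = 60e-6   # frame interval from the text
DX = 5e-3    # hypothetical pixel pitch in meters

clean = synthetic_plane_wave(64, 64, 64, DT, DX)
noisy = clean + 1.0 * rng.standard_normal(clean.shape)  # noise stronger than signal
denoised = denoise_per_frequency(noisy, DT, DX)

print(f"RMSE before: {rmse(noisy, clean):.3f}")
print(f"RMSE after:  {rmse(denoised, clean):.3f}")
```

A learned model, as described in the release, would replace the fixed mask inside the loop with a network applied to each frequency component. The sketch only shows the data flow: split the video by frequency, process each component separately, and recombine.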

Figure 4 Deep Learning Model and Processing Technique

Figure 5 Experimental Results

4. Outlook

NTT proposes digital twin computing as part of the IOWN concept and is conducting R&D. The results of this research are expected to be applied not only to visualizing sound, but also to "digital twin of sound" technology that digitizes all sounds existing in space. We will continue our research and contribute to the creation of an optimal sound environment that is comfortable for everyone.

Note [1] K. Ishikawa, D. Takeuchi, N. Harada, and T. Moriya, "Deep sound-field denoiser: optically-measured sound-field denoising using deep neural network," Opt. Express, vol. 31, no. 20, pp. 33405-20, 2023.

*1 Sound field
A space where sound exists and propagates. Here, the term refers to the sound pressure, the physical quantity of sound, expressed numerically at each point and time.

*2 Optical sound field imaging
This technology visualizes sound fields using light. An optical interferometer detects the slight variations in light that sound causes through the acousto-optic effect (*3). In particular, capturing these variations as an image with a camera makes it possible to visualize the sound, that is, to image the sound field.

*3 Acousto-optic effect
When light propagates through a space where sound exists, the sound changes the characteristics of the light. Because sound is a change in the density of the medium, it changes the speed at which light propagates through the medium. Various effects appear depending on the medium and on the loudness and frequency of the sound. For sound in air, a quantity called the phase of light changes very slightly. By detecting this very small fluctuation with high sensitivity, we can measure sound using light.

About NTT

NTT contributes to a sustainable society through the power of innovation. We are a leading global technology company providing services to consumers and businesses as a mobile operator and as a provider of infrastructure, networks, applications, and consulting. Our offerings include digital business consulting, managed application services, workplace and cloud solutions, data center and edge computing, all supported by our deep global industry expertise. We have over $97B in revenue and 330,000 employees, with $3.6B in annual R&D investments. Our operations span 80+ countries and regions, allowing us to serve clients in over 190 of them. We serve over 75% of Fortune Global 100 companies, thousands of other enterprise and government clients, and millions of consumers.

Media contact

NTT Science and Core Technology Laboratory Group
Public Relations
nttrd-pr@ml.ntt.com

Information is current as of the date of issue of the individual press release.
Please be advised that information may be outdated after that point.