July 22, 2020


NTT's team wins 1st place in Audio Captioning task at DCASE 2020 Challenge

Yuma Koizumi, with the Media Intelligence Laboratories of the Service Innovation Laboratory Group, and Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, and Kunio Kashino, with the Communication Science Laboratories of the Science and Core Technology Laboratory Group, won 1st place in the Audio Captioning task at the DCASE 2020 Challenge held from March to July this year.

Yuma Koizumi (Researcher)
Daiki Takeuchi
Yasunori Ohishi (Senior Research Scientist)
Noboru Harada (Senior Research Scientist, Supervisor)
Kunio Kashino (Senior Distinguished Researcher)

The DCASE* Challenge is an annual international competition officially recognized by the IEEE Audio and Acoustic Signal Processing Technical Committee; this year's event was the sixth. "Automated audio captioning" is a new task that DCASE introduced this year. The challenge is to automatically generate appropriate and accurate text descriptions or explanations for given audio signals of various non-speech sounds. Ten teams from around the world competed in the task.

NTT is one of the earliest research institutes in the world to work on the verbalization of sounds. To tackle the task, we drew fully on the algorithms and knowledge accumulated by the above members and combined various ideas ranging from pre-processing to post-processing and automated meta-parameter tuning.

Automated audio captioning is an emerging field, and no established method for achieving it exists yet. The capability to describe all kinds of sounds in text could bring many benefits to our lives in the near future. NTT will therefore continue its research to further strengthen the technology.

*DCASE: Detection and Classification of Acoustic Scenes and Events is an international conference on sound event detection and sound scene classification.

DCASE2020 Challenge (DCASE2020)

Information is current as of the date of issue of the individual topics.
Please be advised that information may be outdated after that point.