Nine papers authored by NTT Laboratories have been accepted at ICML (International Conference on Machine Learning) 2025, held in Vancouver, Canada, from 13th to 19th July 2025. ICML is one of the most prestigious international conferences in the field of machine learning, a core technology that underpins modern artificial intelligence. It serves as a leading venue for presenting cutting-edge research on the theoretical foundations, algorithms, and applications of machine learning. ICML is known as one of the most competitive conferences in the field, and the acceptance rate this year was 26.9% (3,260 papers accepted out of 12,107 submissions).
Abbreviated names of laboratories are listed as follows:
HI: NTT Human Informatics Laboratories
SI: NTT Social Informatics Laboratories
CD: NTT Computer and Data Science Laboratories
CS: NTT Communication Science Laboratories
NS: NTT Network Service Systems Laboratories
■Plausible Token Amplification for Improving Accuracy of Differentially Private In-Context Learning Based on Implicit Bayesian Inference
- Yusuke Yamasaki (SI), Kenta Niwa (CS), Daiki Chijiwa (CD), Takumi Fukami (SI), Takayuki Miura (SI)
- In tasks such as answering frequently asked questions, which involve structured input-output pairs, in-context learning (ICL) has gained attention for its ability to improve output accuracy by showing a large language model (LLM) a few input-output pairs as context before the new input. While useful, using raw input-output pairs risks leaking sensitive information. To address this, researchers rely on Differential Privacy (DP), a standard that limits information leakage: noise is added to create synthetic input-output pairs for ICL, thereby protecting the privacy of the raw pairs. This method, known as DP-ICL, mitigates leakage of individual pairs but often degrades accuracy. Although a practical remedy exists, its effectiveness has been theoretically unclear. We offer a theoretical explanation showing that the existing remedy is reasonable while also revealing room for improvement. On the basis of this insight, we propose Plausible Token Amplification (PTA), the first theoretically grounded approach that improves DP-ICL by generating clearer, more informative synthetic input-output pairs while maintaining the same level of privacy guarantee as existing methods. This paves the way for broader adoption of LLMs in fields such as healthcare, government services, and finance, where both data privacy and high-quality outputs are essential.
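- A minimal sketch of the ICL setting described above, assuming hypothetical FAQ-style demonstrations in place of the DP-generated synthetic pairs (this illustrates the setting only, not the PTA algorithm itself):

    # Illustrative only: in-context learning builds a prompt from demonstration
    # pairs; in DP-ICL those demonstrations would be synthetic pairs generated
    # under a differential-privacy budget rather than raw user data.
    def build_icl_prompt(demos, new_input):
        """demos: list of (input, output) pairs shown as context; new_input: the query."""
        blocks = [f"Q: {x}\nA: {y}" for x, y in demos]
        blocks.append(f"Q: {new_input}\nA:")
        return "\n\n".join(blocks)

    # Hypothetical synthetic demonstrations standing in for DP-generated pairs.
    synthetic_demos = [
        ("How do I reset my password?", "Use the 'Forgot password' link on the login page."),
        ("Where can I find my invoice?", "Open 'Billing' and select 'Download invoice'."),
    ]
    print(build_icl_prompt(synthetic_demos, "How do I change my email address?"))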
■Linear Mode Connectivity between Multiple Models modulo Permutation Symmetries
- Akira Ito (SI), Masanori Yamada (SI), Atsutoshi Kumagai (CD/SI)
- Model merging, a technique that combines multiple models into a single one, is gaining attention as a promising approach in machine learning. Previous studies have proposed methods for merging two pretrained models by leveraging the permutation symmetry inherent in neural networks, that is, the fact that permuting hidden units yields functionally equivalent but distinct models, to align the models before merging. In this study, we extend these methods to support the simultaneous merging of three or more models. Through experiments on image classification tasks, we demonstrate that the performance of the merged model improves as the number of models being merged increases. These findings enhance the practicality of model merging, enabling the integration of multiple AI models without sharing training data. This contributes to the development of privacy-conscious AI applications.
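- A minimal sketch of permutation-aware merging for two-layer networks, assuming NumPy/SciPy and a simple similarity-based assignment in place of the paper's procedure (the paper generalizes such merging to three or more models):

    # Permute the hidden units of one two-layer MLP to match a reference model,
    # then average the aligned weights. Illustrative sketch only.
    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def align_to_reference(W1_ref, W1, W2):
        """Find the hidden-unit permutation that best matches the reference first layer."""
        cost = -W1_ref @ W1.T                      # negative similarity between hidden units
        _, perm = linear_sum_assignment(cost)      # optimal assignment of units
        return W1[perm], W2[:, perm]               # consistent permutation keeps the function unchanged

    rng = np.random.default_rng(0)
    W1_a, W2_a = rng.normal(size=(8, 4)), rng.normal(size=(3, 8))   # reference model
    W1_b, W2_b = rng.normal(size=(8, 4)), rng.normal(size=(3, 8))   # model to be merged
    W1_b, W2_b = align_to_reference(W1_a, W1_b, W2_b)
    W1_merged, W2_merged = (W1_a + W1_b) / 2, (W2_a + W2_b) / 2     # average after alignment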
■Positive-unlabeled AUC Maximization under Covariate Shift
- Atsutoshi Kumagai (CD/SI), Tomoharu Iwata (CS), Hiroshi Takahashi (CS/CD), Taishi Nishiyama (SI), Kazuki Adachi (CD), Yasuhiro Fujiwara (CS)
- In applications such as cyberattack detection, anomaly detection, and medical diagnosis, important data such as attacks, anomalies, or diseases (positive data) are typically much rarer than other data (negative data). AUC maximization is a representative approach for learning accurate classifiers from such imbalanced data. However, it does not handle covariate shift, where the input distribution differs between training and deployment phases. In this paper, we theoretically show that under covariate shift, it is possible to perform AUC maximization using only unlabeled data from the deployment phase and positive and unlabeled data from the training phase. We also confirm its effectiveness through experiments.
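- A generic sketch of a weighted pairwise AUC surrogate computed from positive and unlabeled training data, with per-sample weights standing in for an importance-weighting correction toward the deployment distribution (illustrative only, not the estimator derived in the paper):

    import numpy as np

    def weighted_pu_auc_surrogate(scores_pos, scores_unl, w_pos, w_unl):
        """Weighted logistic surrogate of the AUC over (positive, unlabeled) score pairs."""
        diff = scores_pos[:, None] - scores_unl[None, :]   # pairwise score gaps
        pair_w = w_pos[:, None] * w_unl[None, :]           # pairwise importance weights
        loss = np.log1p(np.exp(-diff))                     # smooth stand-in for 1[diff < 0]
        return float((pair_w * loss).sum() / pair_w.sum())

    rng = np.random.default_rng(0)
    s_pos, s_unl = rng.normal(1.0, 1.0, 50), rng.normal(0.0, 1.0, 200)   # toy classifier scores
    w_pos, w_unl = np.ones(50), np.ones(200)                             # uniform weights as placeholders
    print(weighted_pu_auc_surrogate(s_pos, s_unl, w_pos, w_unl))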
■Portable Reward Tuning: Towards Reusable Fine-Tuning across Different Pretrained Models
- Daiki Chijiwa (CD), Taku Hasegawa (HI), Kyosuke Nishida (HI), Kuniko Saito (HI), Susumu Takeuchi (CD)
- In recent years, it has become common to develop AIs with domain-specific knowledge by performing additional training of pre-trained models, or foundation models. However, every time the foundation model is updated or changed, conventional methods require additional training of the given foundation model on each domain, which increases the maintenance costs of such specialized AIs. In this paper, we reformulate additional training as a kind of inverse reinforcement learning, in which we train reward models that reflect domain-specific knowledge. Once trained, these reward models can be reused across various foundation models, drastically reducing maintenance costs.
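- A rough sketch of the reuse pattern only (not the paper's training or inference procedure), assuming hypothetical stand-in functions for a foundation model's scorer and a trained domain reward model:

    # One trained reward function re-ranks candidate outputs from any base model,
    # so swapping the foundation model does not require retraining the domain knowledge.
    def rerank(candidates, base_logprob, reward, weight=1.0):
        """Pick the candidate maximizing base log-probability plus weighted reward."""
        return max(candidates, key=lambda y: base_logprob(y) + weight * reward(y))

    # Hypothetical toy stand-ins for illustration.
    base_logprob = lambda y: -0.1 * len(y)                       # toy: shorter answers scored higher
    domain_reward = lambda y: 1.0 if "dosage" in y else 0.0      # toy domain-specific preference

    answers = ["Take as needed.", "Follow the dosage printed on the label."]
    print(rerank(answers, base_logprob, domain_reward, weight=5.0))   # domain reward outweighs brevity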
■Natural Perturbations for Black-box Training of Neural Networks by Zeroth-Order Optimizations
- Hiroshi Sawada (CS), Kazuo Aoyama (CS), Yuya Hikima (CS)
- Neural networks are generally implemented on ordinary computers with CPUs or GPUs and memory, and are trained with the well-known backpropagation algorithm, which requires detailed internal information stored in memory. In contrast, there has been growing interest in methods that train neural networks without such detailed internal information. We employ a black-box optimization approach in which we perturb the neural network parameters slightly and observe how the training loss function changes. While most existing methods perturb each parameter independently, our method takes parameter correlations into account and perturbs the parameters so that the neural network's output does not change drastically. We call the resulting perturbations natural perturbations; the term 'natural' carries the same meaning as in the natural gradient, which is used when detailed internal information is available. Experimental results show that our method clearly outperforms existing methods. Our contribution accelerates research on emerging approaches that train neural networks implemented directly on hardware or in memory-constrained environments.
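- A minimal sketch of a two-point zeroth-order update with isotropic Gaussian perturbations, assuming a toy quadratic loss (the paper instead shapes the perturbations so that the network output does not change drastically):

    import numpy as np

    def zo_gradient(loss, theta, num_dirs=32, mu=1e-3, rng=np.random.default_rng(0)):
        """Estimate the gradient of a black-box loss from loss values alone."""
        grad = np.zeros_like(theta)
        for _ in range(num_dirs):
            u = rng.normal(size=theta.shape)                               # random perturbation direction
            grad += (loss(theta + mu * u) - loss(theta - mu * u)) / (2 * mu) * u
        return grad / num_dirs

    loss = lambda th: float(np.sum((th - 1.0) ** 2))                       # toy quadratic loss
    theta = np.zeros(5)
    for _ in range(200):
        theta -= 0.05 * zo_gradient(loss, theta)                           # black-box training step
    print(theta)                                                           # close to the minimizer at 1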
■Learning to Generate Projections for Reducing Dimensionality of Heterogeneous Linear Programming Problems
- Tomoharu Iwata (CS), Shinsaku Sakaue (CyberAgent, Inc.)
- We propose an efficient method for solving a class of optimization problems, specifically linear programming problems. Our approach uses a neural network that learns to generate projections that reduce the dimensionality of these problems. Once trained, the model can quickly produce good solutions to new problems without solving them from scratch.
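- A small sketch of the projection idea, assuming SciPy's linprog and a random matrix standing in for the learned projection (the paper learns how to generate the projection from related problem instances):

    import numpy as np
    from scipy.optimize import linprog

    rng = np.random.default_rng(0)
    n, m, k = 200, 50, 20                                 # original dim, constraints, reduced dim
    c, A, b = rng.normal(size=n), rng.normal(size=(m, n)), rng.uniform(1, 2, size=m)
    P = rng.normal(size=(n, k)) / np.sqrt(k)              # stand-in for a learned projection

    # Original LP: minimize c^T x  subject to  A x <= b,  0 <= x <= 1   (n variables)
    full = linprog(c, A_ub=A, b_ub=b, bounds=(0, 1))
    # Reduced LP over y with x = P y: A P y <= b and 0 <= P y <= 1      (k variables)
    A_red = np.vstack([A @ P, P, -P])
    b_red = np.concatenate([b, np.ones(n), np.zeros(n)])
    reduced = linprog(P.T @ c, A_ub=A_red, b_ub=b_red, bounds=(None, None))
    print(full.fun, reduced.fun)                          # reduced objective upper-bounds the original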
■Guided Zeroth-Order Methods for Stochastic Non-convex Problems with Decision-Dependent Distributions
- Yuya Hikima (CS), Hiroshi Sawada (CS), Akinori Fujino (CS)
- In this study, we address optimization problems in which the underlying probability distribution depends on the decision variable. Such problems arise in various real-world applications. For instance, in finance, a lender may want to train a classifier, parameterized by the decision variable, to identify reliable customers. However, the distribution of customer features can change depending on that parameter, since customers might adjust their features in response to the classifier's behavior. This interaction creates the need to solve an optimization problem in which the distribution is influenced by the decision variable, known as a decision-dependent distribution. To tackle this challenge, we propose new zeroth-order optimization methods that effectively construct update directions by leveraging information from the known objective function.
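- A toy sketch of plain zeroth-order optimization when the data distribution depends on the decision variable, so samples must be redrawn at every perturbed point (illustrative only, not the guided update rule proposed in the paper):

    import numpy as np

    rng = np.random.default_rng(0)

    def objective(theta, n=2000):
        """Monte Carlo estimate of E_{z ~ D(theta)}[(z - 1)^2] with D(theta) = N(0.5*theta, 1)."""
        z = rng.normal(loc=0.5 * theta, scale=1.0, size=n)    # the distribution shifts with theta
        return float(np.mean((z - 1.0) ** 2))

    theta, mu, lr = 0.0, 0.1, 0.2
    for _ in range(300):
        u = rng.normal()                                      # random perturbation direction
        grad = (objective(theta + mu * u) - objective(theta - mu * u)) / (2 * mu) * u
        theta -= lr * grad
    print(theta)                                              # settles near 2, where 0.5*theta matches the target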
■K2IE: Kernel Method-based Kernel Intensity Estimators for Inhomogeneous Poisson Processes
- Hideaki Kim (CS/HI), Tomoharu Iwata (CS), Akinori Fujino (CS)
- Poisson processes are widely used to analyze and forecast event patterns occurring in space and time, from social media posts to disease outbreaks. A key challenge in using them is estimating the intensity function, which tells us how likely events are to occur at different locations and times. While recent approaches based on kernel methods provide accurate estimates, they are often very slow for large datasets. In this paper, we introduce a new kernel method-based approach that replaces the commonly used likelihood function with a least squares loss, offering a major boost in computational efficiency. We show that the proposed method achieves comparable accuracy while being significantly faster than previous kernel-based methods. Moreover, we show that it connects closely to the kernel intensity estimator, a classical method known for its simplicity. These results make our approach both scalable and theoretically sound, helping researchers apply Poisson processes to large-scale scientific data.
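- For reference, a small sketch of the classical kernel intensity estimator that the paper connects to (illustrative only; K2IE itself is a kernel-method estimator fit with a least squares loss):

    import numpy as np

    def kernel_intensity(x, events, bandwidth=0.5):
        """Estimate lambda(x) for a 1-D inhomogeneous Poisson process by kernel smoothing."""
        diffs = (x[:, None] - events[None, :]) / bandwidth
        kernels = np.exp(-0.5 * diffs**2) / (np.sqrt(2 * np.pi) * bandwidth)
        return kernels.sum(axis=1)                 # expected number of events per unit length at x

    events = np.sort(np.random.default_rng(0).uniform(0, 10, size=40))   # toy event times
    grid = np.linspace(0, 10, 101)
    print(kernel_intensity(grid, events).round(2))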
■Deep Ridgelet Transform and Unified Universality Theorem for Deep and Shallow Joint-Group-Equivariant Machines
- Sho Sonoda (RIKEN/CyberAgent, Inc.), Yuka Hashimoto (NS/CS), Isao Ishikawa (Kyoto University/RIKEN), Masahiro Ikeda (The University of Osaka/RIKEN)
- In this work, we comprehensively proved the universal approximation property of deep neural networks for diverse models, using theories such as the ridgelet transform, group actions, and representation theory. The ridgelet transform is a framework for representing neural networks that enables analysis unconstrained by particular model characteristics. However, it could previously be applied only to shallow two-layer models, leaving a gap between theoretical analysis and practical models. In this paper, we extended the framework to deep models using the equivariance of group actions and, by applying theorems from representation theory, comprehensively demonstrated the universality of various deep models, providing a new direction for the theoretical analysis of neural networks that supports recent AI development.
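- For background, one standard formulation of the classical (shallow) ridgelet transform and its reconstruction, which expresses a function as a continuous two-layer network; the notation here is generic and not specific to the deep extension proposed in the paper:

    R_\psi f(a, b) = \int_{\mathbb{R}^m} f(x)\, \psi(a \cdot x - b)\, \mathrm{d}x,
    \qquad
    f(x) \propto \int_{\mathbb{R}^m \times \mathbb{R}} R_\psi f(a, b)\, \eta(a \cdot x - b)\, \mathrm{d}a\, \mathrm{d}b,

  which holds for an admissible pair of functions (\psi, \eta); the second integral can be read as a continuous two-layer network with activation \eta, weights a, and biases b.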