
January 20, 2026

AI That Doesn't Forget

An AI system trained to help doctors shouldn’t lose its bedside manner every time its underlying model gets upgraded. In the same way, a customer-service chatbot shouldn’t forget its tone or vocabulary just because a faster version of its base system has come out. But that’s exactly what happens today. Each time a foundation model is replaced, every specialized version built on top of it has to be retrained: an expensive, energy-hungry process that starts the learning cycle again from zero.

Portable Reward Tuning

Think how much better it would be if that training could be reused. If the expertise, style, and safety rules taught to an older AI could be passed on to a new one, almost like handing down a trusted recipe that always works. That’s the idea behind Portable Reward Tuning (PRT), a new method developed by researchers at NTT Computer and Data Science Laboratories and NTT Human Informatics Laboratories. It allows an AI system’s know-how to survive across model generations, reducing the cost and carbon footprint of keeping artificial intelligence up to date.

Fine-Tuning

Fine-tuning, the process of adapting a large model to a specific purpose, is a core part of modern AI. A foundation model such as GPT for language or CLIP for image recognition starts as a generalist, trained on huge amounts of data. To make it more useful for a specialized task, developers then fine-tune it on additional, task-specific examples until it behaves as required.
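To make that concrete, here is a minimal, hypothetical sketch of what fine-tuning means in code. A toy network stands in for a real foundation model; none of the names come from NTT’s work. The point is simply that fine-tuning updates the model’s own internal weights.

```python
import torch
import torch.nn as nn

# Toy stand-in for a pretrained foundation model (hypothetical; a real one
# would be a GPT- or CLIP-style network with billions of parameters).
backbone = nn.Sequential(nn.Linear(512, 512), nn.ReLU())
task_head = nn.Linear(512, 3)  # new head for a three-class specialist task
model = nn.Sequential(backbone, task_head)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

# Synthetic task-specific examples: input features and the desired labels.
features = torch.randn(32, 512)
labels = torch.randint(0, 3, (32,))

for step in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(features), labels)
    loss.backward()
    optimizer.step()  # fine-tuning edits the model's internal weights
```

Because the learned behaviour ends up inside those weights, it cannot simply be copied into a differently built successor model.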

The only problem with doing that is the speed with which foundation models are replaced by newer versions. It means that fine-tuning has to be done all over again; the original training can't be carried forward, because each model’s internal structure is different. For organizations using customized AI, this can mean big costs, long delays, and duplicated energy use.

Portable Reward Tuning changes that paradigm by rethinking what fine-tuning really is (or should be). Instead of adjusting the inner workings of a model, NTT’s concept focuses on the desired outcome. It involves training a smaller “reward model” that learns to recognize good performance.

Reward Models: Learning the Rules

Reward model? Think of it as a kind of referee that scores the AI’s answers instead of producing them. During training, it learns to recognize what a good answer looks like, whether that means a correct translation, a polite response, or an accurate image label. Once the referee knows the rules, it's able to judge any player. So when a new foundation model takes the field, it doesn’t need to be taught everything again; it just follows the same scoring system.
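The paper defines PRT’s exact objective; as a rough illustration of the referee idea, here is one common recipe for training a small reward model from pairwise preferences (a Bradley-Terry-style loss). Every name and data point below is hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# A toy reward model: maps an answer's feature vector to a single score.
reward_model = nn.Sequential(nn.Linear(512, 128), nn.ReLU(), nn.Linear(128, 1))
optimizer = torch.optim.AdamW(reward_model.parameters(), lr=1e-4)

# Hypothetical preference data: features of a preferred answer and a rejected
# answer to the same prompt.
preferred = torch.randn(64, 512)
rejected = torch.randn(64, 512)

for step in range(200):
    optimizer.zero_grad()
    # The referee should score the preferred answer above the rejected one.
    margin = reward_model(preferred) - reward_model(rejected)
    loss = -F.logsigmoid(margin).mean()
    loss.backward()
    optimizer.step()
```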

Once the reward model is trained, it can be reused with newer, faster foundation models that share the same type of output, such as words or image labels. The new model then generates answers that score highly under the same reward system, effectively inheriting the older model’s expertise without the need for top-to-bottom retraining. The reward model doesn’t depend on how the foundation model is built. It only cares about its outputs, not its internal wiring.
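One simple, hypothetical way to cash in that portability is best-of-n selection: let the new foundation model propose several answers and keep the one the frozen reward model scores highest. The helper functions below are stand-ins, not NTT’s implementation; the key point is that the referee only ever sees outputs.

```python
import torch

def generate_candidates(prompt: str, n: int = 4) -> list[str]:
    # Stand-in for sampling n answers from the upgraded foundation model.
    return [f"candidate {i} for: {prompt}" for i in range(n)]

def embed(text: str) -> torch.Tensor:
    # Stand-in for turning an answer into features the reward model can score.
    generator = torch.Generator().manual_seed(abs(hash(text)) % (2**31))
    return torch.randn(512, generator=generator)

@torch.no_grad()
def best_answer(prompt: str, reward_model: torch.nn.Module) -> str:
    candidates = generate_candidates(prompt)
    scores = [reward_model(embed(c)).item() for c in candidates]
    return candidates[scores.index(max(scores))]  # highest-scoring answer wins
```

Swapping in a newer foundation model only changes generate_candidates; the trained referee, and the expertise it encodes, stays exactly the same.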

Similar Accuracy, Much Less Effort

NTT’s research paper, presented at the International Conference on Machine Learning in Vancouver, Canada, in July 2025, showed that PRT achieved nearly the same accuracy as full-scale fine-tuning while using far less computation and memory. It worked across both language and visual tasks, and across architectures that would normally require separate retraining. The research team described the approach as “training the teacher instead of the student.” Because the teacher, the reward model, can work with different students, it allows knowledge to be transferred between model generations.

Upgrades!

What does this mean for you and me? For ordinary users the benefits may be invisible, but they will be meaningful. Chatbots will be able to retain their conversational style even after a system upgrade. Translation apps can stay consistent instead of relearning tone or phrasing. Businesses could update their AI services without losing months of retraining time. And because far less computing power is required, Portable Reward Tuning could make AI development cleaner and more sustainable.

It's a glimpse of something we don't often see these days in technology: continuity. In a world where software constantly replaces itself, the development of PRT whispers gently that intelligence doesn’t have to start over each time. It can evolve, remember, and improve, without forgetting where it came from.

Innovating a Sustainable Future for People and Planet

For further information, please see this link:
https://group.ntt/en/newsrelease/2025/07/09/250709a.html

If you have any questions on the content of this article, please contact:
Public Relations
NTT Service Innovation Laboratory Group
https://tools.group.ntt/en/rd/contact/index.php

Picture: Daniel O'Connor

Daniel O'Connor joined the NTT Group in 1999 when he began work as the Public Relations Manager of NTT Europe. While in London, he liaised with the local press, created the company's intranet site, wrote technical copy for industry magazines and managed exhibition stands from initial design to finished displays.

Later seconded to the headquarters of NTT Communications in Tokyo, he helped the company win its first global telecoms awards and contributed to the digitalisation of internal company information exchange.

Since 2015, Daniel has created content for the Group's Global Leadership Institute and the One NTT Network, and is currently working with NTT R&D teams to grow public understanding of the cutting-edge research undertaken by the NTT Group.