April 26, 2024

NTT Corporation

World's first: Regression analysis of disparate data that expands the applicability of data analysis
Expanding the application area of data analysis by deep learning with "universal approximation capability"

Tokyo - April 26, 2024 - NTT Corporation (NTT) has developed a new data analysis method that uses deep learning to estimate functions representing input-output relationships from disparate data in which the correspondence between input and output variables is unknown. The method makes it possible to estimate the input-output relationship, and thus analyze the data, even when inputs and outputs were collected by different departments or organizations, or aggregated into groups in which individuals cannot be identified. The results were presented at the 38th AAAI Conference on Artificial Intelligence (AAAI-24)*, the premier international conference in the field of artificial intelligence, held in Vancouver, Canada, from February 20 to 27, 2024.
*https://aaai.org/aaai-conference/

1. Background of the study

With the spread of smartphones and wearable devices, a wide variety of data on human conditions and behavior has been accumulated. However, comprehensive data are hard to collect because of observation costs and privacy protection, and in many situations the data remain difficult to use. For example, as shown in Figure 1, a company that posts product and service information and introductory videos online while selling products in physical stores often obtains information about online visitors (e.g., browsing time) and about in-store purchasers (e.g., purchase amount) separately, so the correspondence between visitors and purchasers (that is, between browsing time and purchase amount) is unknown. Similarly, if sensitive information such as test scores is collected per class to protect privacy, the correspondence between student attributes (input) and scores (output) is unknown.

NTT has developed methods that enable the estimation, prediction, and control of human conditions and behavior from realistically collectible data, such as trajectory data with a limited observation range and aggregate location data at the crowd level. Because standard regression analysis cannot be applied to disparate data without correspondence, a technique that makes such analysis possible has been needed.
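The setting above can be written down as follows. The notation ($\pi$, $f$, $\varepsilon_i$) is ours and is only one common way to formalize regression without correspondence, not necessarily the formulation used in the paper: the unknown correspondence is treated as a hidden permutation $\pi$ of the outputs.

```latex
\begin{aligned}
&\text{Observed: } \{x_1, \dots, x_n\} \text{ and } \{y_1, \dots, y_n\}, \text{ pairing unknown} \\
&y_{\pi(i)} = f(x_i) + \varepsilon_i, \quad i = 1, \dots, n, \quad \pi \in S_n \text{ unknown} \\
&\text{Standard regression: } \hat f = \arg\min_f \sum_{i=1}^{n} \bigl(y_i - f(x_i)\bigr)^2 \quad \text{(needs } \pi \text{ known, here the identity)}
\end{aligned}
```

In the per-class example, a separate unknown permutation applies within each class, so only group membership links inputs to outputs.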

2. Results of the research

Existing techniques for analyzing data without correspondence assumed that the function representing the input-output relationship is linear (e.g., purchase amount increases in proportion to browsing time), which is a very strong constraint. As a result, they could not estimate the non-linear relationships found in many real-world data sets, which limited them to a narrow range of situations. The new technique can estimate arbitrary functions by exploiting the "universal approximation capability"*1 of deep learning. This makes it possible to analyze data by estimating a non-linear function, for example one in which purchase amount peaks at a particular browsing time.
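To see why the linearity assumption is restrictive, the sketch below fits the best straight line to data drawn from a hypothetical peaked curve (purchase amount peaking at browsing time 0.5). It deliberately uses ordinary paired least squares to isolate the effect of assuming linearity; it is not one of the existing unpaired methods, and the curve, noise level, and helper names (`peaked`, `fit_line`, `compare`) are assumptions for illustration.

```python
import random


def peaked(x):
    # Hypothetical curve: "purchase amount" peaks at browsing time x = 0.5.
    return 1.0 - 4.0 * (x - 0.5) ** 2


def fit_line(xs, ys):
    # Closed-form ordinary least squares for y = a + b * x.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def mse(f, xs, ys):
    # Mean squared error of predictor f on the data.
    return sum((f(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def compare(n=200, noise=0.05, seed=0):
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]
    ys = [peaked(x) + rng.gauss(0.0, noise) for x in xs]
    a, b = fit_line(xs, ys)
    line_mse = mse(lambda x: a + b * x, xs, ys)
    true_mse = mse(peaked, xs, ys)
    return line_mse, true_mse


if __name__ == "__main__":
    line_mse, true_mse = compare()
    print(f"best straight line MSE: {line_mse:.4f}")
    print(f"true peaked curve MSE:  {true_mse:.4f}")
```

Because the curve rises and then falls, the best line is nearly flat and misses the peak entirely; its error stays far above the noise level that the true curve achieves.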

Figure 1 Examples of Separately Obtained Data

Figure 2 Examples of Estimation Results from the Proposed and Existing Techniques

3. Key points of the technology

NTT has developed a new method, different from existing techniques, for estimating the regression function from data without input-output correspondence. It rests on two key points.

  1. Efficient generation of highly probable candidate correspondences to approximate the cost function
     The number of possible correspondences between inputs and outputs equals the factorial of the number of data points, so handling all of them is impractical. NTT showed that a set of plausible candidates can be obtained by rearranging the order of the elements, and derived an approximate cost function (the function minimized for parameter estimation) by weighting each candidate by the probability that it is the actual correspondence.
  2. Minimization of the cost function using the stochastic gradient method
     The cost function derived above is minimized with the stochastic gradient method*2, which is widely used for parameter estimation in neural networks. This helps a neural network with many parameters avoid getting stuck in a poor local solution and reach a better one.
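The two points above can be illustrated with a small, self-contained sketch. Everything specific in it is our assumption, not NTT's published procedure: the candidate scheme (pair predictions and outputs by rank, plus every single adjacent swap of that order, giving n candidates instead of n!), the Gaussian-noise weights renormalized over that candidate set, the quadratic model standing in for a neural network, the synthetic grouped data, and the hypothetical helper names (`candidate_matchings`, `weighted_candidates`, `sgd_step`). Parameters are updated by plain SGD, one group at a time in random order, with the candidate weights held fixed during each step.

```python
import math
import random


def features(x):
    # Hypothetical stand-in for a neural network: f(x) = w0 + w1*x + w2*x^2.
    return (1.0, x, x * x)


def predict(theta, x):
    return sum(t * phi for t, phi in zip(theta, features(x)))


def match_from_orders(order_x, order_y):
    # match[i] = index of the output paired with input i.
    match = [0] * len(order_x)
    for i, j in zip(order_x, order_y):
        match[i] = j
    return match


def candidate_matchings(preds, ys):
    # Assumed scheme: pair the k-th smallest prediction with the k-th smallest
    # output, then add each variant with one adjacent pair of outputs swapped.
    # This yields n candidates instead of all n! correspondences.
    order_x = sorted(range(len(preds)), key=lambda i: preds[i])
    order_y = sorted(range(len(ys)), key=lambda j: ys[j])
    candidates = [match_from_orders(order_x, order_y)]
    for k in range(len(order_y) - 1):
        swapped = list(order_y)
        swapped[k], swapped[k + 1] = swapped[k + 1], swapped[k]
        candidates.append(match_from_orders(order_x, swapped))
    return candidates


def weighted_candidates(theta, xs, ys, noise_var):
    # Weight ∝ exp(-SSE / (2 * noise_var)): the Gaussian-noise likelihood of each
    # candidate, renormalized over the candidate set only.
    preds = [predict(theta, x) for x in xs]
    cands = candidate_matchings(preds, ys)
    sse = [sum((p - ys[j]) ** 2 for p, j in zip(preds, m)) for m in cands]
    logits = [-s / (2.0 * noise_var) for s in sse]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return preds, cands, [e / total for e in exps], sse


def approx_cost(theta, groups, noise_var):
    # Candidate-weighted mean squared error, averaged over groups.
    total = 0.0
    for xs, ys in groups:
        _, _, weights, sse = weighted_candidates(theta, xs, ys, noise_var)
        total += sum(w * s for w, s in zip(weights, sse)) / len(xs)
    return total / len(groups)


def sgd_step(theta, xs, ys, lr, noise_var):
    # Gradient of the weighted cost for one group, with the weights held fixed.
    preds, cands, weights, _ = weighted_candidates(theta, xs, ys, noise_var)
    grad = [0.0] * len(theta)
    for w, match in zip(weights, cands):
        for x, p, j in zip(xs, preds, match):
            for d, phi in enumerate(features(x)):
                grad[d] += w * 2.0 * (p - ys[j]) * phi / len(xs)
    return [t - lr * g for t, g in zip(theta, grad)]


def make_groups(rng, n_groups=25, group_size=4, noise=0.05):
    # Hypothetical data from a peaked curve; outputs are shuffled within each
    # group, so only group membership links inputs to outputs.
    groups = []
    for _ in range(n_groups):
        xs = [rng.random() for _ in range(group_size)]
        ys = [1.0 - 4.0 * (x - 0.5) ** 2 + rng.gauss(0.0, noise) for x in xs]
        rng.shuffle(ys)
        groups.append((xs, ys))
    return groups


def train(groups, epochs=200, lr=0.05, noise_var=0.01, seed=0):
    rng = random.Random(seed)
    theta = [0.0, 0.0, 0.0]
    order = list(range(len(groups)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit groups in random order: the "stochastic" part
        for g in order:
            xs, ys = groups[g]
            theta = sgd_step(theta, xs, ys, lr, noise_var)
    return theta


if __name__ == "__main__":
    groups = make_groups(random.Random(42))
    print("cost before:", approx_cost([0.0, 0.0, 0.0], groups, 0.01))
    print("cost after: ", approx_cost(train(groups), groups, 0.01))
```

Ranking is a natural starting point in one dimension: for fixed predictions, pairing the sorted predictions with the sorted outputs minimizes the sum of squared errors over all orderings (a consequence of the rearrangement inequality). The adjacent swaps and soft weights then spread probability over nearby orderings when noise makes the ranking uncertain.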

4. Outlook

To further expand the application areas of data analysis, we will continue to develop technologies for analyzing human-related data in more realistic situations, such as when data are biased or vary across individuals, and to study their application.

*1 Universal approximation capability
The ability of a neural network to approximate any function in a very broad class, such as continuous functions on a bounded domain, to arbitrary accuracy.
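For reference, a standard formal statement of this property for networks with one hidden layer is given below, in the form proved by Leshno, Lin, Pinkus, and Schocken, which generalizes earlier results by Cybenko and by Hornik. This release does not say which variant the work relies on.

```latex
\begin{aligned}
&\text{Let } K \subset \mathbb{R}^d \text{ be compact, } f : K \to \mathbb{R} \text{ continuous, and } \sigma : \mathbb{R} \to \mathbb{R} \text{ continuous and not a polynomial.} \\
&\text{Then for every } \varepsilon > 0 \text{ there exist } N \in \mathbb{N},\ \alpha_j, b_j \in \mathbb{R},\ w_j \in \mathbb{R}^d \text{ such that} \\
&\sup_{x \in K} \Bigl| f(x) - \sum_{j=1}^{N} \alpha_j \, \sigma\bigl(w_j^\top x + b_j\bigr) \Bigr| < \varepsilon .
\end{aligned}
```

The theorem guarantees that a suitable network exists; it does not say how many units are needed or whether training will find such a network.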

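*2 Stochastic gradient method
An optimization method that updates parameters step by step using gradients computed on randomly sampled subsets of the data. Stochastic gradient descent (SGD) and Adam are representative examples.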
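For reference, the standard update rules of the two optimizers named above are shown below. $B_t$ is a randomly sampled mini-batch, $\eta$ the learning rate, and squares, square roots, and divisions in Adam are elementwise; Kingma and Ba suggest $\beta_1 = 0.9$, $\beta_2 = 0.999$, $\epsilon = 10^{-8}$ as defaults. Which optimizer and settings the work actually used is not stated in this release.

```latex
\begin{aligned}
&\text{SGD:} && \theta_t = \theta_{t-1} - \eta \, \nabla_\theta L_{B_t}(\theta_{t-1}) \\
&\text{Adam:} && g_t = \nabla_\theta L_{B_t}(\theta_{t-1}) \\
& && m_t = \beta_1 m_{t-1} + (1 - \beta_1) \, g_t \\
& && v_t = \beta_2 v_{t-1} + (1 - \beta_2) \, g_t^2 \\
& && \hat m_t = m_t / (1 - \beta_1^t), \quad \hat v_t = v_t / (1 - \beta_2^t) \\
& && \theta_t = \theta_{t-1} - \eta \, \hat m_t / \bigl(\sqrt{\hat v_t} + \epsilon\bigr)
\end{aligned}
```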

About NTT

NTT contributes to a sustainable society through the power of innovation. We are a leading global technology company providing services to consumers and businesses as a mobile operator, infrastructure, networks, applications, and consulting provider. Our offerings include digital business consulting, managed application services, workplace and cloud solutions, data center and edge computing, all supported by our deep global industry expertise. We have over $97B in revenue and 330,000 employees, with $3.6B in annual R&D investments. Our operations span more than 80 countries and regions, allowing us to serve clients in over 190 countries and regions. We serve over 75% of Fortune Global 100 companies, thousands of other enterprise and government clients and millions of consumers.

Media contact

NTT Service Innovation Laboratory Group
Public Relations
nttrd-pr@ml.ntt.com

Information is current as of the date of issue of the individual press release.
Please be advised that information may be outdated after that point.