Microsoft ends support for Internet Explorer on June 16, 2022.
We recommend using one of the browsers listed below.

Microsoft Edge（Latest version）
Mozilla Firefox（Latest version）
Google Chrome（Latest version）
Apple Safari（Latest version）

Please contact your browser provider for download and installation instructions.

June 25, 2024

NTT Corporation

NTT's algorithm improves the performance of Fugaku's large-scale graph search by about 20%
Contributed to supercomputer performance to rank first in Graph500 for 9 consecutive years

News Highlights:

◆We developed a fast breadth-first search (BFS) algorithm
◆Using Fugaku, the BFS for a graph with approximately 4.4 trillion vertices and 70.4 trillion edges has been accelerated to an average of 0.42 seconds
◆This achievement is expected to improve the performance of a wide range of processing such as data mining and AI using large-scale graph data

TOKYO - June 25, 2024 - NTT Corporation (NTT) has developed the algorithm Forest Pruning, to perform a high-speed calculation for graphs¹ (data showing relatedness of things by vertices and edges) to trace connections of all vertices from the start point in the shortest path (BFS²). This technology has contributed to improving the supercomputer Fugaku's record for first place in the BFS category in the Graph500³ supercomputer performance ranking by approximately 20%. This is expected to improve the performance of various processing such as data mining and AI using large-scale graph data.
　The results of the joint research group, including this technology, will be presented at The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC24), a leading conference in the field of high-performance computing, to be held in Atlanta, GA, from November 17-22, 2024.

1. Background

Much of the complex information in the real world is represented in graphs. NTT has been researching ways to process large-scale graphs in a shorter time and with lower power consumption for many years and has devised Forest Pruning — an algorithm that enables efficient BFS. In the Graph500 Green "Big Data" computer ranking announced in November 2023, NTT's graph processing technology including Forest Pruning implemented on a GPU achieved the highest power efficiency among processors on the market. This time, we participated in the joint research group⁴ working on Graph500 on the supercomputer Fugaku, and implemented Forest Pruning for Fugaku, to achieve this result.

2. Key points of the technology

Forest Pruning skips searching the part of the graph that is originally tree-structured. Since the tree-structured portions of the graph appear in a BFS tree in their original form (Figure 1), trees can be discovered and separated in advance, allowing BFS trees to be structured in a short time without having to search for them each time. Separating trees also reduces memory consumption by shrinking the graph. This technique is effective when BFS is performed repeatedly on the same graph from different start points. Although the framework of accelerating subsequent processing through precomputation has been applied to various problems, a method of precomputing partial solutions, such as Forest Pruning, has not been identified in BFS.

Figure 1 Flow of Conventional BFS
Note. With vertex② as a start point. All vertices in the graph are discovered by visiting vertices connected by edges in order of proximity from the start point. Graph500 measures the performance of constructing a BFS tree, which represents the visiting path of the BFS.

The process of Forest Pruning consists of two steps: precomputation and BFS tree construction (Figure 2).

Precomputation: Decompose the graph into a set of trees and a non-tree part, each stored in a different data structure. This process is not included in the performance measurement according to the Graph500 specification.
BFS tree construction: Given a start point, a conventional BFS is performed only in the non-tree part of the graph to construct a partial BFS tree. Then, the tree parts obtained in the precomputation are copied and attached to this partial BFS tree, resulting in the complete BFS tree. The cases are classified according to whether the given start point is included in the set of trees or the non-tree part, and the correct BFS tree is constructed regardless of the choice of start point.

Figure 2 Flow of BFS tree Construction using Forest Pruning

Forest Pruning reduces the process of BFS tree construction by performing precomputation. When BFS is performed repeatedly while changing the start point on the same graph, only the BFS tree construction is performed repeatedly, so that this technique can reduce the overall processing time.

3. Outline of the experiment

In addition to Forest Pruning, a joint research group including NTT implemented a newly developed graph data compression technology in the Graph500 BFS benchmark program for Fugaku. The performance was measured using SCALE 42 and SCALE 43 graphs specified in Graph500 using 152,064 compute nodes⁵ (approximately 96% of the entire system) of Fugaku. Table 1 shows the size (number of vertices and edges) of the graph generated by each SCALE and the performance measurement results.

Table 1 Performance Measurement Results for May 2024
Note. "Improvement" indicates the percentage increase in performance from the November 2023 record of 138,867 GTEPS.

* You can scroll horizontally

	Graph size		Performance
	Vertex count	Edge count	Processing time (seconds)	GTEPS	Improvement
SCALE 42	Approx. 4.4 trillion	Approx. 70.4 trillion	0.42	166,029	+20%
SCALE 43	Approx. 8.8 trillion	Approx. 140.7 trillion	0.71	198,321	+43%

・Results of SCALE 42

The SCALE 42 has improved by approximately 20% from the previous performance of 138,867 GTEPS⁶ announced in November 2023. After examining the contribution of implemented functions, it was confirmed that the improvement is almost entirely attributable to Forest Pruning. This record can be found in Graph500's June 2024 rankings (Table 2).

・Results of SCALE 43

Previously, SCALE 43 could not be executed on Fugaku due to insufficient memory for the graph size. For the first time, measurement is possible thanks to graph data compression technology and the effect of graph reduction by Forest Pruning. This is about 43% better than the performance from November 2023, which is even better than the performance improvement of SCALE 42. Note that validation of obtained BFS trees, which is required by the Graph500 specification, was omitted for SCALE 43 to reduce the time for performance measurement, so the record was not submitted to the ranking.

Table 2 Graph500 BFS June 2024 List

* You can scroll horizontally

Rank	System name	Country	GTEPS
1	Fugaku	Japan	166,029
2	Wuhan Supercomputer	China	115,358
3	Frontier	America	29,655
4	Pengcheng Cloudbrain-II	China	28,463
5	Aurora	America	24,250

4. Outlook

To measure the performance for SCALE 43 in accordance with the Graph500 specification and submit the result, the joint research group is working on optimizing the program and improving an execution method. In the long term, we will improve the performance by incorporating techniques developed by other research teams and develop efficient graph processing technology for GPU-based supercomputers. Through these efforts, NTT will continue to develop technologies for large-scale graph processing and efficient utilization of cutting-edge computer systems, contributing to performance improvement of various computations including data mining and AI.

Presentation information

The details of Forest Pruning and the measurement results of Fugaku, together with other results of the joint research group, will be presented in the following papers at The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC24). This year's acceptance rate was 22.7%.
Title: Doubling Graph Traversal Efficiency to 198 TeraTEPS on the Supercomputer Fugaku
Authors: Junya Arai (NTT), Masahiro Nakao (RIKEN), Yuto Inoue (Fixstars), Kanto Teranishi (Fixstars), Koji Ueno (Fixstars), Keiichiro Yamamura (Kyushu University), Mitsuhisa Sato (RIKEN), Katsuki Fujisawa (Tokyo Institute of Technology)
SC24 Website: https://sc24.supercomputing.org/ Open other window

¹Graph: Data that represents the connections of things with vertices and edges. Graphs are an important form of expression in various fields such as AI, security, and infrastructure management due to their flexibility and expressiveness. For example, a route map can be regarded as a graph, with stations as vertices and tracks as edges. Such graphs are useful for determining the proper route to a destination. Graphs representing knowledge are used in AI and other applications (Figure 3). Graphs can also represent a person's behavior, relationships with friends, communication records, and financial transactions (Figure 4), which can be used for video recommendation, cyber-attack detection, and money laundering detection.

Figure 3 Example of the Knowledge Graph
Note. Although it is difficult to discern a relationship between 1985 and Chiyoda Ward, Tokyo at a glance, it is possible to read that "NTT" is between them.

Figure 4 Example of a Graph Consolidating Complex Information
Note. Graphs can represent a variety of information, such as people's behavior and relationships, communication between computers, and financial transactions between accounts.

²Breadth-first search (BFS): A graph search method. With BFS, you can find the shortest path between a vertex (start point) and other vertices. Short paths are useful information in graph analysis that focuses on connections. For example, short routes are preferred when traveling by train or car. In the example shown in Figure 3, if we are interested in the connection between the year 1985 and Chiyoda Ward, Tokyo, the path of 1985--NTT--Chiyoda Ward, Tokyo represents the most direct information. The BFS visits all reachable vertices in the graph by repeatedly visiting the vertices that are directly connected by a start point and edges, and then visiting the vertices that are connected to them (see Figure 1). A BFS visit path is represented as a BFS tree.

³Graph500: A ranking of supercomputer performance. Given the importance of graph analysis, it was established in 2010 as a ranking based on BFS performance. Graph500 shows the performance of a BFS tree structure through a BFS using an index called TEPS6. Benchmark problems were later added to Graph500, and now there are rankings in three categories: BFS, SSSP (Single Start Shortest Path), and Green (BFS Power Efficiency).
Website: https://graph500.org/ Open other window

⁴Joint research group: RIKEN, Tokyo Institute of Technology, Fixstars Corporation, and NTT. NTT announced our participation in November 2023.

⁵Computation node: A computer unit that comprises a CPU and memory.

⁶TEPS, GTEPS (Giga TEPS): Traversed edges per second (TEPS) is a performance metric used in Graph500. In short, TEPS is the number of edges in the graph divided by the processing time. GTEPS represents 1 billion TEPS.

About NTT

NTT contributes to a sustainable society through the power of innovation. We are a leading global technology company providing services to consumers and businesses as a mobile operator, infrastructure, networks, applications, and consulting provider. Our offerings include digital business consulting, managed application services, workplace and cloud solutions, data center and edge computing, all supported by our deep global industry expertise. We are over $97B in revenue and 330,000 employees, with $3.6B in annual R&D investments. Our operations span across 80+ countries and regions, allowing us to serve clients in over 190 of them. We serve over 75% of Fortune Global 100 companies, thousands of other enterprise and government clients and millions of consumers.

Media contact

NTT Service Innovation Laboratory Group
Public Relations
nttrd-pr@ml.ntt.com

Information is current as of the date of issue of the individual press release.
Please be advised that information may be outdated after that point.

Back to Press Release