Achieving the World's Fastest Training Speed! The Latest Deep Learning Technologies to Create a High-Accuracy AI

Research Underway on Deep Learning, a Method of Machine Learning Using Multi-Layer Neural Networks Trained on Large Data Sets

The term deep learning has become increasingly familiar in recent years. It refers to a machine learning technique in which a neural network* is repeatedly trained on large data sets to improve recognition and classification accuracy. Research on deep learning is advancing rapidly, and the technology has achieved higher accuracy than humans in image, character, and voice recognition.

Deep learning requires large data sets to achieve high accuracy. Graphics Processing Units (GPUs)** are widely used to process these data sets because they can perform such calculations faster than Central Processing Units (CPUs). Neural networks have also grown to many layers in recent years, and training these larger networks on large data sets takes a great deal of time. This has drawn attention to technologies that make full use of GPU performance, such as running multiple GPUs in parallel to speed up training.

However, the number of GPUs that can be installed in a single computer is limited. To train a neural network on many GPUs, multiple computers must be connected over a high-speed network and must share information as training progresses. Parallel processing across computers adds communication time, because the computers process data at different speeds and sharing data among them is complex. In addition, GPU memory is smaller than the main memory of typical computers, which limits the size of neural network that can be trained at high speed.

*: A model that imitates the neural circuits of the human brain. Giving computers this learning capability allows them to solve a wide range of problems.
**: A processor in a PC or workstation, originally used for image processing. Using GPUs for general-purpose computing, known as General-Purpose computing on GPUs (GPGPU), has recently attracted attention.
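To make the information sharing described above more concrete, the following minimal Python sketch shows synchronous data-parallel training, a common way to apply several GPUs to one model. It assumes a toy one-parameter linear model, and every name in it (train_data_parallel, local_gradient, allreduce_mean) is hypothetical: each simulated worker stands in for one GPU, and the averaging step stands in for the network communication that the text identifies as the bottleneck. It is an illustration under these assumptions, not Fujitsu's implementation.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (x, y) pair for a hypothetical one-parameter model y = w * x


def local_gradient(w: float, shard: List[Sample]) -> float:
    # Gradient of the mean of 0.5 * (w*x - y)**2 over one worker's shard.
    return sum((w * x - y) * x for x, y in shard) / len(shard)


def allreduce_mean(values: List[float]) -> float:
    # Stand-in for the network step: every worker receives the same average.
    return sum(values) / len(values)


def train_data_parallel(data: List[Sample], num_workers: int,
                        lr: float = 0.1, steps: int = 200) -> float:
    # Split the data into equal shards, one per worker; any remainder is dropped.
    size = len(data) // num_workers
    shards = [data[i * size:(i + 1) * size] for i in range(num_workers)]
    w = 0.0  # all workers start from the same weight
    for _ in range(steps):
        grads = [local_gradient(w, s) for s in shards]  # runs in parallel on real hardware
        g = allreduce_mean(grads)  # workers must wait here until sharing completes
        w -= lr * g                # identical update applied on every worker
    return w
```

With equal-sized shards, averaging the per-worker gradients gives the same update as a single worker processing the whole data set, which is why every computer must receive the shared result before the next step can start.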

Two Technologies for Achieving Faster Deep Learning and Larger Neural Networks

Fujitsu Laboratories developed the two technologies below to overcome the problems described above.

1. Deep learning high-speed processing technology
This technology changes how data is processed depending on the order in which training computations proceed and on the size of the data to be shared. Within each series of computations, it automatically controls the order of data transfers so that the data needed to start the next training step is shared among the computers first. This shortens the wait before the next training step can begin (Figure 1).

Figure 1 Scheduling technology for data sharing
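One way to picture the scheduling in Figure 1 is the following Python simulation. It rests on illustrative assumptions that are not details from Fujitsu: gradients become ready from the last layer to the first during the backward pass, a single network link sends one layer's data at a time, and the next forward pass needs the first layer's data first. The hypothetical functions send_in_ready_order and send_needed_first compare a first-come, first-served transfer order with an order that prioritizes the data needed to start the next step.

```python
from typing import Callable, List


def send_in_ready_order(candidates: List[int]) -> int:
    # Backward runs last layer first, so among ready layers the highest index was ready earliest.
    return max(candidates)


def send_needed_first(candidates: List[int]) -> int:
    # The next forward pass starts at layer 0, so send the lowest index first.
    return min(candidates)


def simulate(backward: List[float], comm: List[float], forward: List[float],
             pick: Callable[[List[int]], int]) -> float:
    """Return the time at which the next forward pass finishes."""
    n = len(backward)
    # Backward pass: layer i's data is ready when its backward step ends.
    ready = [0.0] * n
    t = 0.0
    for i in reversed(range(n)):
        t += backward[i]
        ready[i] = t
    # One network link, one transfer at a time, no preemption.
    done = [0.0] * n
    pending = set(range(n))
    link_free = 0.0
    while pending:
        now = max(link_free, min(ready[i] for i in pending))
        candidates = sorted(i for i in pending if ready[i] <= now)
        i = pick(candidates)
        pending.remove(i)
        link_free = now + comm[i]
        done[i] = link_free
    # Next forward pass: layer i needs its shared data and layer i-1's output.
    t = 0.0
    for i in range(n):
        t = max(t, done[i]) + forward[i]
    return t
```

With four layers, unit backward and forward times, and two time units to send each layer, simulate returns 13.0 with send_in_ready_order and 11.0 with send_needed_first. The saving comes entirely from reordering transfers; no computation or transfer becomes faster.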

In addition, before the computers share processed data, the technology checks the size of that data and, where beneficial, automatically distributes the computation across the computers so that it is performed in the most efficient way. This minimizes the time required for each calculation (Figure 2).

Figure 2 Difference in data processing: when sharing small size data (top) and large size data (bottom)
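The size-dependent behavior in Figure 2 can be approximated with the standard latency-bandwidth (alpha-beta) cost model, as in the Python sketch below. It assumes two hypothetical strategies: collecting the data on one node, summing it there, and sending the result back (few rounds, each message carrying all the data), or splitting the data so that each node sums one slice before all nodes exchange slices (more rounds, each message carrying a fraction of the data). The cost formulas assume tree-shaped and ring-shaped message patterns respectively, the alpha and beta values are placeholders, and the sketch does not claim to reproduce Fujitsu's selection rule.

```python
import math
from typing import List, Tuple

Vectors = List[List[float]]  # one equal-length vector per node

# Cost model (alpha = per-message latency, beta = per-element transfer time).


def cost_reduce_broadcast(n: int, k: int, alpha: float, beta: float) -> float:
    # Assumed pattern: binary-tree reduce to one node, then binary-tree broadcast.
    return 2 * math.ceil(math.log2(k)) * (alpha + n * beta)


def cost_split(n: int, k: int, alpha: float, beta: float) -> float:
    # Assumed pattern: ring reduce-scatter, then ring all-gather (n/k elements per message).
    return 2 * (k - 1) * (alpha + (n / k) * beta)


def choose_strategy(n: int, k: int, alpha: float, beta: float) -> str:
    if cost_reduce_broadcast(n, k, alpha, beta) <= cost_split(n, k, alpha, beta):
        return "reduce_broadcast"
    return "split"

# The data functions below only model which node sums which elements; no real messages are sent.


def reduce_broadcast(vectors: Vectors) -> Vectors:
    # One node sums every vector, then the total is copied to all nodes.
    total = [sum(col) for col in zip(*vectors)]
    return [list(total) for _ in vectors]


def reduce_scatter_allgather(vectors: Vectors) -> Vectors:
    k, n = len(vectors), len(vectors[0])
    bounds = [(j * n // k, (j + 1) * n // k) for j in range(k)]
    # Reduce-scatter: node j sums only slice j of every vector.
    slices = [[sum(v[i] for v in vectors) for i in range(lo, hi)] for lo, hi in bounds]
    # All-gather: every node collects all slices to rebuild the full sum.
    full = [x for part in slices for x in part]
    return [list(full) for _ in range(k)]


def allreduce(vectors: Vectors, alpha: float, beta: float) -> Tuple[str, Vectors]:
    # Pick a strategy from the data size, then compute the shared sum with it.
    strategy = choose_strategy(len(vectors[0]), len(vectors), alpha, beta)
    if strategy == "reduce_broadcast":
        return strategy, reduce_broadcast(vectors)
    return strategy, reduce_scatter_allgather(vectors)
```

With 8 nodes, alpha = 1e-5 seconds per message and beta = 1e-9 seconds per element, the model picks the single-node strategy for 1,000 elements and the split strategy for 10,000,000 elements, mirroring the small-data and large-data distinction in Figure 2. Both strategies produce the same sum; only the communication cost differs.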

2. GPU memory efficiency improvement technology
Fujitsu Laboratories also developed a technology that uses GPU memory more efficiently. It allows a GPU to perform the computations for a larger neural network without resorting to the model-parallel approach, which significantly reduces training speed. At the start of training, the technology analyzes the structure of each layer of the neural network. It then rearranges the order of computations so that memory areas holding large pieces of data can be reused, reducing overall memory usage (Figure 3).

Figure 3 Technology to improve memory efficiency
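The memory-reuse idea in Figure 3 can be illustrated with a generic liveness analysis, sketched below in Python. This is not Fujitsu's algorithm: it assumes a hypothetical layer graph (EXAMPLE) with made-up buffer sizes, frees each buffer right after the last computation that reads it so its area can be reused, and searches every valid execution order for the one with the lowest peak memory. The exhaustive search is only feasible for tiny graphs.

```python
from itertools import permutations
from typing import Dict, List, Sequence, Tuple

# Hypothetical layer graph: name -> (output buffer size in arbitrary units, inputs).
Graph = Dict[str, Tuple[int, List[str]]]

EXAMPLE: Graph = {
    "x": (4, []),            # input
    "a": (8, ["x"]),         # large intermediate, branch 1
    "c": (8, ["x"]),         # large intermediate, branch 2
    "b": (1, ["a"]),
    "d": (1, ["c"]),
    "out": (1, ["b", "d"]),  # final output, never freed
}


def peak_memory(graph: Graph, order: Sequence[str]) -> int:
    """Peak live memory if each buffer is freed right after its last reader."""
    pos = {name: i for i, name in enumerate(order)}
    last_use = dict(pos)
    for name in order:
        for src in graph[name][1]:
            last_use[src] = max(last_use[src], pos[name])
    live = peak = 0
    for i, name in enumerate(order):
        live += graph[name][0]  # allocate the output while inputs are still held
        peak = max(peak, live)
        for src in set(graph[name][1]):
            if last_use[src] == i:  # nothing later reads src: its area is reusable
                live -= graph[src][0]
    return peak


def is_topological(graph: Graph, order: Sequence[str]) -> bool:
    seen = set()
    for name in order:
        if any(src not in seen for src in graph[name][1]):
            return False
        seen.add(name)
    return True


def best_order(graph: Graph) -> Tuple[List[str], int]:
    # Exhaustive search over valid orders: only practical for tiny graphs.
    best: Tuple[List[str], int] = ([], -1)
    for order in permutations(graph):
        if is_topological(graph, order):
            p = peak_memory(graph, order)
            if best[1] < 0 or p < best[1]:
                best = (list(order), p)
    return best
```

For the example graph, the order x, a, c, b, d, out peaks at 20 units, while x, a, b, c, d, out peaks at 13 units (23 units if no buffer were ever reused), because finishing the first branch before starting the second means the two large intermediate buffers are never live at the same time.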

Achieving the World's Fastest Training Speed! High Accuracy Development Is Now a Reality

Fujitsu Laboratories applied the two technologies described above to the deep learning framework Caffe and measured training time and GPU memory usage.

The results showed that training with multiple GPUs ran 27 times faster than on a single GPU. Compared with operation before the technologies were applied, training was 46% faster with 16 GPUs and 71% faster with 64 GPUs, which Fujitsu reports as the world's fastest training speed (Figure 4). GPU memory usage was also reduced by more than 40% compared with operation before the technologies were applied, allowing a single GPU to train a neural network twice as large as before.

Figure 4 Speed increase when using multiple GPUs compared to single-GPU operation

Combining these two technologies allows neural networks to be trained faster and to grow larger. Large neural networks are needed for complex calculations in fields such as automated control of robots and vehicles, medicine (e.g., disease classification), and finance (e.g., stock market prediction). By making full use of the GPU's high-speed computing capability for training, these technologies will shorten the time required for deep learning research and development and accelerate the development of models with higher accuracy and quality.

Fujitsu Laboratories will begin incorporating these two technologies into Human Centric AI Zinrai, Fujitsu's AI technology, in April 2017, and will continue to improve them to further increase training speed.