Overview of Deep Learning―The Frontier of Machine Learning Methods and the Core of AI―

Recently, the term “deep learning” has often appeared in newspapers and on TV. Some may have heard of it but remain unsure of what it can actually do and what it is used for, while others may have tried to look it up but could not understand it well because of the many mathematical expressions. This column explains the basics of deep learning and some of its applications with specific examples.

Author's profile
Fumitaka Sato
Senior Consultant
Business Analytics Group, Fujitsu Research Institute
An AI utilization consultant who solves various issues by analyzing data from the perspective of a data scientist.
Author's profile
Kazuma Horita
Business Analytics Group, Fujitsu Research Institute
Develops modeling systems for in-house use of natural language, image, and video data primarily in the manufacturing industry.

* This article was published in Chisounomori 2018, volume 2 on March 19, 2018. Chisounomori is an information magazine published by Fujitsu Research Institute (FRI).
* The department and position of the author and the content are from the time of publication.

What Is Deep Learning?

First, consider a simple definition of deep learning. Deep learning is a multi-layer structure composed of neural networks, which are one type of machine learning method (algorithm). Machine learning is a technique that gives computers the capability to learn as humans do. Many systems with “AI” in their names are built around such capabilities, even though the term “machine learning” is not always used. Machine learning helps computers to discern the features that are hidden in data; to behave as if they have grown by themselves; to handle new, unprecedented input; and occasionally to generate output that exceeds the human senses. Machine learning itself is an old technique, whereas deep learning has pushed its limits significantly. Figure 1 shows the relations among AI, machine learning, neural networks, and deep learning, concepts that are often confused.

Figure 1 Relations among AI, machine learning, neural networks, and deep learning
Machine learning: a component technology of AI
Neural network: an algorithm for machine learning
Deep learning: a multi-layer structure composed of neural networks

What Are the Algorithms Like?

Quick question--what do you think of when you view the image below? Most likely, you subconsciously recognize it as a zebra based on its features, including the black and white stripes, mane, and large ears (Figure 2).

Figure 2 Example of an image to recognize (zebra)
An image of a zebra. Many people subconsciously recognize that this image depicts a zebra upon seeing it. In fact, this recognition process occurs in a network of nerve cells known as neurons.

Before someone subconsciously recognizes a zebra, the following process actually takes place.

  1. The brain first sees a zebra in an illustrated encyclopedia, on TV, at a zoo, or elsewhere.
  2. The brain extracts some features, such as the black and white stripes, mane, and large ears; it then memorizes that the subject is a zebra.
  3. Upon seeing Figure 2, the brain compares the image's features with various features of objects it has memorized before, associates the subject in the image with the most similar one, and determines that the subject is a zebra.

In reality, the human brain carries out this process with a network of nerve cells known as neurons. There are at least 10^10 (10 raised to the power of 10) neurons in the human brain. Each neuron is connected to other neurons through junctions called synapses. A neuron fires when an electrical input signal exceeds a certain threshold, and then outputs an electrical signal to the next neuron via a synapse. The brain communicates signals through a series of such actions, thereby implementing various thought processes, including recognition of a zebra. In a similar way, a neural network, which is a type of machine learning logic, emulates the brain's neural circuits and their processing on a computer.

Consider the example of recognizing a zebra as described above. Here, determining whether the subject is a zebra or not (1 for zebra, otherwise 0) involves inputs of three features: “black and white stripes: x1,” “mane: x2,” and “large ears: x3.” Each input (the plausibility of the feature, represented by a value from 0 to 1) is multiplied by its weight (the importance of the feature in recognizing a zebra, also represented by a value from 0 to 1), and the products are summed. The output is 1 if the sum exceeds the threshold; otherwise, it is 0. For example, assume that the feature inputs are x1 = 0.9, x2 = 0.6, and x3 = 0.8, and the weights are w1 = 0.8, w2 = 0.5, and w3 = 0.6. Calculate 0.9*0.8 + 0.6*0.5 + 0.8*0.6 = 1.5 and compare the result with threshold θ. When θ is 1.0, the result (1.5) is larger than the threshold (1.0); thus, the output is 1 (the image depicts a zebra) (Figure 3). Although the weights are fixed here at w1 = 0.8, w2 = 0.5, and w3 = 0.6, in an actual neural network the appropriate weights for recognizing a zebra are statistically derived from a large number of zebra images. The process of optimizing the weights of a neural network based on input data is called learning.
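The calculation above can be reproduced in a few lines of Python. This is only a minimal sketch of a single threshold unit using the numbers from the text; the function name `threshold_unit` is chosen here for illustration and does not come from any library.

```python
def threshold_unit(inputs, weights, threshold):
    """Return (output, weighted_sum); output is 1 if the sum exceeds the threshold."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    output = 1 if weighted_sum > threshold else 0
    return output, weighted_sum


# Values taken from the zebra example in the text.
features = [0.9, 0.6, 0.8]  # x1: stripes, x2: mane, x3: large ears
weights = [0.8, 0.5, 0.6]   # w1, w2, w3

output, total = threshold_unit(features, weights, threshold=1.0)
print(output, round(total, 2))  # prints "1 1.5"
```

With the same weights, an image in which all three features are weak (for example, inputs of 0.1 each) gives a sum well below the threshold, so the output is 0.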

Figure 3 Neural network-based image recognition logic
This shows the process of recognizing a zebra by a neural network, which is a type of machine learning method. The determination of whether or not the subject is a zebra involves the inputs of three features, namely black and white stripes, the mane, and large ears; the inputs are multiplied by their respective weights and summed up. If the sum exceeds the threshold, the target image is determined to depict a zebra.
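The learning step, in which the weights are derived from data instead of being set by hand, can be illustrated with the classic perceptron update rule. The sketch below is an assumption made for illustration, not the procedure used by any system named in this article: the four training samples, the learning rate, and the function names are all hypothetical, and the threshold is learned as a bias term (an output of 1 means the weighted sum plus the bias is positive).

```python
def predict(weights, bias, inputs):
    """Threshold unit with the threshold folded into a bias term."""
    s = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if s > 0 else 0


def train_perceptron(samples, learning_rate=0.1, epochs=1000):
    """Adjust weights and bias whenever a sample is misclassified."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for inputs, label in samples:
            error = label - predict(weights, bias, inputs)  # -1, 0, or +1
            if error != 0:
                mistakes += 1
                weights = [w + learning_rate * error * x
                           for w, x in zip(weights, inputs)]
                bias += learning_rate * error
        if mistakes == 0:  # every sample is classified correctly
            break
    return weights, bias


# Hypothetical feature values (stripes, mane, large ears) with labels.
samples = [
    ([0.9, 0.6, 0.8], 1),  # zebra
    ([0.8, 0.7, 0.9], 1),  # zebra
    ([0.1, 0.5, 0.7], 0),  # not a zebra: mane and ears, but no stripes
    ([0.2, 0.1, 0.2], 0),  # not a zebra
]

weights, bias = train_perceptron(samples)
print([predict(weights, bias, x) for x, _ in samples])
```

Because these samples can be separated by a straight boundary, the update rule is guaranteed to stop making mistakes after a finite number of corrections. Real image data rarely has this property, which is one reason multi-layer networks are used.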

Thus, a neural network, which is a type of machine learning logic, emulates the brain's neural circuits and their processing on a computer. Multi-layer combinations of neural networks are collectively called deep learning. In the zebra example, the features and their input values were defined in advance. The concept of deep learning is to train neural networks to determine for themselves which features the inputs indicate and how plausible these indications are. This is what drastically differentiates deep learning from conventional image recognition methods, in which the extraction of useful image features is designed by hand. Repeating this process across layers leads to the capability to determine whether or not an image depicts a zebra directly from pixel information. In addition, forming an output layer composed of several output nodes as shown in Figure 4 enables the neural network to solve more complex problems, such as identifying which of a number of animals (e.g., a zebra or a cat) the subject is.

Figure 4 Example deep learning structure
Stacking neural networks in the middle layers makes it possible to recognize a zebra from pixel information. In addition, forming several output nodes as shown above enables the neural network to solve more complex problems, such as identifying which of a number of animals (e.g., a zebra or a cat) the subject is.
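The layered structure with several output nodes can be sketched as a forward pass in plain Python. Everything specific here is an assumption for illustration: the tiny layer sizes, the weight values (which a real network would learn from data), and the use of ReLU and softmax in place of the simple threshold from the zebra example, since these are common choices in practice.

```python
import math


def dense(inputs, weights, biases):
    """Fully connected layer: each row of `weights` feeds one output node."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    """Pass positive values through and replace negative values with 0."""
    return [max(0.0, v) for v in values]


def softmax(values):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def forward(pixels, layers):
    """Run the input through the middle layers, then the output layer."""
    activations = pixels
    for weights, biases in layers[:-1]:
        activations = relu(dense(activations, weights, biases))
    weights, biases = layers[-1]
    return softmax(dense(activations, weights, biases))


# Hypothetical network: 4 pixel inputs -> 3 middle nodes -> 2 output nodes.
layers = [
    ([[0.5, -0.2, 0.1, 0.4],
      [-0.3, 0.8, 0.2, -0.1],
      [0.2, 0.1, -0.5, 0.6]], [0.0, 0.1, -0.1]),
    ([[0.7, -0.4, 0.3],
      [-0.6, 0.5, 0.2]], [0.0, 0.0]),
]
labels = ["zebra", "cat"]

probabilities = forward([0.9, 0.1, 0.8, 0.2], layers)
for label, p in zip(labels, probabilities):
    print(f"{label}: {p:.3f}")
```

Each output node corresponds to one candidate animal, and the node with the highest probability is taken as the answer. Adding more middle layers means adding more `(weights, biases)` pairs to `layers`.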

AI Progressing in Various Areas, Including AI Go, Image Recognition, and Language Processing

Clearly, the development of deep learning techniques has pushed the limits of conventional machine learning significantly and driven the third AI boom. Traditionally, computers' primary role has been to execute predefined processes quickly. Today, computers as AI are increasingly expected to substitute for human judgment in some areas, and sometimes to discover events that people have failed to notice.

Perhaps many still vividly remember the news that AlphaGo, the AI go software program developed by DeepMind, soundly beat a top-tier go player. One may safely say that people can no longer defeat machines in intellectual games that have predefined rules. The AlphaGo program used deep learning to learn from the records of 160,000 games played between expert human go players, about 30 million board positions in total.

As for image recognition, the Hinton group at the University of Toronto used an eight-layer convolutional neural network in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2012, a contest of image recognition accuracy. The group won the category for recognition of 1,000 object classes with the lowest error rate, beating the runner-up by a significant margin of more than 10 percentage points. Thereafter, image recognition methods swiftly shifted from conventional approaches focused on manual feature design to neural network-based methods. Finally, in 2015, the recognition rate of an image recognition method exceeded that of human beings.

In language processing, Google Translate was updated to use a deep learning-based translation model, which dramatically improved its accuracy. The difference is evident when actually using the software. The old version often generated unnatural translations, whereas the updated version can generate far more natural ones. This is achieved by training a multi-layer recurrent neural network with an immense number of pairs of original and translated documents, instead of using the earlier translation model built on dictionaries, rules, and classical machine learning.

Deep learning's capabilities are also harnessed in the area of speech recognition. In this field, deep learning has been used since the 2010s to extract the features of sound waves and to predict text; it has contributed to reducing the recognition error rate by more than 10%. Currently, the error rate has been reduced to about 5% in some circumstances. This figure is equal to that of human beings.

As mentioned above, the learning capabilities of complex deep learning networks have now developed sufficiently not only to defeat people in games with predefined rules but also to rival people in problems whose rules cannot be defined clearly and that people handle by recognizing and judging with their senses. In other words, deep learning can already deliver the kind of behavior that people expect of AI.

Deep Learning's Essence Is “Development of Data Analysis”--Further Trial and Error Required

The preceding sections presented an overview of deep learning and the innovations it produces. Today, AI is enjoying an unprecedented boom, and greater expectations are being placed on AI as can be seen in the predictions that AI will surpass the capabilities of human beings and replace a number of jobs. In business situations, some people just want to try AI, whereas others are vaguely expecting to achieve something remarkable by training deep learning models with in-house data.

Deep learning is truly achieving data utilization innovations: its scope of application is wide and its evolution is ongoing. That said, many situations do not actually require deep learning, and properly building deep learning models that solve actual problems requires expertise in advanced math, statistics, information engineering, and so forth. In fact, the process carried out by “deep learning” differs completely from problem to problem. The aforementioned examples of go, image recognition (Figure 5), translation (Figure 6), and speech recognition use neural networks that have totally different structures. For example, the image recognition model combines convolutional layers, which extract features in an area of an image, with pooling layers, which summarize those features (Figure 5). The translation model uses a recurrent neural network to process the sequence of words in a document as time-series data and combines it with layers that encode and decode massive word spaces (Figure 6).
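The roles of the convolutional and pooling layers mentioned above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the GoogLeNet implementation: the 5x5 striped image and the 2x2 kernel are made-up values, a real network learns its kernels from data, and the convolution is written without kernel flipping, as is common in deep learning libraries.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image and sum the element-wise products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def max_pool2d(feature_map, size=2):
    """Summarize each non-overlapping size x size block by its maximum."""
    rows = range(0, len(feature_map) - size + 1, size)
    cols = range(0, len(feature_map[0]) - size + 1, size)
    return [[max(feature_map[i + di][j + dj]
                 for di in range(size) for dj in range(size))
             for j in cols]
            for i in rows]


# Hypothetical 5x5 image with vertical stripes (1 = white, 0 = black).
image = [[1, 0, 1, 0, 1] for _ in range(5)]

# Hypothetical kernel that responds to a white-to-black vertical edge.
kernel = [[1, -1],
          [1, -1]]

features = convolve2d(image, kernel)  # 4x4 map, each row is [2, -2, 2, -2]
pooled = max_pool2d(features)         # 2x2 summary: [[2, 2], [2, 2]]
print(features)
print(pooled)
```

The convolution produces a strong response wherever the stripe pattern matches the kernel, and pooling keeps only the strongest response in each block, so the summary stays the same even if the stripes shift by a pixel.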

Figure 5 GoogLeNet model that won ILSVRC2014 in the category of image classification problems
The model is built by combining several kinds of layers, including convolutional layers (blue), pooling layers (red), and fully connected layers (yellow).
Source: Going deeper with convolutions
Figure 6 Network model used by Google Translate
The model is built by combining a layer that memorizes advance information, a layer that learns the demanded portions of the document, and a layer that reduces the dimensions.
Source: Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation

Deep learning can solve problems not simply because the number of middle layers has increased. The many neural network layers stacked together play different roles (e.g., extracting features, reducing the dimensions, memorizing advance information, and evaluating errors). Configurations and parameters must be carefully designed to suit the input data and the type of problem to solve. It is impossible to acquire such know-how overnight. Casually training a deep learning model with existing data will not yield good results. Without an appropriate network configuration and appropriate parameter settings, a deep learning model can easily yield worse results than existing machine learning methods. (*) If you want to utilize the latest AI trends and deep learning techniques, please note before you start that they are essentially advanced forms of data analysis; we human beings must still determine the data characteristics and carry out trial and error.

That said, deep learning continues to expand the capabilities of conventional computers, thus offering us new experiences, and its possibilities will expand still further. Deep learning is not merely developing as a research field of machine learning algorithms; it is also supported by the growing computing power of computers and by the increased amounts of training data that have resulted from the development of the Internet. Some predict that the technological singularity will occur by 2045. Whether that happens or not, this area is evolving dramatically. We can only hope that deep learning continues to provide us with a wide range of innovations.

  • (*)Fujitsu Research Institute has conducted many R&D projects on deep learning-based language processing and image recognition techniques, and we have applied them to many businesses. We provide data utilization consulting services from the perspective of experienced data scientists.