Reinforcement Learning: AI Technology Forms an Optimal Action Selection Policy Based on Rewards
Machine learning, which creates a variety of task executors based on data, has made practical advances in areas such as image and voice recognition, and now forms the core of artificial intelligence (AI) technology. In image and voice recognition, a classifier is formed from training data that shows the appropriate output (the correct recognition results). This type of machine learning is called "supervised learning."
Also in the spotlight recently is reinforcement learning, which can be used where correct outputs are not explicitly presented. In reinforcement learning, a computer forms an optimal action selection policy suited to its environment through trial and error, based on a reward signal that indicates how good its actions are. With reinforcement learning techniques to date, however, designers have had to specify which sensory features to use, and the learning process has had to be carried out separately for each problem. These constraints have limited the applicability of reinforcement learning in the real world.
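As an illustration of learning through trial and error from a reward signal, the following is a minimal sketch of tabular Q-learning, one well-known reinforcement learning algorithm. It is not the algorithm under development in this research. The corridor environment, the function names, and all parameter values are assumptions made for this example only.

```python
import random


def train_q_learning(n_states=5, episodes=500, alpha=0.1, gamma=0.9,
                     epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical one-dimensional corridor.

    States are 0..n_states-1; actions are 0 (left) and 1 (right).
    Reaching the right-most state gives reward 1 and ends the episode.
    """
    rng = random.Random(seed)
    # q[state][action] estimates the long-term reward of each action.
    q = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1
    for _ in range(episodes):
        state = 0
        while state != goal:
            # Trial and error: explore at random with probability epsilon
            # (or when both actions look equally good), else act greedily.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state = max(0, state - 1) if action == 0 else state + 1
            reward = 1.0 if next_state == goal else 0.0
            # Move the estimate toward reward plus discounted future value.
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Return the preferred action (0 = left, 1 = right) for each state."""
    return [0 if left > right else 1 for left, right in q]


q_table = train_q_learning()
print(greedy_policy(q_table))
```

No correct action is ever shown to the learner. It only observes the reward at the goal, yet the resulting policy prefers moving right in every state. Note that the state representation here is chosen by hand, which is exactly the kind of designer input the text describes as a limitation.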
The human brain is capable of applied learning: it can select the important information from different kinds of tasks, apply previously learned knowledge and skills to new problems, and choose, from among the behaviors suited to a particular situation, the one that offers greater certainty and safety. For example, a person walking in a crowd can instantly identify obstacles in the direction they wish to go and avoid collisions. Similarly, a chess player can judge from the situation whether to play a standard move or whether a move based on deeper thinking is required. In this way, the human brain can instantly pick out what matters from many kinds of information and choose behavior that is safe and certain based on past learning experiences.
Fujitsu and OIST Begin Research on Reinforcement Learning Algorithms Utilizing the Latest Insights in Neuroscience
The Okinawa Institute of Science and Technology Graduate University (OIST) and Fujitsu Laboratories Ltd. have commenced joint research to develop reinforcement learning algorithms with human-like applied skills, leveraging the latest insights in neuroscience.
The research partners will look at how the human brain learns and incorporate those mechanisms into reinforcement learning algorithms, with the goal of producing an AI with human-like applied skills that can tackle a wide range of real-world problems. Unlike earlier AI, which needed human intervention, the resulting AI is intended to adjust itself autonomously.
Specifically, the plan is to develop the following three new technologies:
1. The automatic extraction of relevant information for reinforcement learning from dynamic, high-dimensional data,
2. Transfer learning to utilize past experience for creating an action selection policy for a novel problem,
3. Cooperative-concurrent reinforcement learning technology to select an appropriate one of multiple action policies depending on the situation.
Professor Kenji Doya of OIST and his research team will focus on the mathematical modeling of neural computation architectures from a neuroscience perspective, and apply that modeling to reinforcement learning algorithms. Fujitsu Laboratories will jointly develop algorithms from an optimization and control engineering perspective, and investigate implementation methods that fully leverage computing resources.
Developing AI Solutions with Adaptability and Flexibility of the Human Brain
Currently, OIST and Fujitsu Laboratories are working on the problems of handling high-dimensional input data, and selecting actions from multiple policies, such as those that adapt quickly to changes in the environment or learn more conservatively.
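To make the idea of choosing between a quickly adapting learner and a more conservative one concrete, here is a minimal sketch. It is not the cooperative-concurrent method being researched. It assumes two simple reward estimators that differ only in their learning rate, and a selector that trusts whichever one has had the lower recent prediction error. All names and parameter values are hypothetical.

```python
def run_selector(rewards, fast_rate=0.5, slow_rate=0.05, error_decay=0.8):
    """Pick, at each step, the estimator with the lower recent error.

    The 'fast' estimator adapts quickly to change; the 'slow' one
    updates conservatively. Ties go to the conservative estimator.
    """
    fast = slow = 0.0          # current reward estimates
    fast_err = slow_err = 0.0  # running averages of squared error
    choices = []
    for reward in rewards:
        # Decide which estimator to trust before seeing the new reward.
        choices.append("fast" if fast_err < slow_err else "slow")
        # Update each estimator's running prediction error.
        fast_err = error_decay * fast_err + (1 - error_decay) * (reward - fast) ** 2
        slow_err = error_decay * slow_err + (1 - error_decay) * (reward - slow) ** 2
        # Update the estimates themselves at their own learning rates.
        fast += fast_rate * (reward - fast)
        slow += slow_rate * (reward - slow)
    return choices


# The environment changes abruptly at step 50: the reward jumps from 0 to 1.
rewards = [0.0] * 50 + [1.0] * 50
choices = run_selector(rewards)
print(choices[49], choices[55])
```

In this example the selector stays with the conservative estimator while nothing changes and switches to the fast one shortly after the reward shifts. A real system would select among full action policies rather than scalar estimates, and would need to handle noisy rewards, where the conservative learner is typically the more reliable choice.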
As the research progresses, parallel learning and control methods inspired by the brain's architecture could enable more efficient management of data centers, for example by controlling the load on individual computers and the operation of air conditioners to improve energy efficiency and reduce costs.
Fujitsu Laboratories believes that this fusion with neuroscience is the key to further enhancing reinforcement learning and other forms of AI technology. In the future, by incorporating the adaptability and flexibility of the human brain, Fujitsu Laboratories aims to develop AI solutions that can solve problems more efficiently in various fields, such as ICT systems and energy management.