The goal of our reinforcement learning group, led by Professor Yang Gao, is to pursue an ultimate AI solution that can incrementally learn an adaptive, near-optimal policy by interacting with its environment. Our group is located in the Menminwei Building on the main campus of Nanjing University and is affiliated with the Department of Computer Science and Technology.
Inspired by related psychological theory, reinforcement learning in computer science is a sub-area of machine learning concerned with how an agent ought to take actions in an environment so as to maximize some notion of long-term reward. Reinforcement learning algorithms attempt to find a policy that maps states of the world to the actions the agent ought to take in those states. In economics and game theory, reinforcement learning is considered a boundedly rational interpretation of how equilibrium may arise.
The environment is typically formulated as a finite-state Markov decision process (MDP), and reinforcement learning algorithms for this setting are closely related to dynamic programming techniques. State transitions and rewards in the MDP are typically stochastic, but their probabilities are stationary over the course of the problem.
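To make the link between a finite-state MDP and dynamic programming concrete, here is a minimal sketch of value iteration in plain Python. It is only an illustration: the two-state MDP, the discount factor `gamma`, and the names `value_iteration`, `P`, and `R` are assumptions made for this example, not something taken from our group's work.

```python
def value_iteration(P, R, gamma=0.9, tol=1e-8, max_iters=10_000):
    """Value iteration for a small finite MDP (illustrative sketch).

    P[s][a][t] is the (stationary) probability of moving from state s
    to state t under action a; R[s][a] is the expected immediate reward.
    Returns the estimated state values and a greedy policy.
    """
    n_states = len(P)
    n_actions = len(P[0])
    V = [0.0] * n_states

    def q_value(s, a, values):
        # One-step lookahead: immediate reward plus discounted expected value.
        return R[s][a] + gamma * sum(p * values[t] for t, p in enumerate(P[s][a]))

    for _ in range(max_iters):
        V_new = [max(q_value(s, a, V) for a in range(n_actions))
                 for s in range(n_states)]
        delta = max(abs(x - y) for x, y in zip(V_new, V))
        V = V_new
        if delta < tol:  # stop once the values have stopped changing
            break

    # The policy maps each state to the action with the highest lookahead value.
    policy = [max(range(n_actions), key=lambda a: q_value(s, a, V))
              for s in range(n_states)]
    return V, policy


# Hypothetical toy MDP: 2 states, 2 actions (0 = stay, 1 = move).
# Staying in state 1 pays reward 1; moving from state 0 succeeds with prob. 0.8.
P = [
    [[1.0, 0.0], [0.2, 0.8]],  # transitions from state 0
    [[0.0, 1.0], [1.0, 0.0]],  # transitions from state 1
]
R = [
    [0.0, 0.0],  # rewards in state 0
    [1.0, 0.0],  # rewards in state 1
]

values, policy = value_iteration(P, R)
print(values, policy)
```

Under these assumptions the greedy policy moves out of state 0 and then stays in state 1. Note that value iteration needs the transition and reward model up front; reinforcement learning algorithms aim at the same kind of policy when the model has to be learned through interaction.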
Reinforcement learning differs from the supervised learning problem in that correct input/output pairs are never presented, nor are sub-optimal actions explicitly corrected. Further, there is a focus on on-line performance, which involves finding a balance between exploration (of uncharted territory) and exploitation (of current knowledge). The exploration vs. exploitation trade-off in reinforcement learning has been studied mostly through the multi-armed bandit problem.
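The exploration vs. exploitation trade-off can be illustrated with a small epsilon-greedy strategy on a Bernoulli multi-armed bandit. This is a sketch under stated assumptions: epsilon-greedy is just one simple strategy among many, and the arm probabilities, the value of `epsilon`, and the function name `epsilon_greedy_bandit` are hypothetical choices for this example.

```python
import random


def epsilon_greedy_bandit(arm_means, steps=5000, epsilon=0.1, seed=0):
    """Run epsilon-greedy on a Bernoulli bandit (illustrative sketch).

    arm_means[i] is the hidden success probability of arm i.
    Returns the learned reward estimates, pull counts, and total reward.
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms
    estimates = [0.0] * n_arms
    total_reward = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a uniformly random arm.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the best current estimate.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # The agent only observes a reward, never the "correct" arm.
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0

        counts[arm] += 1
        # Incremental update of the sample mean for the pulled arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward

    return estimates, counts, total_reward


estimates, counts, total_reward = epsilon_greedy_bandit([0.2, 0.5, 0.8])
print(estimates, counts, total_reward)
```

With `epsilon=0` the agent can lock onto whichever arm pays off first, while a larger `epsilon` spends more pulls on arms it already knows to be poor; choosing between these extremes is the balance described above.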
Yang Gao is a teacher in the Computer Science Department at Nanjing University. He received his B.S. from Dalian University of Technology in 1993, his M.S. in Computer Aided Design from Nanjing University of Science and Technology in 1996, and his Ph.D. in Computer Science from Nanjing University in 2000. He was selected for the IBM China Visitorship Program in 2003 and visited the ETI at HKU from June to September 2003. He visited the University of Western Sydney in December 2004, Hong Kong Baptist University from January to February 2005, and Massey University, New Zealand, in December 2006.