97-15   Report Series of the Mathematical Seminar, Universität Kiel

Stephan Pareigis:

Numerical Schemes for the Continuous Q-function of Reinforcement Learning

We develop a theoretical framework for the problem of learning optimal control. We consider a discounted infinite-horizon deterministic control problem in the reinforcement learning setting. The main objective is to approximate the optimal value function of a fully continuous problem using only observed information, namely states, controls, and costs. Drawing on results from the numerical treatment of the Bellman equation, we establish regularity and consistency results for the optimal value function. These results guide the construction of algorithms for the continuous problem. We propose two approximation schemes for the optimal value function that are based on observed data. The implementation of a simple optimal control learning problem illustrates the effects of the two approximation schemes.
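For orientation, a standard discrete-time formulation of the objects named above (the report's own continuous formulation may differ): with dynamics $x' = f(x,u)$, running cost $c(x,u)$, and discount factor $\gamma \in (0,1)$, the Q-function and the optimal value function are linked by the Bellman equations

    $Q(x,u) = c(x,u) + \gamma\, V\bigl(f(x,u)\bigr), \qquad V(x) = \min_{u} Q(x,u).$

The following is a minimal sketch of one generic data-driven scheme of this kind, namely tabular Q-learning on a discretized state space, learning from observed (state, control, cost, next-state) samples. The grid, learning rate, and toy dynamics are illustrative assumptions; this is not one of the two schemes proposed in the report.

```python
# Hypothetical sketch: tabular Q-learning from observed samples for a
# discounted deterministic problem on a 1-D domain. Grid, learning rate,
# and dynamics are assumptions made for illustration only.
import numpy as np

GAMMA = 0.9          # discount factor
ALPHA = 0.5          # learning rate
N_STATES = 21        # grid points on [-1, 1]
CONTROLS = np.array([-0.1, 0.0, 0.1])

states = np.linspace(-1.0, 1.0, N_STATES)
Q = np.zeros((N_STATES, len(CONTROLS)))

def nearest(x):
    """Index of the grid point closest to the continuous state x."""
    return int(np.argmin(np.abs(states - x)))

def step(x, u):
    """Toy deterministic dynamics and cost (assumed for illustration)."""
    x_next = np.clip(x + u, -1.0, 1.0)
    cost = x_next**2 + 0.1 * u**2   # penalize distance from origin and effort
    return x_next, cost

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0)
for _ in range(50_000):
    i = nearest(x)
    a = rng.integers(len(CONTROLS))      # explore controls uniformly
    x_next, cost = step(x, CONTROLS[a])
    j = nearest(x_next)
    # Bellman update: Q(x,u) <- c(x,u) + gamma * min_u' Q(x',u')
    target = cost + GAMMA * Q[j].min()
    Q[i, a] += ALPHA * (target - Q[i, a])
    x = x_next
    if rng.random() < 0.01:              # occasional restart for coverage
        x = rng.uniform(-1.0, 1.0)

V = Q.min(axis=1)                        # approximate optimal value function
print("V(0) ~", V[nearest(0.0)])
```

The nearest-neighbor projection onto a fixed grid is the crudest way to pass from sampled continuous states to a finite table; the consistency questions raised in the abstract concern precisely how such discretizations behave as the grid is refined.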

Mathematics Subject Classification (1991): 49L20, 65N30, 68T05, 93C57

Keywords: learning optimal control, dynamic programming, reinforcement learning, sampled data, approximation of optimal value function.

