Project members are Devika Subramanian and Peter Druschel. The work is supported in part by a grant from Southwestern Bell.

The Question

How do we design distributed reinforcement learning algorithms for environments that are not stationary? How do we demonstrate their effectiveness, both theoretically and experimentally? We study these questions in the context of designing routing algorithms that compute paths in dynamic networks. Such networks exhibit two types of non-stationarity: topology changes caused by link failures and revivals, which occur on fairly short time scales, and longer time-scale effects caused by congestion, which alters the costs of links in the network. How can one design reinforcement learning algorithms that respond to both types of non-stationarity?

The Approach

The traditional distance-vector algorithm used for Internet routing performs Bellman updates similar to those of a standard reinforcement learning algorithm (with the learning rate set to 1 to handle non-stationarities of the first kind). We call such algorithms indirect path-sampling algorithms. We introduce a new family of distributed reinforcement learning algorithms that learn path costs in the network by direct sampling, which handles both types of non-stationarity well. Unfortunately, naive direct sampling is not competitive with the distance-vector algorithm in routing cost. Selective direct path sampling, described here, performs independent, time-staggered direct path-cost updates. It is superior to today's routing methods in both overhead and route quality, and it is one of the first routing techniques that is stable under dynamic cost metrics, allowing splits of congested traffic that traditional routing algorithms cannot make.
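
The following minimal Python sketch illustrates the contrast described above; it is not the project's code. The Bellman-style update overwrites an estimate from a neighbor's report (learning rate 1), while direct sampling blends in costs measured along probed paths, and probes are staggered in time per destination. The names alpha and probe_interval are assumptions made purely for illustration.

    def bellman_update(neighbor_est, link_cost):
        # Indirect (distance-vector) update: overwrite the estimate using a
        # neighbor's reported cost. With learning rate 1, topology changes
        # (non-stationarity of the first kind) propagate immediately.
        return link_cost + neighbor_est

    def direct_sample_update(est, sampled_path_cost, alpha=0.5):
        # Direct path sampling: blend in the cost measured along an actual
        # end-to-end probe, so the estimate also tracks the slower,
        # congestion-driven drift in link costs.
        return (1 - alpha) * est + alpha * sampled_path_cost

    def staggered_probe_offsets(num_destinations, probe_interval=10.0):
        # Selective, time-staggered probing: give each destination its own
        # phase within the probe interval so path-cost updates are
        # independent and spread out over time, keeping overhead low.
        return [d * probe_interval / num_destinations for d in range(num_destinations)]

For example, staggered_probe_offsets(5) spreads five destinations' probes evenly across a 10-second interval, so no single moment carries the full measurement load.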

The Result

Publications

Talks