Project members are Devika Subramanian and Peter Druschel. The work is supported in part by a grant from Southwestern Bell.

The Question

How do we design distributed reinforcement learning algorithms for environments that are not stationary? How do we demonstrate their effectiveness, both theoretically and experimentally? We study these questions in the context of designing routing algorithms that compute paths in dynamic networks. Such networks exhibit two types of non-stationarity: topology changes caused by link failures and revivals, which occur on fairly short time scales, and longer time-scale effects caused by congestion, which alters the costs of links in the network. How can one design reinforcement learning algorithms that respond to both types of non-stationarity?

The Approach

The traditional distance-vector algorithm used for Internet routing performs Bellman updates similar to those of a standard reinforcement learning algorithm (with the learning rate set to 1 to handle non-stationarities of the first kind). We call such algorithms indirect path-sampling algorithms. We introduce a new family of distributed reinforcement learning algorithms that learn path costs in the network by direct sampling, which handles both types of non-stationarity well. Unfortunately, naive direct sampling is not competitive with the distance-vector algorithm in routing cost. Selective direct path sampling, described here, performs independent, time-staggered direct path-cost updates. It is superior to today's routing methods in both overhead and route quality, and it is one of the first routing techniques that is stable under dynamic cost metrics, allowing splits of congested traffic that traditional routing algorithms cannot make.
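
The following minimal Python sketch illustrates the contrast described above; it is not the project's code. The Bellman-style update overwrites an estimate from a neighbor's report (learning rate 1), while direct sampling blends in costs measured along probed paths, and probes are staggered in time per destination. The names alpha and probe_interval are assumptions made purely for illustration.

    def bellman_update(neighbor_est, link_cost):
        # Indirect (distance-vector) update: overwrite the estimate using a
        # neighbor's reported cost. With learning rate 1, topology changes
        # (non-stationarity of the first kind) propagate immediately.
        return link_cost + neighbor_est

    def direct_sample_update(est, sampled_path_cost, alpha=0.5):
        # Direct path sampling: blend in the cost measured along an actual
        # end-to-end probe, so the estimate also tracks the slower,
        # congestion-driven drift in link costs.
        return (1 - alpha) * est + alpha * sampled_path_cost

    def staggered_probe_offsets(num_destinations, probe_interval=10.0):
        # Selective, time-staggered probing: give each destination its own
        # phase within the probe interval so path-cost updates are
        # independent and spread out over time, keeping overhead low.
        return [d * probe_interval / num_destinations for d in range(num_destinations)]

For example, staggered_probe_offsets(5) spreads five destinations' probes evenly across a 10-second interval, so no single moment carries the full measurement load.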

The Result

Publications

Talks