This project is funded in part by NASA. Project members are Devika Subramanian, Dave Kortenkamp and Pete Bonasso.

The problem

Coupled dynamical systems are an interesting, commonly occurring class of systems whose subsystems behave deterministically and predictably in isolation, yet whose overall behavior cannot be derived analytically. Such systems are sensitive to initial conditions and are typically studied through numerical simulation. In this paper, we experimentally study reinforcement learning techniques for controlling the Mars BioPlex, an advanced life support system that is a representative example of a coupled dynamical system. Designing a tractable reinforcement learning formulation for such a system is difficult because we have no a priori knowledge of its dynamics.

The approach

We present a two-step method for acquiring good control policies for dissipative coupled dynamical systems, i.e., ones with finite lifetimes. We first learn a "short" open-loop control plan using genetic algorithms and apply this plan repeatedly to maximize the given control objective. The open-loop plan provides insight into the topology of the system's state space: it reveals a small core of "safe" states, in which all component subsystems function normally. Repeated execution of the open-loop plan keeps the system on a periodic trajectory through this safe core for as long as feasible. This observation guides the design of a reduced state and action space, as well as an informative local reward function. Armed with this new and effective formulation, a Q-learner then acquires an optimal closed-loop control policy for the system.
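The two-step method can be sketched as follows. The BioPlex simulator itself is not shown here, so the one-variable toy system, the binary action set, the plan length, and all learning parameters below are illustrative assumptions, not the paper's actual setup; the sketch only shows the shape of the pipeline (genetic search for a short open-loop plan, then tabular Q-learning with a local reward keyed to the safe states).

```python
import random

random.seed(0)  # for reproducibility of this illustration

# Hypothetical stand-in for a coupled dynamical system: one discrete
# state variable in 0..9 that drifts downward unless an actuator
# (action 1) pushes it up.
def step(state, action):
    return max(0, min(9, state - 1 + 2 * action))

# Local reward keyed to a hypothetical "safe core" of mid-range states.
def reward(state):
    return 1.0 if 3 <= state <= 6 else 0.0

# Step 1: evolve a short open-loop plan (a fixed-length action
# sequence) with a simple genetic algorithm. Fitness is the reward
# accumulated when the plan is applied repeatedly.
def fitness(plan, start=5, repeats=4):
    s, total = start, 0.0
    for _ in range(repeats):
        for a in plan:
            s = step(s, a)
            total += reward(s)
    return total

def ga(plan_len=4, pop=30, gens=40):
    population = [[random.randint(0, 1) for _ in range(plan_len)]
                  for _ in range(pop)]
    for _ in range(gens):
        population.sort(key=fitness, reverse=True)
        parents = population[:pop // 2]          # truncation selection
        children = []
        while len(children) < pop - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, plan_len)  # one-point crossover
            child = a[:cut] + b[cut:]
            if random.random() < 0.1:            # point mutation
                i = random.randrange(plan_len)
                child[i] = 1 - child[i]
            children.append(child)
        population = parents + children
    return max(population, key=fitness)

# Step 2: tabular Q-learning over the (already small) state/action
# space, using the local reward suggested by the safe core.
def q_learn(episodes=500, alpha=0.5, gamma=0.9, eps=0.1):
    Q = {(s, a): 0.0 for s in range(10) for a in (0, 1)}
    for _ in range(episodes):
        s = random.randrange(10)
        for _ in range(20):
            a = (random.choice((0, 1)) if random.random() < eps
                 else max((0, 1), key=lambda x: Q[(s, x)]))
            s2 = step(s, a)
            Q[(s, a)] += alpha * (reward(s2)
                                  + gamma * max(Q[(s2, 0)], Q[(s2, 1)])
                                  - Q[(s, a)])
            s = s2
    return Q

best_plan = ga()
Q = q_learn()
# Greedy closed-loop policy extracted from the learned Q-table.
policy = {s: max((0, 1), key=lambda a: Q[(s, a)]) for s in range(10)}
```

In this toy version, the evolved plan settles into a cycle through the rewarded mid-range states, and the Q-learner's greedy policy steers out-of-range states back toward that core, mirroring the role the open-loop plan and the closed-loop policy play in the method.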

Papers