This project is funded in part by NASA. Project members are Devika Subramanian, Dave Kortenkamp, and Pete Bonasso.
The problem
Coupled dynamical systems are an interesting, commonly-occurring class
of systems with the property that the behavior of their subsystems is
deterministic and easily derivable in isolation; yet it is impossible
to analytically predict the behavior of the overall system. Such
systems are known to be sensitive to initial conditions and are
typically studied using numerical simulations. In this paper, we
experimentally study reinforcement learning techniques to control the
Mars BioPlex, an advanced life
support system that is a representative example of a coupled
dynamical system. It is difficult to design tractable formulations of
reinforcement learning for such a system because we have no a priori
knowledge of system dynamics.
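The sensitivity to initial conditions mentioned above is easy to reproduce numerically. The following sketch (illustrative only, not part of the project) couples two logistic maps, a standard textbook example of a coupled dynamical system: each subsystem alone is a trivially deterministic recurrence, yet two trajectories of the coupled pair starting a billionth apart end up in completely different states. The parameters `r` and `eps` are assumed values chosen to put the system in a chaotic regime.

```python
# Two weakly coupled logistic maps (illustrative assumption: r = 3.9
# puts each map in a chaotic regime, eps = 0.05 is a weak coupling).
def coupled_step(x, y, r=3.9, eps=0.05):
    # each subsystem is a logistic map nudged by the other's state
    fx, fy = r * x * (1 - x), r * y * (1 - y)
    return ((1 - eps) * fx + eps * fy, (1 - eps) * fy + eps * fx)

def trajectory(x0, y0, steps=100):
    x, y = x0, y0
    for _ in range(steps):
        x, y = coupled_step(x, y)
    return x, y

# two initial conditions differing by 1e-9 in one coordinate
a = trajectory(0.4, 0.6)
b = trajectory(0.4 + 1e-9, 0.6)
divergence = abs(a[0] - b[0]) + abs(a[1] - b[1])
```

Although each subsystem's update rule is known exactly, the only practical way to know where the coupled system ends up is to simulate it, which is why such systems are studied numerically.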
The approach
We present a two-step method for acquiring good control policies for
dissipative coupled dynamical systems, i.e., ones with finite
lifetimes. We first learn a "short" open-loop control plan using
genetic algorithms and apply this plan repeatedly to maximize
the given control system objective. This open-loop plan provides
insight into the topology of the system state space. It identifies the
system as having a small core of "safe" states; a safe system state is
one in which all component subsystems are functioning normally. The
repeated execution of the open-loop plan keeps the system state on a
periodic trajectory through the safe core for as long as is
feasible. This observation guides the design of a reduced state and
action space, as well as an informative local reward function. Armed
with this new and effective formulation, a Q-learner finally acquires
an optimal closed-loop control policy for the system.
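The first step can be sketched as follows. A genetic algorithm evolves a fixed-length open-loop action sequence whose fitness is how long the system survives under repeated execution of the plan. Everything here is an invented stand-in — the toy two-resource "simulator", the action set, the plan length, and the GA constants are assumptions, not the actual BioPlex dynamics or objective.

```python
import random

# Hypothetical stand-in for the life-support simulator: two resource
# levels (think O2 and CO2) that each action nudges up or down; the
# system "dies" when either level leaves its operating range.
ACTIONS = [0, 1, 2, 3]                  # assumed discrete control settings
EFFECT = {0: (1, -1), 1: (-1, 1), 2: (1, 1), 3: (-1, -1)}
PLAN_LEN = 8                            # length of the "short" open-loop plan

def lifetime(plan, horizon=200):
    """Steps survived when the plan is executed repeatedly."""
    o2, co2 = 10, 10
    for t in range(horizon):
        d_o2, d_co2 = EFFECT[plan[t % len(plan)]]
        o2, co2 = o2 + d_o2, co2 + d_co2
        if not (0 < o2 < 20 and 0 < co2 < 20):
            return t                    # system failed at step t
    return horizon

def evolve(pop_size=30, generations=40, mut_rate=0.1, seed=0):
    rng = random.Random(seed)
    pop = [[rng.choice(ACTIONS) for _ in range(PLAN_LEN)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lifetime, reverse=True)
        elite = pop[:pop_size // 2]     # truncation selection, elitist
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, PLAN_LEN)   # one-point crossover
            child = [rng.choice(ACTIONS) if rng.random() < mut_rate else g
                     for g in a[:cut] + b[cut:]]
            children.append(child)
        pop = elite + children
    return max(pop, key=lifetime)

best = evolve()
```

Good plans found this way keep the resource levels cycling inside a narrow band — the analogue of the periodic trajectory through the safe core described above.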
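The second step — tabular Q-learning over the reduced formulation — can be sketched in the same toy setting. The reduced state is the pair of resource levels, and the local reward is +1 for staying inside an assumed "safe core" and -10 for leaving it; the dynamics, safe-core bounds, reward values, and learning constants are all illustrative assumptions, not the project's actual formulation.

```python
import random

ACTIONS = [0, 1, 2, 3]
EFFECT = {0: (1, -1), 1: (-1, 1), 2: (1, 1), 3: (-1, -1)}

def step(state, action):
    o2, co2 = state
    d_o2, d_co2 = EFFECT[action]
    nxt = (o2 + d_o2, co2 + d_co2)
    alive = all(0 < v < 20 for v in nxt)    # hard operating limits
    safe = all(3 <= v <= 17 for v in nxt)   # assumed "safe core"
    reward = 1.0 if safe else -10.0         # informative local reward
    return nxt, reward, not alive

def q_learn(episodes=2000, alpha=0.2, gamma=0.95, eps=0.1, seed=0):
    rng = random.Random(seed)
    Q = {}                                  # (state, action) -> value
    for _ in range(episodes):
        s = (10, 10)
        for _ in range(100):
            if rng.random() < eps:          # epsilon-greedy exploration
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda x: Q.get((s, x), 0.0))
            s2, r, done = step(s, a)
            best_next = 0.0 if done else max(
                Q.get((s2, x), 0.0) for x in ACTIONS)
            # standard one-step Q update
            Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (
                r + gamma * best_next - Q.get((s, a), 0.0))
            if done:
                break
            s = s2
    return Q

def greedy_lifetime(Q, horizon=200):
    """Steps survived by the greedy (closed-loop) policy."""
    s = (10, 10)
    for t in range(horizon):
        a = max(ACTIONS, key=lambda x: Q.get((s, x), 0.0))
        s, _, done = step(s, a)
        if done:
            return t
    return horizon

Q = q_learn()
```

The resulting greedy policy is closed-loop: unlike the fixed open-loop plan, it reacts to the current state, which is what lets it keep the system inside the safe core.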
Papers