# MDP

Markov Decision Problems (MDP) are models for optimizing decisions in multiple stages, as known as Dynamic Optimization. Stages can be induced by equi-distant time steps like hours or days or any other events like the appearance of requests. Essentially, MDPs are dynamical systems in discrete time plus an opportunity to influence the system by a control. The system dynamics my be inherently uncertain, e.g., the transition from one state to another state in a stage happens according to a given probability distribution. A characteristic feature of MDPs is that this probability distribution must be a function of the current state and the current control. Since the development of the system in future stages is not exactly foreseeable, an optimal solution to an MDP is not an optimal sequence of controls (as known as an open-loop policy) but a sequence of functions, mapping in each stage the current state to a control (known as a closed-loop control or a feed-back control). MDPs can be investigated for a finite or an infinite number of stages. There can be an additional source of uncertainty when the parameters of the MDP itself are not known exactly in advance. Most of the times, this situation can be transformed into an ordinary MDP at the cost of far more complicated stat spaces.

# Markov Decision Problems

**Coordination:**Prof. Dr. Jörg Rambau

**Contact:**Prof. Dr. Jörg Rambau

**Represented in MODUS since**2009/01

**Members with experience in this area:**

**The method "Markov Decision Problems":**

## What is this?

Markov Decision Problems (MDP) are models for optimizing decisions in multiple stages, as known as *Dynamic Optimization*. Stages can be induced by equi-distant time steps like hours or days or any other events like the appearance of requests. Essentially, MDPs are dynamical systems in discrete time plus an opportunity to influence the system by a control. The system dynamics my be inherently uncertain, e.g., the transition from one state to another state in a stage happens according to a given probability distribution. A characteristic feature of MDPs is that this probability distribution must be a function of the current state and the current control. Since the development of the system in future stages is not exactly foreseeable, an optimal solution to an MDP is not an optimal sequence of controls (as known as an *open-loop policy*) but a sequence of functions, mapping in each stage the current state to a control (known as a *closed-loop control* or a *feed-back control*). MDPs can be investigated for a finite or an infinite number of stages. There can be an additional source of uncertainty when the parameters of the MDP itself are not known exactly in advance. Most of the times, this situation can be transformed into an ordinary MDP at the cost of far more complicated state spaces.

## What is it good for?

Whenever decisions have to be made over time again and again and the knowledge about the system increases stage by stage, then MDPs are in place. For example, for managing a ware-house, one has to decide each evening how much to order for replenishment. Although this decision has to be made in the presence of uncertain demand on the next day, the ordering decision will most certainly depend on what is currently left in stock. Or in beach-volleyball: whether or not I will take the risk to play a very hard smash will be dependent on the quality of the set right before.

## Where have we applied it?

The members of MODUS have applied this method for the following challenges:

- Optimal dispatching of pallet elevators in factories
- Optimal inventory management in multi-echelon warehouse networks
- Optimal strategies in sports games