# Iterative Dynamic Programming and Hybrid Model Identification

This page summarizes the work in my dissertation on the iterative dynamic programming (IDP) algorithm for solving the optimal control problem and a new technique for hybrid mechanistic/empirical dynamic model identification. It is also meant to be an informal and very brief introduction to the concept of optimal control, IDP, and hybrid model identification readable by a general engineering audience. There are animations here that obviously can't be included in the paper dissertation, so it is a sort of animated Cliff's Notes for my dissertation. The best way to read it is to open the dissertation PDF in another window so that you can jump back and forth, because there is no way to include all the possible details on a web page.

## Definitions

First, some definitions:

- System

The thing we are interested in (ie. a reactor tank, a distillation column, etc). - Model

A mathematic description of the system. I will be talking about dynamic models formulated as sets of ODEs or DAEs. - Control

A system variable you can change directly (ie. an inlet feed rate, current to a heater, etc). - State

A variable in the model that is not controlled and represents an intrinsic characteristic of the system (ie. temperature, concentration, etc). - Objective

A mathematical term that you want to maximize or minimize (ie. product mass flow rate for a reactor).

## Optimizing Static Controls

Optimizing a static control is relatively easy.

An example is to find the baseline heater electrical current needed to maintain a certain temperature in continuous reactor which maximizes product yield. Assuming no state constraints, Newton's method or other basic optimization technique can be used.

A slightly more complicated example is to find the baseline electrical current needed to get the maximum conversion for the same reactor with some limit of side reaction products. Because state constraints are involved, the problem becomes a linear or nonlinear program (LP or NLP) and again there are many existing libraries to solve these problems easily.

## Optimizing Dynamic Controls

When controls are not constant, finding the optimal control can be much more difficult. A couple of examples of dynamic controls are a batch reactor feed rate that can change with respect to time, or a catalyst concentration that can change with respect to the length along the tube and the radius from the tube wall.

## The Optimal Regulator Problem

One type of problem is the optimal regulator problem: choose the values of the controls in order to maintain the states at some specified value. For linear ODE models, the optimal regulator problem has been solved for several different cases (no disturbances, measured disturbances, unmeasured disturbances) by using the Riccati transformation. For the case of no disturbances or measured disturbances the solution takes the form of a proportional controller, and for the case of unmeasured disturbances the solution takes the form of a proportional-integral controller.

## The Optimal Control Problem

The optimal regulator problem assumes that you know some of what you want beforehand, such as "the concentration of acetate in the reactor should be maintained at 1M," or "the temperature of the reactor should be maintained at 100C." But what if you don't know anything except that you have some objective such as "maximize production rate while minimizing costs?" In that case, you create an objective function that specifies your goal and must choose dynamic controls that maximize that objective function. That is an optimal control problem, and although the problem can be formally composed fairly easily using variational calculus, the solution to the problem is often very difficult to obtain.

Chapter 2 of my dissertation describes the Park-Ramirez optimal control problem that will be used to illustrate all the concepts introduced throughout the rest of the dissertation. Basically, the system is a genetically-modified yeast used to create some desired protein. The yeast cells grow when there is a lot of glucose around, and create the desired foreign protein when there is not a lot of glucose around. The foreign protein is created internally, and is secreted when glucose concentration is high. The model describes the cell growth, glucose concentration, total foreign protein concentration, total secreted protein concentration, and reactor volume. The objective is to create as much foreign protein as possible by the end of some specified batch time (15 hours for all examples here). Sounds simple, right?

## Basic IDP

Chapter 3 of my dissertation describes the basic IDP algorithm. Very briefly, IDP is an algorithm that breaks the optimal control problem into a stagewise discrete decision problem. The algorithm starts at the last stage, trying some random values for the control and remembers the one that is best. It then moves back to the next to last stage and finds the best control for that stage. When it has worked backward to the first stage it has completed one iteration. Multiple iterations are required because of assumptions made to be able to set up the problem in this manner (see Chapter 3 for the specifics if you're interested).

Animation 1: Original IDP algorithm (every test control)

Animation 1 shows the original IDP algorithm in action solving the Park-Ramirez bioreactor optimal control problem with 10 stages, with an animation frame for each test value of control. Some things to notice about the animation:

- The X axis is time from 0 to 15 hours.
- The iteration counter is shown at the upper-left of the glucose feed plot.
- Two states (cells and glucose) are shown for fun.
- The "Performance Index" is the final numerical value of the objective function.
- The value in the upper-left of the Performance Index plot is the current fraction of the true best Performance Index (the decimal may be hard to see).
- After a few iterations, the dotted line in the glucose feed plot is the control range boundary. One of the important concepts for IDP is that the controls are chosen randomly within some range, and that range shrinks as iterations progress.
- Notice that IDP only affects the states during and after the current decision stage. Because of this, the global optimal is not guaranteed to be found with this method.
- Ten stages is not enough to closely match the true optimal feed rate seen in Chapter 2 of the dissertation, although the performance index value comes within about 3% of the optimal performance index.

### IDP with continuous controls

Animation 2: IDP with stagewise linear continuous controls (every test control)

Animation 2 shows IDP with stagewise linear continuous controls instead of stagewise constant controls. In most situations this is a better strategy to use. Although others have used IDP with stagewise continuous controls, my algorithm is an improvement over the existing algorithm as described in my dissertation.

### IDP with discontinuous controls

Animation 3: IDP with stagewise linear discontinuous controls (every test control)

Animation 3 shows IDP with stagewise linear discontinuous controls. This is sort of a cross between stagewise constant controls and stagewise linear continuous control: the control is linear within the stage but can be discontinuous at the stage boundary. This method had very poor performance for the Rark-Ramirez problem.

## IDP Efficiency

Chapter 4 of my dissertation discusses ways to improve the efficiency of IDP, since it is generally a very inefficient algorithm compared to other optimal control algorithms. The effect of IDP parameters on computation time is discussed, and a new control range contraction scheme is proposed.

## IDP Control Smoothing

One important problem with the IDP algorithm for some optimal control problems is that it tends to develop very noisy control solutions, especially if more than about 15-20 stages are used. Chapter 5 of my dissertation explores this problem.

Animation 4: Original IDP with 50 stagewise constant controls

Animation 4 demonstrates this problem for the Park-Ramirez optimal control problem with 50 stages. Even though the final performance index is within 0.3% of the optimal performance index, the controls are still far from the optimal.

### IDP with first-order control filter

Animation 5: IDP with first-order control filter

One method previously proposed to deal with this problem is to apply a first-order filter to the control policy after each IDP iteration. Animation 5 shows this technique. Note that although the results for this test are good with this technique, in general this technique was inferior to the rest of the techniques presented here.

### IDP with simulated annealing control filter

Animation 6: IDP with simulated annealing control filter

The simulated annealing filter shown in Animation 6 is also applied to the control strategy between iterations, but the filter is applied stochastically, and is more likely to be applied to stages where the performance index is harmed less. This filter takes more iterations to smooth, but was found to be superior to the first-order filter for the difficult identification problem discussed in Chapter 6 of the dissertation. Also, it was found to be less likely to find local minima than the first-order filter for this problem.

### IDP with control damping

Animation 7: IDP with control damping

Another way to decrease control activity is to punish it by adding a punishment term to the objective function. The discrete second derivative was added for this problem, and effect can be seen in Animation 7. Notice that controls far from surrounding controls are simply not chosen when this technique is used, as opposed to filtering techniques were they are chosen but then smoothed away between iterations.

### IDP with pivot point test controls

One observation made while studying the IDP algorithm was that the fact that it looks at only one stage at a time can trap the algorithm into noisy solutions. This effect is fully described in the dissertation in Chapter 5. To overcome this problem, "pivot point" test controls were added to the algorithm, where instead of testing random controls for a single stage, controls for two stages at a time are tested. These test controls mirror each other (if one is higher, the other is lower) which is why the technique is called pivot point test controls.

Animation 8: IDP with pivot point controls (every test control)

Animation 8 shows every test control when using pivot point test controls.

Animation 9: IDP with pivot point controls

Animation 9 shows another case using pivot point test controls with more stages. Pivot point test controls were found to obtain a smooth control faster than any other techniques tried, but they often found local minima as well. Because of this, a two-step method is described in my dissertation where the basic IDP algorithm is used for some number of iterations, then IDP with pivot point test controls is used to convergence. This gives much faster convergence without finding local minima.

## Hybid Dynamic Model Identification

The second major concept in my disseration is a new technique for identifying hybrid (empirical/mechanistic) dynamic models in Chapter 6. An example of such a model is the Park-Ramirez bioreactor model from the first part of the dissertation, but where the kinetic parameters (growth rate, protein production rate, protein secretion rate) are neural networks instead of the mechanistic versions used in the original model. Hybrid models are useful if parts of the model (such as kinetic terms) are too expensive to be found explicitly.

The identification problem is to find the model parameters (in this case the neural network weights) given a set of experimental data. Finding the weights directly is computationally prohibitive unless several limitations are placed on the model. I propose a new method with no model limitations. In this new method, the identification problem is split into two parts. The first part is formulated as an optimal control problem, where the neural network outputs are the controls, and the objective is to minimize the difference between model predicted states and state data. The second part is identification of the neural network weights using the neural network outputs from the first step and the system data.

To test this new identification technique, artificial noisy state data were generated using the mechanistic versions of the kinetic terms from the original Park-Ramirez model. The values of the kinetic parameters were then identified from this data using IDP as the optimal control algorithm.

Animation 10: Kinetic parameter identification using basic IDP

Animation 10 shows the first part of this two-part identification. There are several things to notice about this animation:

- The X axis is time from 0 to 15 hours.
- The top three plots are the kinetic parameter values (the neural network outputs).
- The dotted lines in the top three plots were the values of the kinetic parameters during artificial state data generation.
- The top number in the top left plot is the number of iterations.
- The bottom number in the top left plot is the sum squared error between artificial state data and model predicted states.
- The bottom three plots are three of the states.
- The dotted lines in the bottom three plots were the values of the states during artificial state data generation.
- The red pluses in the bottom three plots are the noisy artificial state data seen by the optimal control objective function.

In this case, IDP finds a growth rate profile that closely matches the growth rate used to generate the data, but the other two kinetic parameters are very noisy.

Animation 11: Kinetic parameter identification using two-step smoothed IDP (10 stages)

Animation 12: Kinetic parameter identification using two-step smoothed IDP (25 stages)

Animation 13: Kinetic parameter identification using two-step smoothed IDP (50 stages)

Animations 11-13 show the advantage of using the IDP smoothing techniques from the first part of the dissertation for this problem. Once the kinetic parameter values are found with respect to time, the neural networks are trained normally using backpropogation training (see Chapter 6). The neural networks identified using this technique were found to be very close to the original functions used to generate the data, indicating that the techique is viable.

The major advantage of this new identification technique relative to existing techniques are:

- Good results can still be obtained even if some states can be missing or have very few data.
- There is no restriction that the model be continuous. Although the Park-Ramirez model is a continuous model, this technique could be applied to models with state discontinuity or having discrete state values.