Tikhonov-regularised trust-region methods (*cough* Levenberg-Marquardt) oddly use two different approximations to the objective function at each step.
One regularised, one not.
What if we just regularised both?
1/
The first model function is minimised to produce the next iterate; this is the one with Tikhonov regularisation.
The second is used to predict how much the objective was supposed to decrease, in order to adjust the step-size (trust-region radius). This one is not regularised.
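Concretely, here is a numpy sketch of the two models on a toy quadratic (the matrix `A`, the point `x`, and the value of `lam` are made up for illustration — this is not Optimistix's internal code):

```python
import numpy as np

# Toy objective: f(x) = 0.5 * x^T A x - b^T x  (assumed example problem)
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])

x = np.array([2.0, -1.0])
lam = 0.5  # Tikhonov regularisation strength (assumed value)

g = A @ x - b  # gradient at x
H = A          # Hessian at x

# Model 1 (regularised): minimised to produce the step.
#   m_reg(p) = g.p + 0.5 * p.(H + lam*I).p   =>   p = -(H + lam*I)^{-1} g
p = np.linalg.solve(H + lam * np.eye(2), -g)

# Model 2 (unregularised): only used to *predict* the decrease.
#   m(p) = g.p + 0.5 * p.H.p
predicted_decrease = -(g @ p + 0.5 * p @ H @ p)
print(predicted_decrease)
```

The step comes from the regularised model, but the predicted decrease fed to the step-size heuristic comes from the unregularised one.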
Tikhonov regularisation is often interpreted as interpolating between the Newton and gradient directions (plus some scaling).
higher regularisation -> more gradient direction + smaller steps
lower regularisation -> more Newton direction + larger steps
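You can see the interpolation numerically (a sketch; `H` and `g` are made-up values, and the λ endpoints are just chosen to be extreme):

```python
import numpy as np

H = np.array([[3.0, 1.0], [1.0, 2.0]])  # assumed Hessian
g = np.array([4.0, -1.0])               # assumed gradient

def step(lam):
    # Tikhonov-regularised step: -(H + lam*I)^{-1} g
    return np.linalg.solve(H + lam * np.eye(2), -g)

newton = step(0.0)     # lam -> 0: pure Newton step
small = step(1e-2)     # light regularisation
large = step(1e4)      # heavy regularisation

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cos(small, newton))  # near 1: mostly the Newton direction
print(cos(large, -g))      # near 1: mostly the (negative) gradient direction
print(np.linalg.norm(large) < np.linalg.norm(small))  # heavy damping shrinks the step
```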
The typical heuristic to update the amount of regularisation each step is the Levenberg-Marquardt heuristic:
Take the ratio between the actual decrease of a step and the decrease predicted by a local unregularised model.
If the ratio is close to 1, the model is good and we lower the regularisation next step. If it's closer to 0, the model is poor and we increase the regularisation.
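A sketch of that heuristic (the thresholds 0.75/0.25 and the halve/double factors are common textbook choices, not necessarily what any particular library uses):

```python
def update_lambda(actual_decrease, predicted_decrease, lam,
                  good=0.75, bad=0.25, down=0.5, up=2.0):
    # rho near 1: the local model matched reality -> trust it more, relax damping.
    # rho near 0: the model over-promised -> damp harder, take smaller steps.
    rho = actual_decrease / predicted_decrease
    if rho > good:
        return lam * down
    elif rho < bad:
        return lam * up
    return lam

print(update_lambda(0.95, 1.0, lam=1.0))  # good model: lambda halved -> 0.5
print(update_lambda(0.05, 1.0, lam=1.0))  # poor model: lambda doubled -> 2.0
```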
But the unregularised model is not the one minimised by the Tikhonov-regularised method. This mismatch motivates the "damped" ratio.
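The damped ratio just swaps in the regularised model for the predicted decrease. A sketch of the difference (same made-up `H`, `g`, and `lam` as an illustration):

```python
import numpy as np

H = np.array([[3.0, 1.0], [1.0, 2.0]])  # assumed Hessian
g = np.array([4.0, -1.0])               # assumed gradient
lam = 0.5                               # assumed regularisation strength

# The step actually taken minimises the *regularised* model.
p = np.linalg.solve(H + lam * np.eye(2), -g)

# Classical predicted decrease: from the unregularised model.
plain = -(g @ p + 0.5 * p @ H @ p)

# "Damped" predicted decrease: from the regularised model that produced p.
damped = -(g @ p + 0.5 * p @ (H + lam * np.eye(2)) @ p)

# The damped prediction differs by 0.5*lam*||p||^2: the regularised model
# penalises long steps, so it promises less decrease than the unregularised one.
print(damped < plain)
```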
To see if it makes any difference to use the regularised model for step-size adjustment, I implemented a custom
`search` in Optimistix to test it.
The GitHub gist is available here:
Because Optimistix is modular, implementing this one search also gives us a damped-ratio
Levenberg-Marquardt and a new damped-ratio BFGS trust-region method.
The latter would normally be very hard to implement without writing BFGS from scratch!
So how does it fare? Unsurprisingly, it behaves basically identically to the original trust-region update.
Here is the BFGS algorithm on some select minimisation test problems (averaged over a few different initialisations).
The Optimistix authors mentioned that the choice between these two model functions made little difference in practice.
While this was believable, and indeed proved true, it struck me as an example of a claim that is very difficult to verify using existing optimisation software.
12/12