This method should compute and store the quantities you want to make available to the loss functions, and return `nothing`.
```julia
function update(targets, implied::MyImplied, model, par)
    if is_objective_required(targets)
        ...
    end
    if is_gradient_required(targets)
        ...
    end
    if is_hessian_required(targets)
        ...
    end
end
```
As you can see, `update` gets passed `targets` as its first argument, which tells us whether the objective value, gradient, and/or hessian are needed. We can then use the `is_..._required` functions to compute and store, conditional on what the optimizer needs, the quantities we want to make available to the loss functions. For example, as we have seen in [Second example - maximum likelihood](@ref), the `RAM` implied type computes the model-implied covariance matrix and makes it available via `implied.Σ`.
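To make this concrete, here is a minimal runnable sketch of the pattern. Everything below except the `is_..._required` names is an illustrative stand-in — in particular, the `Targets` struct and the placeholder Σ computation are not the package's API:

```julia
using LinearAlgebra

# Stand-in for the targets object the package passes to `update`
struct Targets
    objective::Bool
    gradient::Bool
end
is_objective_required(t::Targets) = t.objective
is_gradient_required(t::Targets) = t.gradient

# Hypothetical implied type that caches a model-implied covariance matrix
mutable struct MyImplied
    Σ::Matrix{Float64}
end

function update(targets::Targets, implied::MyImplied, model, par)
    if is_objective_required(targets)
        # placeholder computation: cache a diagonal "model-implied" matrix
        implied.Σ = diagm(par)
    end
    return nothing
end
```

A loss function can then read the cached matrix via `implied.Σ` without recomputing it.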
The last thing needed to make it work is a method for `nparams` that takes your implied type and returns the number of parameters of the model:

```julia
nparams(implied::MyImplied) = ...
```
Just as described in [Custom loss functions](@ref), you may define a constructor. Typically, this will depend on the `specification = ...` argument that can be a `ParameterTable` or a `RAMMatrices` object.
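As a hedged sketch of such a constructor (the `SimpleImplied` type is hypothetical, and the specification is modeled as a plain named tuple rather than a real `ParameterTable` or `RAMMatrices` object):

```julia
struct SimpleImplied
    n_par::Int
end

# Hypothetical keyword constructor: pull what you need from the
# specification and ignore the remaining keyword arguments
function SimpleImplied(; specification, kwargs...)
    return SimpleImplied(length(specification.params))
end

nparams(implied::SimpleImplied) = implied.n_par
```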
We implement an `ImpliedEmpty` type in our package that does nothing but serve as an `implied` field in case you are using a loss function that does not need any implied type at all. You may use it as a template for defining your own implied type, as it also shows how to handle the specification objects:
As you can see, similar to [Custom loss functions](@ref), we implement a method for `update_observed`.
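A minimal sketch of such an empty implied part (illustrative only, not the package's actual `ImpliedEmpty` implementation):

```julia
struct EmptyImplied end

# nothing to compute or store
update(targets, implied::EmptyImplied, model, par) = nothing

# hypothetical `update_observed`: the empty implied part does not depend
# on the observed data, so it simply returns itself
update_observed(implied::EmptyImplied, observed; kwargs...) = implied
```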
The function `evaluate!` determines from the types of the arguments `objective`, `gradient`, and `hessian` whether it should compute the objective value, gradient, or hessian of the model with respect to the parameters. In this case, `gradient` and `hessian` are of type `Nothing`, signifying that only the objective value should be computed.
That's all we need to make it work! For example, we can now fit [A first model](@ref) with ridge regularization:
We first give labels to some parameters in order to identify them as targets for the regularization:
By far the biggest improvements in performance will result from specifying analytical gradients. We can do this for our example:
```@example loss
import StructuralEquationModels: evaluate!

function evaluate!(objective, gradient, hessian::Nothing, ridge::Ridge, model::AbstractSem, par)
    # compute gradient
    if !isnothing(gradient)
        fill!(gradient, 0)
        gradient[ridge.I] .= 2 * ridge.α * par[ridge.I]
    end
    # compute objective
    if !isnothing(objective)
        return ridge.α * sum(i -> par[i]^2, ridge.I)
    end
end
```
As you can see, in this method definition, both `objective` and `gradient` may be different from `nothing`. We then check whether to compute the objective value and/or the gradient with `isnothing(objective)`/`isnothing(gradient)`. This makes it possible to compute the objective value and gradient at the same time, which is beneficial when they share common computations. For example, in maximum likelihood estimation, the model-implied covariance matrix has to be inverted to compute both the objective and the gradient; whenever the optimization algorithm asks for both at the same point, the shared inversion only has to be done once.
Now, instead of specifying a `SemFiniteDiff`, we can use the normal `Sem` constructor:
```@example loss
...
```
The exact results of those benchmarks are of course highly dependent on your system (processor, RAM, etc.), but you should see that the median computation time with analytical gradients drops to about 5% of the computation time without analytical gradients.
Additionally, you may provide analytic hessians by writing a respective method for `evaluate!`. However, this will only matter if you use an optimization algorithm that makes use of the hessians. Our default algorithm, `LBFGS` from the package `Optim.jl`, does not use hessians (the `Newton` algorithm from the same package, for example, does).
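For the ridge example, the hessian is diagonal, with `2α` at the penalized indices. Here is a runnable sketch under the assumption that `Ridge` stores the penalty `α` and the index vector `I`; `Ridge` and `evaluate!` are defined as local stand-ins so the snippet is self-contained:

```julia
struct Ridge
    α::Float64
    I::Vector{Int}   # indices of the penalized parameters
end

function evaluate!(objective, gradient, hessian, ridge::Ridge, model, par)
    if !isnothing(hessian)
        fill!(hessian, 0)
        for i in ridge.I
            # second derivative of α * par[i]^2
            hessian[i, i] = 2 * ridge.α
        end
    end
    if !isnothing(gradient)
        fill!(gradient, 0)
        gradient[ridge.I] .= 2 * ridge.α * par[ridge.I]
    end
    if !isnothing(objective)
        return ridge.α * sum(i -> par[i]^2, ridge.I)
    end
    return nothing
end
```

A hessian-based optimizer such as `Newton` from `Optim.jl` could then make use of this method.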
## Convenient