Sunday, 22 September 2013

depmixS4 version 1-3.0

Recently, we (Ingmar Visser and Maarten) released the the latest version of depmixS4 on CRAN. DepmixS4 is an R package to estimate mixture and hidden Markov models. This version includes several improvements (e.g., prettier model summaries, faster EM for large models) and a new (experimental!) feature to estimate models by maximizing the “classification likelihood”.

Classification likelihood

Estimating models with the classification likelihood can be achieved by setting classification="hard" in the emcontrol argument of the fit method. The emcontrol argument expects a list with several arguments and the best way to construct it is with the em.control function.

An example (from the em.control help page) is:

# load the package
library(depmixS4)
# load the speed example dataset
data(speed)
# specify a multivariate hidden Markov model with two states
mod <- depmix(list(rt ~ 1, corr ~ 1), data = speed, nstates = 2, family = list(gaussian(), 
    multinomial("identity")), ntimes = c(168, 134, 137))
# set the seed to get reproducible results
set.seed(1)
# fit the model by calling fit
fmod_CEM1 <- fit(mod, emcontrol = em.control(classification = "hard"))
iteration 0 logLik: -197.4 
converged at iteration 4 with logLik: -174.5 

Using the classification likelihood, each observation is assigned to a single state (the maximum a posteriori state). For hidden Markov models, the MAP state sequence is computed by the Viterbi algorithm. Using such “hard” assignment of observations to states tends to speed up convergence considerably. However, the classification likelihood is a different objective function than the usual likelihood and the resulting estimates may vary considerably. Initial tests using classification EM (CEM) for hidden Markov models indicate that the resulting estimates depend more strongly on starting-values than when using the regular EM algorithm.

By default, depmixS4 initially assigns observations randomly to states, which is a simple way to obtain random starting values. If we want to estimate with different starting values, we can simply run the fit method again (with a different seed):

set.seed(3)
fmod_CEM2 <- fit(mod, emcontrol = em.control(classification = "hard"))
iteration 0 logLik: -433.4 
converged at iteration 4 with logLik: -263.5 

Note the large difference in final likelihood values! Clearly, the starting values in the second run were not overly good, and estimation converged to a poor local maximum. As might be expected, there is also a large difference in the model estimates:

summary(fmod_CEM1)
Initial state probabilties model 
pr1 pr2 
  0   1 


Transition matrix 
          toS1    toS2
fromS1 0.90761 0.09239
fromS2 0.07937 0.92063

Response parameters 
Resp 1 : gaussian 
Resp 2 : multinomial 
    Re1.(Intercept) Re1.sd Re2.pr1 Re2.pr2
St1           5.527 0.2093 0.47059  0.5294
St2           6.397 0.2354 0.09524  0.9048
summary(fmod_CEM2)
Initial state probabilties model 
pr1 pr2 
  1   0 


Transition matrix 
         toS1   toS2
fromS1 0.7877 0.2123
fromS2 0.6126 0.3874

Response parameters 
Resp 1 : gaussian 
Resp 2 : multinomial 
    Re1.(Intercept) Re1.sd Re2.pr1 Re2.pr2
St1           6.137 0.4736       0       1
St2           5.705 0.3583       1       0

If we run the same example with the regular EM, we get the following results:

set.seed(1)
fmod_EM1 <- fit(mod, emcontrol = em.control(classification = "soft"))
iteration 0 logLik: -553.9 
iteration 5 logLik: -517.9 
iteration 10 logLik: -296.5 
iteration 15 logLik: -296.1 
iteration 20 logLik: -296.1 
converged at iteration 22 with logLik: -296.1 
set.seed(3)
fmod_EM2 <- fit(mod)  # soft classification is the default
iteration 0 logLik: -554.7 
iteration 5 logLik: -554.4 
iteration 10 logLik: -543.8 
iteration 15 logLik: -298.8 
iteration 20 logLik: -296.1 
iteration 25 logLik: -296.1 
converged at iteration 28 with logLik: -296.1 

Both runs converge to the same solution and the parameter estimates happen to be quite similar to the first solution found with the CEM algorithm:

summary(fmod_EM2)
Initial state probabilties model 
pr1 pr2 
  0   1 


Transition matrix 
          toS1   toS2
fromS1 0.89886 0.1011
fromS2 0.08359 0.9164

Response parameters 
Resp 1 : gaussian 
Resp 2 : multinomial 
    Re1.(Intercept) Re1.sd Re2.pr1 Re2.pr2
St1           5.521 0.2023 0.47207  0.5279
St2           6.392 0.2396 0.09851  0.9015

In conclusion, while it is always advisable to estimate a model repeatedly with different starting values, it seems especially relevant when using the classification likelihood.

The CEM algorithm is used more regularly in mixture modelling. For special cases of mixture models, it is equivalent to the k-means clustering algorithm. It's properties in hidden Markov modelling require further investigation.