    % Generated by roxygen2: do not edit by hand
    % Please edit documentation in R/step_optim.R
    \name{step_optim}
    \alias{step_optim}
    \title{Forward and backward model selection}
    \usage{
    step_optim(
      modelfull,
      data,
      prm = list(NA),
      direction = c("both", "backward", "forward", "backwardboth", "forwardboth"),
      modelstart = NA,
      keepinputs = FALSE,
      optimfun = rls_optim,
      fitfun = NA,
      scorefun = rmse,
      printout = FALSE,
      mc.cores = getOption("mc.cores", 2L),
      ...
    )
    }
    \arguments{
    \item{modelfull}{The full forecastmodel containing all inputs which
    can be included in the selection.}
    
    \item{data}{The data.list which holds the data on which the model is fitted.}
    
    \item{prm}{A list of integer parameters to be stepped. Given using the same
    syntax as parameters for optimization, e.g. `list(I__degree = c(min=3,
    max=7))` will step the "degree" for input "I".}
    
    \item{direction}{The direction to be used in the selection process.}
    
    \item{modelstart}{A forecastmodel. If set, it is used as the selected
    model in the first step of the stepping. It should be a submodel of the
    full model.}
    
    \item{keepinputs}{If TRUE no inputs can be removed in a step, if FALSE then
    any input can be removed. If given as a character vector with names of
    inputs, then they cannot be removed in any step.}
    
    \item{optimfun}{The function which will carry out the optimization in each
    step.}
    
    \item{fitfun}{A fit function, which should be the same as the one used in
    optimfun(). If provided, the score is calculated with this function
    instead of the one called in optimfun() (by default that is rls_fit(),
    which is called in rls_optim()). Furthermore, information on complete
    cases is printed and returned.}
    
    \item{scorefun}{The score function used.}
    
    
    \item{printout}{Logical. Passed on to fitting functions.}
    
    
    \item{mc.cores}{The mc.cores argument of mclapply. When debugging, it can be
    necessary to set it to 1 to stop execution.}
    
    
    \item{...}{Additional arguments which will be passed on to optimfun, for
    example `control`, which is passed on to optim and can limit the number of
    iterations.}
    }
    \value{
    A list with the result of each step:
     - '$model' is the model selected in each step
     - '$score' is the score for the model selected in each step
     - '$optimresult' is the result returned by optimfun
     - '$completecases' a logical vector (NA if fitfun argument is not given) indicating which time points were complete across all horizons and models for the particular step.
    }
    \description{
    Forward and backward model selection
    }
    \details{
    This function takes a model and carries out a model selection by stepping
    backward, forward or in both directions.
    
    Note that mclapply is used. To control the number of cores, set the
    mc.cores option, e.g. `options(mc.cores=1)` for a single core, which is
    needed for debugging to work.
    
    The full model containing all inputs must be given. In each step new models
    are generated, with either one removed input or one added input, and then all
    the generated models are optimized and their scores compared. If any new
    model has an improved score compared to the currently selected model, then
    the new model is selected and the process is repeated until no further
    improvement is obtained.
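
    The procedure above can be illustrated with a minimal, self-contained sketch
    in base R. It is not the implementation in step_optim: plain `lm()` fits stand
    in for the optimized forecast models, and the helper names `score_inputs` and
    `step_both` are hypothetical, introduced only for this illustration.

    ```r
    # Toy data: y depends on x1 and x2, while x3 is pure noise
    set.seed(1)
    n <- 200
    X <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
    X$y <- 2 * X$x1 - X$x2 + rnorm(n, sd = 0.5)
    train <- X[1:100, ]
    test <- X[101:200, ]

    # Hypothetical helper: RMSE on the test period of a linear model with the given inputs
    score_inputs <- function(inputs) {
      frm <- reformulate(c("1", inputs), response = "y")
      fit <- lm(frm, data = train)
      sqrt(mean((test$y - predict(fit, newdata = test))^2))
    }

    # Hypothetical helper: greedy stepping in both directions, starting from the full model
    step_both <- function(full) {
      current <- full
      best <- score_inputs(current)
      repeat {
        # Candidate models: one input removed or one input added
        removed <- lapply(current, function(nm) setdiff(current, nm))
        added <- lapply(setdiff(full, current), function(nm) c(current, nm))
        candidates <- c(removed, added)
        if (length(candidates) == 0) break
        scores <- vapply(candidates, score_inputs, numeric(1))
        # Stop when no candidate improves the score
        if (min(scores) >= best) break
        current <- candidates[[which.min(scores)]]
        best <- min(scores)
      }
      list(inputs = current, score = best)
    }

    res <- step_both(c("x1", "x2", "x3"))
    res$inputs
    res$score
    ```

    Because a candidate is accepted only when it strictly improves the score, the
    loop always terminates, and the returned score is never worse than that of the
    model it started from.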
    
    In addition to selecting inputs, integer parameters can also be stepped
    through, e.g. the order of basis splines or the number of harmonics in a
    Fourier series.
    
    The stepping process differs depending on the direction. In addition to
    the full model, a starting model can be given, in which case the selection
    process will start from that model.
    
    If the direction is "both", which is the default (same as "backwardboth"), then the
    stepping is:
     - In first step inputs are removed one-by-one
     - In following steps, inputs still in the model are removed one-by-one, and
       inputs not in the model are added one-by-one
    
    If the direction is "backward":
     - Inputs are only removed in each step
    
    If the direction is "forwardboth":
     - In the first step all inputs are removed
     - In following steps (same as "both")
    
    If the direction is "forward":
     - In the first step all inputs are removed and from there inputs are only added
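
    One way to read the rules above is as a function from the current set of
    inputs to the candidate sets tried in the next step. The sketch below is an
    interpretation in base R using the hypothetical helpers `step_start` and
    `step_candidates`; it does not reproduce the package code.

    ```r
    # Hypothetical helper: the inputs the stepping starts from (no modelstart given)
    step_start <- function(full, direction) {
      if (is.element(direction, c("forward", "forwardboth"))) character(0) else full
    }

    # Hypothetical helper: the input sets to try in one step, given the direction
    step_candidates <- function(current, full, direction) {
      removed <- lapply(current, function(nm) setdiff(current, nm))
      added <- lapply(setdiff(full, current), function(nm) c(current, nm))
      switch(direction,
             backward = removed,
             forward = added,
             both = ,
             backwardboth = ,
             forwardboth = c(removed, added),
             stop("unknown direction"))
    }

    full <- c("Ta", "noise", "mu_tday", "mu")
    step_start(full, "forwardboth")
    step_candidates(c("Ta", "mu"), full, "backward")
    step_candidates(c("Ta", "mu"), full, "both")
    ```

    Starting from the full model there is nothing to add, so in this reading the
    first step of "both" consists of removals only, as described in the list above.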
     
    
    To step through integer parameters in the transformation stage, they must
    be set in the "prm" argument. The stepping process will follow the input
    selection described above.
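
    As a rough illustration of the naming syntax, the sketch below splits a `prm`
    name at the double underscore into the input name and the parameter name, and
    lists the integer values one step away from the current value that stay within
    the bounds. Both helpers are hypothetical, and stepping by exactly one is an
    assumption made for this sketch, not something stated in this documentation.

    ```r
    prm <- list(mu_tday__nharmonics = c(min = 3, max = 7))

    # Hypothetical helper: split "input__parameter" into its two parts
    split_prm_name <- function(nm) {
      parts <- strsplit(nm, "__", fixed = TRUE)[[1]]
      list(input = parts[1], parameter = parts[2])
    }

    # Hypothetical helper: neighbouring integer values within the bounds
    # (a step size of one is an assumption)
    neighbours <- function(value, bounds) {
      cand <- c(value - 1, value + 1)
      cand[cand >= bounds[["min"]] & cand <= bounds[["max"]]]
    }

    split_prm_name(names(prm))
    neighbours(5, prm$mu_tday__nharmonics)
    neighbours(7, prm$mu_tday__nharmonics)
    ```

    At a bound only one neighbour remains, so a value at the maximum can only be
    stepped downward.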
    
    In case of missing values, especially in combination with auto-regressive
    models, it can be very important to make sure that only complete cases are
    included when calculating the score. When the `fitfun` argument is provided,
    the score is calculated using only the cases which are complete across all
    horizons and models in each step, see the last examples.
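
    The idea of scoring only on complete cases can be sketched in base R as
    follows. The residual matrices are made-up toy values and `rmse_complete` is a
    hypothetical helper, not part of the package: a time point counts as complete
    only if no residual is missing for any horizon in any of the compared models.

    ```r
    # Toy residuals for two models: rows are time points, columns are horizons k1 and k2
    res_a <- cbind(k1 = c(0.5, NA, -0.2, 0.1), k2 = c(0.4, 0.3, -0.1, 0.2))
    res_b <- cbind(k1 = c(0.6, 0.2, -0.3, 0.1), k2 = c(NA, 0.1, -0.2, 0.3))

    # A time point is complete only if all horizons of all models are available
    complete <- complete.cases(res_a) & complete.cases(res_b)

    # Hypothetical helper: RMSE per horizon using only the complete time points
    rmse_complete <- function(res, ok) {
      sqrt(colMeans(res[ok, , drop = FALSE]^2))
    }

    complete
    rmse_complete(res_a, complete)
    rmse_complete(res_b, complete)
    ```

    Both models are then scored on exactly the same time points, so their scores
    are comparable even though the missing values occur at different times.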
    
    
    Note that either kseq or kseqopt must be set on the modelfull object. If kseqopt
    is set, then it is used regardless of the value of kseq.
    
    }
    \examples{
    
    
    # The data, just a rather short period to keep running times short
    D <- subset(Dbuilding, c("2010-12-15", "2011-02-01"))
    # Set the score period
    D$scoreperiod <- in_range("2010-12-22", D$t)
    # Time of day as an input, for horizons 1 to 36
    D$tday <- make_tday(D$t, 1:36)
    # Generate an input which is just random noise, i.e. should be removed in the selection
    set.seed(83792)
    D$noise <- make_input(rnorm(length(D$t)), 1:36)
    
    # The full model
    model <- forecastmodel$new()
    # Set the model output
    model$output = "heatload"
    # Inputs (transformation step)
    model$add_inputs(Ta = "Ta",
                     noise = "noise",
                     mu_tday = "fs(tday/24, nharmonics=5)",
                     mu = "one()")
    # Regression step parameters
    model$add_regprm("rls_prm(lambda=0.9)")
    # Optimization bounds for parameters
    model$add_prmbounds(lambda = c(0.9, 0.99, 0.9999))
    
    # Select a model, in the optimization just run it for a single horizon
    
    # Note that kseqopt could also be set
    model$kseq <- 5
    
    # The integer parameter to step: the number of harmonics in the Fourier series
    prm <- list(mu_tday__nharmonics = c(min=3, max=7))
    
    # Note the control argument, which is passed to optim. Here it is set to very
    # few iterations in the prm optimization (MUST be increased in real applications)
    control <- list(maxit=1)
    
    
    # On Windows multi cores are not supported, so for the examples use only one
    mc.cores <- 1
    
    
    # Run the default selection scheme, which is "both"
    # (same as "backwardboth" if no start model is given)
    
    L <- step_optim(model, D, prm, control=control, mc.cores=mc.cores)
    
    
    # The optimization result and the score from each step are returned
    getse(L, "optimresult")
    getse(L, "score")
    
    # The final model
    L$final$model
    
    # Other selection schemes
    
    Lforward <- step_optim(model, D, prm, "forward", control=control, mc.cores=mc.cores)
    Lbackward <- step_optim(model, D, prm, "backward", control=control, mc.cores=mc.cores)
    Lbackwardboth <- step_optim(model, D, prm, "backwardboth", control=control, mc.cores=mc.cores)
    Lforwardboth <- step_optim(model, D, prm, "forwardboth", control=control, mc.cores=mc.cores)
    
    
    # It's possible to avoid removing specified inputs
    
    L <- step_optim(model, D, prm, keepinputs=c("mu","mu_tday"), control=control, mc.cores=mc.cores)
    
    
    # Give a starting model
    modelstart <- model$clone_deep()
    modelstart$inputs[2:3] <- NULL
    
    L <- step_optim(model, D, prm, modelstart=modelstart, control=control, mc.cores=mc.cores)
    
    
    # If a fitting function is given, then it will be used for calculating the forecasts
    # ONLY on the complete cases in each step
    
    L1 <- step_optim(model, D, prm, fitfun=rls_fit, control=control, mc.cores=mc.cores)
    
    
    # The easiest way to check whether missing values have an influence is to
    # compare the selection results obtained with and without fitfun
    
    L2 <- step_optim(model, D, prm, control=control, mc.cores=mc.cores)
    
    
    # Compare the selected models
    tmp1 <- capture.output(getse(L1, "model"))
    tmp2 <- capture.output(getse(L2, "model"))
    identical(tmp1, tmp2)
    
    
    # Note that caching can be really smart. The cache files are located in the
    # cachedir folder (a folder in the current working directory, which can be
    # removed with unlink(foldername)). See e.g. `?rls_optim` for how the caching works
    
    # L <- step_optim(model, D, prm, "forward", cachedir="cache", cachererun=FALSE, mc.cores=mc.cores)