--- title: "Online updating of onlineforecast models" author: "Peder Bacher" date: "`r Sys.Date()`" output: rmarkdown::html_vignette: toc: true toc_debth: 3 vignette: > %\VignetteIndexEntry{Online updating of onlineforecast models} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r external-code, cache=FALSE, include=FALSE, purl = FALSE} # Have to load the knitr to use hooks library(knitr) # This vignettes name vignettename <- "online-updating" # REMEMBER: IF CHANGING IN THE shared-init (next block), then copy to the others! ``` <!--shared-init-start--> ```{r init, cache=FALSE, include=FALSE, purl=FALSE} # Width will scale all figwidth <- 12 # Scale the wide figures (100% out.width) figheight <- 4 # Heights for stacked time series plots figheight1 <- 5 figheight2 <- 6.5 figheight3 <- 8 figheight4 <- 9.5 figheight5 <- 11 # Set the size of squared figures (same height as full: figheight/figwidth) owsval <- 0.35 ows <- paste0(owsval*100,"%") ows2 <- paste0(2*owsval*100,"%") # fhs <- figwidth * owsval # Set for square fig: fig.width=fhs, fig.height=fhs, out.width=ows} # If two squared the: fig.width=2*fhs, fig.height=fhs, out.width=ows2 # Check this: https://bookdown.org/yihui/rmarkdown-cookbook/chunk-styling.html # Set the knitr options knitr::opts_chunk$set( collapse = TRUE, comment = "##!! ", prompt = FALSE, cache = TRUE, cache.path = paste0("../tmp/vignettes/tmp-",vignettename,"/"), fig.align="center", fig.path = paste0("../tmp/vignettes/tmp-",vignettename,"/"), fig.height = figheight, fig.width = figwidth, out.width = "100%" ) options(digits=3) # For cropping output and messages cropfun <- function(x, options, func){ lines <- options$output.lines ## if (is.null(lines)) { ## return(func(x, options)) # pass to default hook ## } if(!is.null(lines)){ x <- unlist(strsplit(x, "\n")) i <- grep("##!!",x) if(length(i) > lines){ # truncate the output, but add .... x <- x[-i[(lines+1):length(i)]] x[i[lines]] <- pst(x[i[lines]], "\n\n## ...output cropped") } # paste these lines together x <- paste(c(x, ""), collapse = "\n") } x <- gsub("!!","",x) func(x, options) } hook_chunk <- knit_hooks$get("chunk") knit_hooks$set(chunk = function(x, options) { cropfun(x, options, hook_chunk) }) ``` [onlineforecasting]: https://arxiv.org/abs/2109.12915 [building heat load forecasting]: https://onlineforecasting.org/examples/building-heat-load-forecasting.html [onlineforecasting.org]: https://onlineforecasting.org <!--shared-init-end--> ## Intro This vignette explains how to use the package in a real online operation, where the following is repeated in real time: new data is received, model is updated and forecasts are calculated. At a lower frequency the parameters are optimized, e.g. the updating is executed every hour and the parameters are optimized once a week. Load the package: ```{r, echo=1:2, purl=1:2} # Load the package library(onlineforecast) #library(devtools) #load_all(as.package("../../onlineforecast")) ``` Load data, setup and define a model: ```{r, output.lines=10} # Keep the data in D to simplify notation D <- Dbuilding # Set the score period D$scoreperiod <- in_range("2010-12-20", D$t) # Set the training period D$trainperiod <- in_range(D$t[1], D$t, "2011-02-01") # Define a new model with low-pass filtering of the Ta input model <- forecastmodel$new() model$output = "heatload" model$add_inputs(Ta = "lp(Ta, a1=0.9)", mu = "one()") model$add_regprm("rls_prm(lambda=0.9)") model$add_prmbounds(Ta__a1 = c(0.5, 0.9, 0.9999), lambda = c(0.9, 0.99, 0.9999)) model$kseq <- 1:36 # Optimize the parameters on the train period (i.e. until 2011-02-01) rls_optim(model, subset(D,D$trainperiod), kseq = c(3,18)) ``` ## Recursive update and prediction Here we go through the steps of getting new data, running a model update and making predictions. First fit on a period: ```{r} # Keep a sequence with these points iseq <- which(in_range("2010-12-15",D$t,"2011-01-01")) # Fit the model rls_fit(model$prm, model, subset(D, iseq)) ``` Now the fits are saved in the model object (its an R6 object, hence passed by reference to the functions and can be changed inside the functions). A list of fits with an entry for each horizon is in Lfits, see the two first: ```{r} # The data of a fit is saved in this list str(model$Lfits[1:2]) ``` We step to the next time step, where new data arrives. Take the point right after the fit period and take the data for this time point: ```{r} # Next time step (i <- iseq[length(iseq)] + 1) # The new data for this time step Dnew <- subset(D, i) ``` To update and predict, we need to transform the new data (this must only be done only once for each new data, since some transform functions, e.g. lp(), actually keep state data, see some on this in \texttt{?lp} and \texttt{?forecastmodel} under \texttt{\$reset\_state()}): ```{r} # Transform the new data DnewTransformed <- model$transform_data(Dnew) ``` Then we can update the regression coefficients using the transformed data ```{r} # The value of the coefficients for horizon 1, before the update model$Lfits$k1$theta # Update the coefficients rls_update(model, DnewTransformed, Dnew[[model$output]]) # The value of the coefficients for horizon 1, after the update model$Lfits$k1$theta ``` Calculate predictions using the new data and the updated fits: ```{r} # Calculate the predictions yhat <- rls_predict(model, DnewTransformed) ``` Plot to see the predictions with the observations: ```{r} # The index for the predicted steps ahead iseq <- i+model$kseq # Plot the observations and predictions plot(D$t[iseq], D$heatload[iseq], type = "b", xlab = "t", ylab = "y") lines(D$t[iseq], yhat, type = "b", col = 2) legend("topright", c("Observations",pst("Predictions (",min(model$kseq)," to ",max(model$kseq)," steps ahead)")), lty = 1, col = 1:2) ``` Run this for a longer period to verify that the same forecasts are obtained (in one go vs. iteratively) First in one go on all data: ```{r} # Fit and predict on entire data val <- rls_fit(model$prm, model, D) # Keep the forecasts D$Yhat1 <- val$Yhat ``` and then run iteratively through: ```{r} # The indexes of training period itrain <- which(in_range("2010-12-15",D$t,"2011-01-01")) # The indexes of the test period itest <- which(in_range("2011-01-01",D$t,"2011-01-04")) # Fit on the training period rls_fit(model$prm, model, subset(D, itrain)) # Prepare for the forecasts D$Yhat2 <- data.frame(matrix(NA, nrow(D$Yhat1), ncol(D$Yhat1))) names(D$Yhat2) <- names(D$Yhat1) # Step through the test period: for(i in itest){ Dnew <- subset(D, i) Dnewtr <- model$transform_data(Dnew) rls_update(model, Dnewtr, Dnew[[model$output]]) D$Yhat2[i, ] <- as.numeric(rls_predict(model, Dnewtr)) } ``` Compare to see the difference between the one step forecasts: ```{r} D$Yhat1$k1[itest] - D$Yhat2$k1[itest] ```