Comparing Holt Winters Implementations in R – Part 1


This is a multipart series comparing and contrasting the various Holt Winters implementations in R. We intend to focus on the practical and applied aspects of the implementations to get a better grip on the behaviour of the models and their predictions. To begin with, let's look at the ‘HoltWinters()’ function in the stats package that comes built in with R and the ‘hw()’ function in the forecast package by Dr. Rob J Hyndman. To install the forecast package, use the following command:

install.packages("forecast")

For this analysis, I have chosen a time series available on DataMarket: the monthly total number of pigs slaughtered in Victoria, January 1980 – August 1995. Since the series is hosted on DataMarket, we can load it directly into R using the rdatamarket package. So let's load the packages and download the data:

library(rdatamarket)  # for dmseries(), to pull series hosted on DataMarket
library(forecast)     # for hw() and the forecasting helpers

pigs <- dmseries("http://data.is/H63F9L")  # monthly pigs slaughtered in Victoria

This is how the time series appears. On visual inspection, there is clearly some seasonality and a trend element involved. From the pattern, one can conclude that both the seasonal and trend components are additive.

Monthly Total Number of Pigs Slaughtered in Victoria
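To reproduce the plot above, something like the following one-liner should work (a minimal sketch; the axis labels are my own):

plot(pigs, main = "Monthly Total Number of Pigs Slaughtered in Victoria",
     xlab = "Year", ylab = "Pigs slaughtered")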

Our approach here is to compare the predictive accuracy of these functions rather than the closeness of the fit on the training data. Though I have limited my analysis to one type of time series for now, we may have to do a similar analysis on series with other characteristics too, but that will have to wait for some time.

The accuracy measure I have chosen for comparison purposes in this analysis is the Mean Absolute Percentage Error (MAPE). Our aim here is not to conclude that one function predicts better than the other, but to come to an experiential understanding of how the functions behave depending on the nature of the time series. I would also like to point out that the selection of this data set is purely arbitrary, and I encourage you to plug other time series you come across into this code. We have downloaded the packages and loaded the data, so let's begin.
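As a rough illustration of the comparison to come, here is a minimal sketch (not the series' final code) that holds out the last 12 months, fits both implementations on the training window, and computes MAPE on the holdout. The 12-month split and the coercion to a monthly ts (dmseries() returns a zoo series) are my own assumptions:

# Coerce the downloaded zoo series to a monthly ts starting January 1980 (assumed)
pigs_ts <- ts(as.numeric(pigs), start = c(1980, 1), frequency = 12)

train <- window(pigs_ts, end = c(1994, 8))    # Jan 1980 – Aug 1994
test  <- window(pigs_ts, start = c(1994, 9))  # Sep 1994 – Aug 1995 holdout

# stats::HoltWinters with additive trend and seasonal components (its defaults)
fit_stats <- HoltWinters(train)
fc_stats  <- forecast(fit_stats, h = length(test))

# forecast::hw with additive seasonality
fc_hw <- hw(train, seasonal = "additive", h = length(test))

# MAPE = mean(|actual - forecast| / actual) * 100
mape <- function(actual, fc) 100 * mean(abs((actual - fc) / actual))

mape(test, fc_stats$mean)  # MAPE of the built-in HoltWinters() forecasts
mape(test, fc_hw$mean)     # MAPE of forecast::hw() forecasts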

Advanced Regression Analysis: How To Print All Best Models?

In this post, I explain a simple way to find as many of the best possible regression models as you want from any given set of predictors.

I am going to show you a method, along with code, where you can export the summary statistics of all the best models to a separate text file, get the essential regression statistics, and print the fitted values using lm() and rlm() (robust regression), along with deviations and plots.

You also have the option to choose your best models based on the number of variables in each model and multiple selection criteria such as adjusted R-squared and Mallows' Cp. All in one piece of code.

We have had our shots with regression analysis. Though there is nothing as exciting as the moment when you lay your hands on freshly prepared data, it can get frustrating when you need to deliver results regularly in a time-sensitive manner. The code in this post should help relieve the routineness of the regression modelling process. In other words, it is for stats analysts with routine deadlines.

Best subsets regression with leaps

I have seen people coming from other platforms where they typically use a built-in software procedure to run a forecast or regression model, or just use mouse clicks in a GUI to build their models. Doing this procedurally becomes routine and tedious when you have to produce the results repeatedly.

It would be a grave mistake if R programmers took the same route and repeated the mistakes committed by GUI analysts and procedural statisticians. There is so much amateurish R code out there that people underestimate the potential of R as a programming language, often comparing it with other statistical software. This view then becomes a benchmark for newcomers to the language, who tend to learn it in parts and end up with an incomplete idea of its potential, a fate JavaScript suffered for a while. Who are we, after all, if we don't use the excellent algorithmic capabilities that R generously offers? So remember: R is not just statistical software, it is a good programming language too.

Now, coming back to the discussion: let's load the ‘leaps’, ‘car’ and ‘MASS’ packages. The steps below should not be considered a holy-grail mechanism; rather, you should have done the prior variable-reduction work before feeding the selected variables into the procedure.
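As a rough sketch of the kind of workflow I mean (illustrative only; the mtcars example data, the variable names and the output file name are my own assumptions, not the full script continued on the next page), a best-subsets run with leaps, with the top model refit via lm() and rlm(), could look like this:

library(leaps)  # best-subsets search via regsubsets()
library(car)    # regression diagnostics
library(MASS)   # rlm() for robust regression

# Keep the 3 best models for each model size up to 5 predictors (example data: mtcars)
subsets <- regsubsets(mpg ~ ., data = mtcars, nbest = 3, nvmax = 5)
subset_summary <- summary(subsets)

# Export the selection results (variables per model, adj-Rsq, Mallows' Cp) to a text file
sink("best_subsets_summary.txt")
print(subset_summary)
print(data.frame(adj_r2 = subset_summary$adjr2, cp = subset_summary$cp))
sink()

# Refit the single best model (by adjusted R-squared) with lm() and a robust rlm()
best_idx  <- which.max(subset_summary$adjr2)
best_vars <- names(which(subset_summary$which[best_idx, -1]))  # drop the intercept column
form      <- reformulate(best_vars, response = "mpg")
fit_lm    <- lm(form, data = mtcars)
fit_rlm   <- rlm(form, data = mtcars)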

The full script will generate the outputs described above (the best-models summary text file, regression statistics, fitted values and plots) in the working directory.

Continued on the next page..

Is it wise to install ALL the packages in CRAN?

First off, here is how you can install all R packages in one go:

packs <- installed.packages()              # get the currently installed packages
exc   <- names(packs[, "Package"])         # get their names as a vector
av    <- names(available.packages()[, 1])  # get the names of all packages available on CRAN
ins   <- av[!av %in% exc]                  # list the packages you haven't installed yet
install.packages(ins)                      # install the remaining packages

It could take a couple of hours, depending on your processor speed, to complete the entire operation. But is it worth it? Let's take a closer look.

Coming to the topic: the immediate assumption is that it will slow down computing performance and somehow affect processing speed. Well, that's not how I've seen it work. What happens in reality is that it simply consumes hard-drive space equal to the combined size of the packages. Under normal circumstances that is not a big ask. R packages usually occupy a few MB of space each, which is well justified by the value they bring to the table.
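If you want to check how much space your installed packages actually occupy, a rough sketch like the following (assuming the default library path) will give you the total in megabytes:

lib   <- .libPaths()[1]                              # default package library (assumed)
pkgs  <- rownames(installed.packages(lib.loc = lib)) # names of packages in that library
sizes <- sapply(file.path(lib, pkgs), function(d) {
  files <- list.files(d, recursive = TRUE, full.names = TRUE)
  sum(file.info(files)$size, na.rm = TRUE)           # size of each package directory, in bytes
})
round(sum(sizes) / 1024^2)                           # total space used, in MB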

But on the other hand, if you are an R enthusiast who constantly explores new packages or works on multiple projects, it could save you time and frustration to load a package right away and start using its functions, rather than installing new packages and their dependencies every other time. If you think about it, what makes R what it is today is the rich collection of packages and the structured documentation that goes along with them. The ability to exploit the available resources can be a potent weapon for any problem solving we may face.

For reference, I use a PC with 8 GB of RAM and about 1 TB of hard-drive space.

The above is generally my experience so far. If you have other opinions please feel free to leave a comment.

Author: Selva Prabhakaran Sanjeevi Julian
