Advanced Regression Analysis: How To Print All Best Models?

In this post, I will explain a simple way to find as many of the best possible regression models as you want from any given dataset of predictors.

I am going to show you a method, along with code, to export the summary statistics of all the best models to a separate text file, compute essential regression statistics, and print the fitted values from lm() and rlm() (robust regression), along with the deviations and diagnostic plots.

You also have the option to choose your best models based on the number of variables in each model and on multiple selection criteria such as adj-Rsq and Mallows-Cp. All in one piece of code.

We have all had our shots at regression analysis. Though there is nothing as exciting as the moment you lay your hands on freshly prepared data, it can get frustrating when you need to deliver results regularly in a time-sensitive manner. The code I show in this post should help alleviate the issues caused by the routineness of the regression modelling process. In other words, it's for stats analysts with routine deadlines.

Best subsets regression with leaps


I have seen people coming from other platforms who typically use a built-in software procedure to run a forecast or regression model, or who simply use mouse clicks in a GUI to build their models. Doing this procedurally becomes routine, and eventually boring, when you have to get the results out repeatedly.

It would be a grave mistake for R programmers to take the same route and repeat the mistakes committed by GUI analysts and procedural statisticians. There is so much amateurish R code out there that people underestimate the potential of R as a programming language, often comparing it with other statistical software. This view then becomes a benchmark for newcomers to the language, who tend to learn it in parts and end up with an incomplete idea of its potential, a fate JavaScript suffered for a while. Who are we, after all, if we don't use the excellent algorithmic capabilities that R generously offers? So remember: R is not just a statistical software, it's a good programming language.

Now, coming back to the discussion. Let's load the 'leaps', 'car' and 'MASS' packages. The steps below should not be considered a holy-grail mechanism; rather, you should have done the prior variable-reduction work before feeding the selected variables into the procedure below.
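As a minimal sketch of the best-subsets step, here is roughly what a run with leaps::regsubsets() could look like. The dataset (mtcars), the response (mpg), and the nbest/nvmax settings are illustrative stand-ins, not the exact ones from my script:

```r
library(leaps)  # regsubsets() for best subsets regression
library(car)    # diagnostics such as vif()
library(MASS)   # rlm() for robust regression

# 'mtcars' stands in here for your cleaned, pre-reduced predictor set
regMod <- regsubsets(mpg ~ ., data = mtcars,
                     nbest = 2,   # keep the 2 best models per model size
                     nvmax = 5)   # consider models with up to 5 predictors
regSumm <- summary(regMod)

# Selection parameters: adj-Rsq (higher is better) and Mallows-Cp
regSumm$adjr2
regSumm$cp

# Logical matrix showing which variables enter each retained model
regSumm$which
```

Each row of `regSumm$which` corresponds to one of the retained models, so you can pick a model by size and by adj-Rsq or Cp in one pass.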

This script will generate its output files in the working directory.

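To give the flavour of those outputs, here is one hedged way to write the summaries of all best models to a text file and refit the top model with both lm() and rlm(). The file name and the rule for picking the "best" model (highest adj-Rsq) are my assumptions for illustration:

```r
library(leaps)
library(MASS)

regMod  <- regsubsets(mpg ~ ., data = mtcars, nbest = 2, nvmax = 5)
regSumm <- summary(regMod)

# Export the summary statistics of all best models to a separate text file
capture.output(regSumm, file = "best_models_summary.txt")

# Refit the overall best model (highest adj-Rsq here) with lm() and rlm()
bestVars <- names(which(regSumm$which[which.max(regSumm$adjr2), -1]))
form     <- reformulate(bestVars, response = "mpg")
lmFit    <- lm(form, data = mtcars)
rlmFit   <- rlm(form, data = mtcars)  # robust regression from MASS

# Fitted values and deviations (residuals), plus diagnostic plots
head(cbind(fitted = fitted(lmFit), deviation = residuals(lmFit)))
par(mfrow = c(2, 2)); plot(lmFit)
```

Dropping the first column of `regSumm$which` removes the intercept flag, leaving only the predictor names to rebuild the formula.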


Is it wise to install ALL the packages on CRAN?

First off, here is how you can install all R Packages in one go:

packs <- installed.packages()  # matrix of currently installed packages

exc <- names(packs[, 'Package'])  # names of installed packages, as a vector

av <- names(available.packages()[, 1])  # names of packages available on CRAN

ins <- av[!av %in% exc]  # all packages you haven't installed yet

install.packages(ins)  # install the remaining packages

Depending on your processor and connection speed, the entire operation could take a couple of hours. But is it worth it? Let's take a closer look.

Coming to the topic: the immediate assumption is that it will slow down computing performance, that processing speed somehow gets affected. Well, that's not how I've seen it work. What happens in reality is that it simply consumes as much hard-drive space as the packages occupy. Under normal circumstances that is not a big ask: R packages usually occupy a few MBs of space, well justified by the value they bring to the table.

On the other hand, if you are an R enthusiast who constantly explores new packages, works on multiple projects, and solves problems, it could save you time and frustration to load a package right away and start using its functions, rather than installing new packages and their dependencies every other time. If you think about it, what makes R what it is today is its rich collection of packages and the structured documentation that goes with them. The ability to exploit the available resources can be a potent weapon for any problem we may face.

For reference, I use a PC with 8 GB of RAM and about 1 TB of hard-drive space.

The above is generally my experience so far. If you have other opinions please feel free to leave a comment.

Author: Selva Prabhakaran Sanjeevi Julian
