** Output Models.txt ** – Contains the

**. This is the main output file because it prints out 10 best models for models of all possible sizes, meaning, the first 10 models will be 1 variable models, the second 10 will have 2 predictor variables, the 3rd 10 models will have 3 variables and so on. Within each group these models are arranged based on highest ‘adj Rsq’ values. Quite a convenience eh?**

*summary, vif, model fit, forecast of holdout observations and deviations of lm() and rlm()** Actual vs Fit/Forecast Charts *– The Actual vs Fit charts for all the best models are stored in PNG format.

Should you wish to make changes and incorporate this in your work, you will have to change the read.table and colnames(dat) statements to suit your needs, and do quote this blog ‘rprogrammingblog.wordpress.com’ at the top in your source code. I would like to know how it works out for you, in case you tried it out.

**How to use this code?**

Step1: Install the “leaps”, “car”, “MASS” libraries using *install.packages(c(“leaps”,”car”,”MASS”) *command.

Step 2: If you have your own response-predictors data, update dat <- read.table() and colnames(dat) sections in below code to point to the correct location in your local disk. If your dataset has headers, comment out the colnames(dat) part. Finally, the first column in your dataset is assumed as a time column like month-year. If your analysis does not have time series data like demand or sales, fill up the first column in your data as row indices. You are now ready to run the code.

If you do not have your own dataset but want to just see how this code works, just install the packages as explained above and run the rest of it as it is in your R console.

library(leaps) library(car) library(MASS) cat(paste("Outputs will be collected in this location :",getwd())) store.graphs = TRUE holdout.obs = 6 # No of observations you want to holdout in the tail of input data. These values will be predicted using lm() and rlm(). # Change the following 2 statements as per your need. dat <- read.table("http://www.stat.ufl.edu/~winner/data/winepop.dat",header=T) #Load the input data. The last few observations will be in hold-out for prediction, as directed by 'holdout.obs' variable. colnames(dat) <- c("Year", "Total.Population.Thousands", "years.5", "years.5to14","years.15to24","years.25to34","years.35to44","years.45to54","years.55to64","sixtyfiveyears","Wine.Consumption.Millions.of.gallons") predictors.df <- dat[,c(3:ncol(dat))] # The predictor variables in your input dataset (dat) target.df <- dat[,2] # Point to the response variable in 'dat' LPS <- leaps(x = predictors.df ,y = target.df, names = colnames(predictors.df), nbest = 10 ,method = "adjr2") sink("Output Models.txt") for(i in (1:nrow(LPS$which))){ # Create a formula using those variables preds <- paste("I(",names(which(LPS$which[i,]!= "FALSE")),"^",1,")",sep="", collapse = " + ") fm <- paste("(",colnames(dat)[2]," ~ ",preds,")",sep = "") # Create a linear model (LM) and a robust linear model (RLM) rg <- lm(as.formula(fm), data= as.data.frame(dat[c(1:(nrow(dat)-holdout.obs)),])) rrg <- rlm(as.formula(fm), data= as.data.frame(dat[c(1:(nrow(dat)-holdout.obs)),])) # Store the Actual and Predicted out <- data.frame(actuals = dat[,2] ,predicted.robust = round(predict(rrg, dat)), predicted.lm = round(predict(rg, dat))) # Get the percent deviations of the linear model and robust linear model robdev <- as.numeric(as.matrix(out[1:nrow(dat),2])) - as.numeric(as.matrix(out[1:nrow(dat),1])) lmdev <- as.numeric(as.matrix(out[1:nrow(dat),3]))- as.numeric(as.matrix(out[1:nrow(dat),1])) deviation <- cbind(robdev, lmdev) devperc.rob <- sprintf("%1.2f%%" , deviation[,1]/as.numeric(as.matrix(out[1:nrow(dat),1])) * 100) devperc.lm <- sprintf("%1.2f%%" , deviation[,2]/as.numeric(as.matrix(out[1:nrow(dat),1])) * 100) # Add the deviations to the output devperc <- cbind(rob.lm.dev = c(devperc.rob, rep("-", (nrow(out) - length(devperc.rob)))), lm.dev = c (devperc.lm, rep("-", (nrow(out) - length (devperc.lm))))) out <- cbind(dat$Year, out, devperc) # If we chose to store the actual vs. predicted graphs, generate the graphs and store them. The graphs will be stored in the working directory. if(store.graphs){ # Set the margins of the graph par(mar = c(4,4,4,1)) # Set up output to PNG. png(file=paste("Model ",i,".png",sep=""),width=900,height=550) # Plot the actual vs predicted and set parameters of the graph pic <- plot(c(1:nrow(dat)), out[,2], type = "b", cex.axis = 0.5, lwd = 2, xlab = "Year-Month", ylab = "Demand", sub = paste("Plot from: ",dat[1,1] ," to ",dat [nrow(dat),1] ,sep = ""), main = paste("Model ",i,", No. Forecasts: ", (nrow (dat)-nrow(dat)),sep="")) text(12,((max(out[,c(2:4)])-min(out[,c(2:4)]))/2)+10,print(paste(names(which(LPS$which[i,]!= "FALSE")),sep="\n", collapse = " ")), adj = 1) box("figure") lines(out[,3], col = "red", lwd = 2) lines(out[,4], col = "blue", lwd = 2) legend("bottomright", inset=.01, c("Actual","Robust", "linear"), fill=c(1,4,2), horiz=TRUE, cex=0.75 ) # End output to PNG dev.off() } # PRINT THE OUTPUT cat("Model",i,"\n") print(summary(rg)) cat("vif") if(length(rg$coefficients)>2){print(vif(rg))} print(out) cat("------------------------------------------------------------------- \n") } # close the output file sink() Author: Selva Prabhakaran Sanjeevi Julian