Analysis of Quantile Regression as Alternative to Ordinary Least Squares Regression

CHAPTER ONE

Aim and Objective(s) of the Study

The main aim of the study is to investigate quantile regression as an alternative to least squares regression, especially when the number of regressors increases.

The specific objectives are:

To examine the quantile regression and least squares regression methods
To compare the two models in terms of goodness of fit
To recommend a suitable model for regression

CHAPTER TWO

LITERATURE REVIEW

Introduction

Elementary statistics texts tell us that the method of least squares was first discovered about 1805 (Stigler, 1986), although there has been a dispute about who discovered it first. It appears that it was discovered independently by Carl Friedrich Gauss (1777-1855) and Adrien-Marie Legendre (1752-1833), that Gauss started using it before 1803 (he claimed in about 1795, but there is no corroboration of this earlier date), and that the first account was published by Legendre in 1805; see Draper and Smith (1981). Stigler (1986) notes that Sir Francis Galton discovered regression about 1885 in studies of heredity. Most contemporary courses in regression analysis start with the method of least squares and its variations.

Multiple Linear Regression

Multiple linear regression (MLR) is one of the most commonly used data mining techniques, and can provide insightful information when its rigid assumptions are met. The assumptions include:

Linearity in the coefficients;
Normal (Gaussian) distribution of the response errors (ε);
The errors (ε) have a common distribution; and
Equal error variance (homoscedasticity).
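Taken together, the assumptions listed above correspond to the classical normal linear model. The block below writes it out as a sketch; the symbols (k regressors, coefficients β, error variance σ²) are notation assumed here for illustration and are not taken from the thesis.

```latex
y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik} + \varepsilon_i,
\qquad \varepsilon_i \overset{\text{iid}}{\sim} N(0, \sigma^2),
\qquad i = 1, \dots, n
```

The model is linear in the coefficients, and the errors are Gaussian with a common distribution and a constant variance σ².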

MLR is a very versatile tool and can be applied to almost any process, system, or area of study. Much has been published on the subject; Kutner et al. (2004) and Myers (1990) provide thorough accounts of MLR and will be indispensable for most readers.

A key step in developing an appropriate MLR model is selecting a method of model building and a set of best model criteria. Stepwise regression, which is used in this thesis, is a common choice for model building. Introduced by Efroymson (1960), stepwise regression was intended to be an automated procedure that selects the most statistically significant variables from a finite pool of independent variables. There are three stepwise regression procedures: forward selection, backward elimination and mixed selection. Mixed selection, a combination of the forward and backward procedures, is the most statistically defensible of the three (Kutner et al., 2004; Neter et al., 1996; Draper and Smith, 1981).
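The forward part of a stepwise procedure can be sketched in a few lines. The example below is a simplified illustration, not the procedure used in this thesis: it performs forward selection only, and it admits a variable when adjusted R² improves rather than by a p-value threshold. The data, the variable names x1 to x3 and all function names are hypothetical.

```python
def ols_sse(columns, y):
    """Residual sum of squares from regressing y on an intercept plus `columns`."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(p)]
    # Gaussian elimination with partial pivoting, then back substitution.
    for k in range(p):
        piv = max(range(k, p), key=lambda r: abs(A[r][k]))
        A[k], A[piv] = A[piv], A[k]
        for r in range(k + 1, p):
            f = A[r][k] / A[k][k]
            for c in range(k, p + 1):
                A[r][c] -= f * A[k][c]
    b = [0.0] * p
    for k in reversed(range(p)):
        b[k] = (A[k][p] - sum(A[k][c] * b[c] for c in range(k + 1, p))) / A[k][k]
    return sum((y[i] - sum(X[i][j] * b[j] for j in range(p))) ** 2 for i in range(n))


def forward_select(candidates, y):
    """Add one variable at a time while adjusted R² keeps improving."""
    n = len(y)
    ybar = sum(y) / n
    sst = sum((v - ybar) ** 2 for v in y)
    selected, best = [], 0.0  # the intercept-only model has adjusted R² = 0
    while len(selected) < len(candidates):
        scores = {}
        for name in candidates:
            if name in selected:
                continue
            cols = [candidates[s] for s in selected + [name]]
            p = len(cols) + 1  # parameters, intercept included
            sse = ols_sse(cols, y)
            scores[name] = 1.0 - (sse / (n - p)) / (sst / (n - 1))
        name = max(scores, key=scores.get)
        if scores[name] <= best:
            break  # no remaining candidate improves adjusted R²
        selected.append(name)
        best = scores[name]
    return selected, best


# Hypothetical data: y depends on x1 only, plus a small fixed disturbance.
x1 = [float(i) for i in range(1, 11)]
x2 = [1.0, 0.0] * 5
x3 = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0, 5.0, 3.0]
d = [0.1, -0.1, 0.2, -0.2, 0.0, 0.1, -0.1, 0.2, -0.2, 0.0]
y = [2.0 + 3.0 * a + e for a, e in zip(x1, d)]

selected, best = forward_select({"x1": x1, "x2": x2, "x3": x3}, y)
print(selected, round(best, 4))
```

A mixed procedure would add a backward step after each inclusion, re-testing the variables already in the model and dropping any that no longer meet the retention criterion.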

A set of best model criteria is commonly used in conjunction with stepwise regression in order to select the optimal model. As cited by Young and Guess (2000) and Young and Huber (2004), multicollinearity and heteroscedasticity can be significant problems when modeling the internal bond (IB) of medium-density fiberboard (MDF) using industrial data. Young and Guess (2002) used the following best model criteria: maximum adjusted R², Mallows' Cp close to the number of parameters (p) (Mallows, 1973), minimum Akaike's Information Criterion (AIC) (Akaike, 1974), Variance Inflation Factor (VIF) < 10, significance of independent variables at p-value < 0.10, absence of heteroscedasticity in the residuals, and residuals with zero mean, E(e) = 0.

For this thesis, we focus on the aforementioned criteria, but use p-value < 0.05 for significance among the independent variables. The adjusted R² statistic is a better measure of fit for MLR models built with the potential to contain significantly more independent variables than data records. As additional independent variables are added to a regression model, R² never decreases, regardless of whether the added variables improve the fit. The adjusted R² statistic increases only if the residual mean square decreases (Draper and Smith, 1981). It therefore penalizes, and reduces the risk of, using too many independent variables. AIC balances goodness of fit against model complexity and so guards against overfitting. VIFs are reported to protect against multicollinearity and redundancy in the model. Models with VIF < 10 can be said to be relatively free of these effects (Kutner et al., 2004).
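The criteria discussed above reduce to short formulas. The sketch below uses the standard textbook definitions; the function names and the numbers in the example are hypothetical, and p counts all estimated parameters including the intercept.

```python
import math


def adjusted_r2(r2, n, p):
    """Adjusted R² for n records and p estimated parameters (intercept included)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p)


def aic_least_squares(sse, n, p):
    """AIC of a Gaussian least squares fit, up to an additive constant."""
    return n * math.log(sse / n) + 2 * p


def vif(r2_j):
    """VIF of regressor j, given the R² from regressing it on the other regressors."""
    return 1.0 / (1.0 - r2_j)


# Hypothetical case: a fourth parameter nudges R² from 0.800 to 0.801.
n = 30
small = adjusted_r2(0.800, n, p=3)
large = adjusted_r2(0.801, n, p=4)
print(round(small, 4), round(large, 4))

# With a total sum of squares of 100, the same two fits have SSE 20.0 and 19.9.
print(aic_least_squares(20.0, n, 3), aic_least_squares(19.9, n, 4))

# A regressor that is 90% explained by the others sits at the VIF = 10 boundary.
print(vif(0.90))
```

In this example R² rises slightly when the extra variable is added, yet adjusted R² falls and AIC rises, so both criteria favour the smaller model.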

CHAPTER THREE

METHODOLOGY

Introduction

In this chapter we describe the process by which the data are analyzed. The analysis is intended to provide insight into the phenomenon under investigation rather than a prescription for a final decision, which depends on the aim and objectives of the research.

Research methodology is the process or set of methods used to carry out a study, including the method used to collect the data. Two statistical models are employed in this study: ordinary least squares (OLS) regression and quantile regression (QR).
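The two models differ in the loss function they minimize: OLS minimizes squared residuals, while quantile regression minimizes an asymmetrically weighted absolute loss, often called the check loss. The sketch below illustrates the difference in the simplest setting, fitting a single constant to a sample. The data and function names are hypothetical.

```python
def squared_loss(values, c):
    """Sum of squared deviations from c; minimized by the sample mean."""
    return sum((v - c) ** 2 for v in values)


def check_loss(values, c, tau):
    """Check loss: weight tau on positive residuals, 1 - tau on negative ones."""
    total = 0.0
    for v in values:
        u = v - c
        total += tau * u if u >= 0 else (tau - 1.0) * u
    return total


y = [1.0, 2.0, 3.0, 4.0, 100.0]  # hypothetical sample with one outlier

mean_fit = sum(y) / len(y)
# Some data value always attains the minimum check loss over constants,
# so searching over the observed values is enough here.
median_fit = min(y, key=lambda c: check_loss(y, c, tau=0.5))
upper_fit = min(y, key=lambda c: check_loss(y, c, tau=0.9))

print(mean_fit, median_fit, upper_fit)
```

The mean is pulled far toward the outlier, whereas the tau = 0.5 fit stays at the sample median. Replacing the constant c with a linear function of the regressors gives OLS regression and quantile regression respectively.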

Data Collection

The data for any research come from primary sources, secondary sources, or both. Primary data are collected by the investigator for the purpose of a specific inquiry or study. Such data are original in character and are mostly generated by surveys conducted by individuals or research institutions. Secondary data are data that have already been collected by others; they are primary data for the agency that collected them and become secondary for anyone else who uses them for their own purposes. This research uses secondary data, obtained from http://www.csus.edu/indiv/v/velianitis/ds101/schedule.htm

CHAPTER FOUR

RESULTS AND DISCUSSION

Introduction

This chapter presents the results obtained from regression analysis using the OLS and QR techniques. Correlation and stepwise regression were also examined. The criterion used for the goodness of fit of the models is the coefficient of determination. All tests of significance were conducted at the 5% level using the statistical software packages EViews, R and Statgraphics.

CHAPTER FIVE

SUMMARY, CONCLUSIONS AND RECOMMENDATIONS

Introduction

In this chapter, we present the summary, conclusions and recommendations based on the results obtained in the preceding chapter.

Summary

Our primary goal in this work, as stated in our objectives, was to investigate the robustness of quantile regression as an alternative to least squares regression, especially when the number of regressors increases. This thesis presented a general overview of the quantile regression method, consisting of a non-technical introduction to the basic model and its crucial features and a short review of two major applications. We have seen that quantile regression extends univariate quantile estimation to the estimation of conditional quantile functions, and that it complements the established mean regression methods by adding more flexibility in the estimation and more robustness, particularly in non-Gaussian settings. The covariates are allowed to influence the location, scale and shape of the response distribution, unlike conventional techniques, which usually investigate location-shift paradigms only. Furthermore, by focusing on local parts of the conditional distribution, quantile regression methods offer a useful deconstruction of conditional mean models.

Efforts were made to model miles per gallon in highway driving using the quantile regression approach, showing that OLS estimation is not always an appropriate method for analyzing these data. The two independent variables were found to influence miles per gallon in highway driving. We suggest that researchers retain their list of independent variables, even if those variables are not significantly associated with the dependent (response) variable at the bivariate level, until they have examined their multiple regression results for any evidence of heteroskedasticity.

QR is an invaluable tool for dealing with heteroskedasticity: it provides a method for modeling the rates of change in the response variable at multiple points of the distribution when those rates differ. It is also useful for homogeneous regression models outside the classical normal regression model, and where the error independence assumption is violated, since no parametric assumption is required for the error distribution.
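The point about differing rates of change can be illustrated with a small fan-shaped data set in which the spread of the response grows with the regressor. The sketch below is illustrative only: the data are constructed by hand and are not the miles-per-gallon data, and the brute-force fit is a teaching device that relies on the fact that, with one regressor, some optimal quantile regression line passes through two data points. Statistical packages use linear programming instead.

```python
from itertools import combinations


def check_loss(residuals, tau):
    """Sum of check losses for a list of residuals."""
    return sum(tau * r if r >= 0 else (tau - 1.0) * r for r in residuals)


def quantile_line(x, y, tau):
    """Exact tau-th quantile regression line, found by trying every pair of points."""
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # a vertical pair does not define a regression line
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        residuals = [yk - intercept - slope * xk for xk, yk in zip(x, y)]
        loss = check_loss(residuals, tau)
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    return best[1], best[2]


def ols_line(x, y):
    """Ordinary least squares intercept and slope for one regressor."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    slope = sxy / sxx
    return ybar - slope * xbar, slope


# Hypothetical heteroskedastic data: y = 1 + x * (1 + e), with e cycling over 5 values.
pattern = [-1.0, -0.5, 0.0, 0.5, 1.0]
x = [float(i) for i in range(1, 21)]
y = [1.0 + xi * (1.0 + pattern[k % 5]) for k, xi in enumerate(x)]

_, slope_ols = ols_line(x, y)
_, slope_lo = quantile_line(x, y, tau=0.1)
_, slope_hi = quantile_line(x, y, tau=0.9)
print(slope_lo, slope_ols, slope_hi)
```

For these data the lower quantile line is flat while the upper quantile line rises steeply, and the single OLS slope lies between the two. A mean regression alone would not reveal that the effect of x differs across the distribution of y.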

Conclusion

Quantile regression offers a comprehensive strategy for completing the regression picture: it goes beyond the primary goal of determining only the conditional mean and enables one to pose the question of the relationship between the response variable and the covariates at any quantile of the conditional distribution function. Quantile regression overcomes various problems that OLS frequently confronts. Error variances are often not constant across a distribution, which violates the assumption of homoscedasticity. Also, by focusing on the mean as a measure of location, information about the tails of a distribution is lost, as indicated by the data on miles per gallon in highway driving.

Recommendations

From the analysis and evaluation of the results in this study, the following recommendations are proffered.

The performance of QR is stable and robust against common deviations from the model assumptions.
The model should trigger reviews rather than automatic disallowances. The researcher used QR as a tool for guiding policymakers toward sound policy decisions rather than as the final determinant of policy.

Contribution to knowledge

The study brings to light the advantages of quantile regression in data analysis.
The study also employs the pseudo R² to help identify the presence of outliers in the model.
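One widely used pseudo R² for quantile regression is the measure of Koenker and Machado (1999), which compares the check loss of the fitted model with that of an intercept-only model at the same quantile. Whether this is the exact definition used in this study is an assumption. The sketch below shows the calculation; the function names, observations and fitted values are hypothetical.

```python
def check_loss(residuals, tau):
    """Sum of check losses for a list of residuals."""
    return sum(tau * r if r >= 0 else (tau - 1.0) * r for r in residuals)


def pseudo_r2(y, fitted, tau):
    """Pseudo R² at quantile tau: 1 - V_full / V_restricted."""
    v_full = check_loss([yi - fi for yi, fi in zip(y, fitted)], tau)
    # Restricted model: intercept only. Some data value always attains
    # the minimum check loss, so a search over y is enough.
    v_restricted = min(check_loss([yi - c for yi in y], tau) for c in y)
    return 1.0 - v_full / v_restricted


y = [1.0, 2.0, 3.0, 4.0, 5.0]
fitted = [1.5, 2.0, 2.5, 4.0, 5.5]  # hypothetical median-regression fitted values

r1 = pseudo_r2(y, fitted, tau=0.5)
print(r1)
```

Unlike the OLS coefficient of determination, this measure is local: it describes the fit at one chosen quantile, so it can take different values at different quantiles for the same data.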

Further research

Based on a publicly available dataset on fuel consumption (miles per gallon in highway driving), the QR model performs better than the OLS method. QR can also be applied when the rigid assumptions associated with OLS hold.

REFERENCES

Abrevaya, J., and Dahl, C. (2008). The Effects of Birth Inputs on Birthweight. Journal of Business & Economic Statistics, 5(2), 379-397.
Akaike, H. (1974). Factor analysis and AIC. Psychometrika, 52, 317-332.
Buchinsky, M. (1998). Recent advances in quantile regression models: a practical guideline for empirical research. The Journal of Human Resources, 33(1), 88-126.
Buhai, S. (2004). Quantile regressions: overview and selected applications. Unpublished manuscript, Rotterdam Tinbergen Institute and Erasmus University.
Cade, B., and Noon, B. (2003). A gentle introduction to quantile regression for ecologists. Frontiers in Ecology and the Environment, 1(8), 412-420.
Chen, C. (2004). An Introduction to Quantile Regression and the QUANTREG Procedure. SUGI, 30, 213-230.
Chernozhukov, V., Fernandez-Val, I., and Melly, B. (2013). Quantile and Probability Curves. Econometrica, 19(2), 2205-2268.
Cizek, P. (2003). Quantile regression in XploRe Guide. (Z. H. W. Hardle, Ed.) Berlin: Springer.
Draper, N. R., and Smith, H. (1981). Applied Regression Analysis (2nd ed.). John Wiley and Sons.
Efroymson, M. (1960). Multiple Regression. (A. Ralston and H. S. Wilf, Eds.) New York, NY: John Wiley and Sons, Inc.
Fitzenberger, B., Koenker, R., and Machado, J. A. F. (Eds.) (2002). Economic Applications of Quantile Regression. New York, NY: Physica-Verlag Heidelberg.
Galvao, K., and Montes, R. (2012). Asymptotics for quantile regression model with different effect. Journal of Econometrics, 5, 76-91.
