Analysis of Quantile Regression as an Alternative to Ordinary Least Squares Regression
Chapter One
Aim and Objectives of the Study
The main aim of the study is to investigate quantile regression as an alternative to least squares regression, especially when the number of regressors increases.
- To examine quantile regression and least squares regression
- To compare the models in terms of goodness of fit
- To recommend a suitable model for regression
CHAPTER TWO
LITERATURE REVIEW
Introduction
Elementary statistics texts tell us that the method of least squares was first discovered about 1805 (Stigler, 1986). There has been a dispute about who first discovered it. It appears that the method was discovered independently by Carl Friedrich Gauss (1777-1855) and Adrien-Marie Legendre (1752-1833); that Gauss started using it before 1803 (he claimed about 1795, but there is no corroboration of this earlier date); and that the first account was published by Legendre in 1805 (see Draper & Smith, 1981). Stigler (1986) notes that Sir Francis Galton discovered regression about 1885 in studies of heredity. Any contemporary course in regression analysis starts with the method of least squares and its variations.
Multiple Linear Regression
Multiple linear regression (MLR) is one of the most commonly used data mining techniques and can provide insightful information in cases where the rigid assumptions associated with MLR are met. The assumptions include:
- Linearity in the coefficients;
- A normal (Gaussian) distribution for the response errors (ε);
- A common distribution for the errors (ε); and
- Equal variance (homoscedasticity).
MLR is a very versatile tool and can be applied to almost any process, system, or area of study. Much has been published on this subject; Kutner et al. (2004) and Myers (1990) provide thorough accounts of MLR and will be indispensable for most readers.
A key step in developing an appropriate MLR model is selecting a method of model building and a set of best-model criteria. Stepwise regression, the method used in this thesis, is commonly used for model building. Introduced by Efroymson (1960), stepwise regression was intended to be an automated procedure that selects the most statistically significant variables from a finite pool of independent variables. There are three separate stepwise regression procedures: forward selection, backward elimination and mixed selection. Mixed selection is the most statistically defensible type of stepwise regression and is a mixture of the forward and backward procedures (Kutner et al., 2004; Neter et al., 1996; Draper and Smith, 1981).
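The forward-selection part of stepwise regression can be sketched as follows. This is a minimal illustration under stated assumptions, not Efroymson's original procedure: it uses the adjusted R² as the entry criterion in place of an F-test or p-value, it omits the backward-removal step that mixed selection adds, and the data and the names `solve`, `ols_rss`, `adj_r2_from_rss` and `forward_select` are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols_rss(columns, y):
    """Residual sum of squares of an OLS fit (with intercept) on the given columns."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    xtx = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(k)] for a in range(k)]
    xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(k)]
    beta = solve(xtx, xty)  # normal equations: (X'X) beta = X'y
    return sum((y[i] - sum(X[i][j] * beta[j] for j in range(k))) ** 2 for i in range(n))


def adj_r2_from_rss(rss, tss, n, p):
    """Adjusted R^2 with p parameters (intercept included)."""
    return 1.0 - (rss / (n - p)) / (tss / (n - 1))


def forward_select(candidates, y):
    """Add, one at a time, the variable that most improves adjusted R^2; stop when none does."""
    n = len(y)
    mean_y = sum(y) / n
    tss = sum((v - mean_y) ** 2 for v in y)
    selected, best = [], 0.0  # the intercept-only model has adjusted R^2 = 0
    remaining = dict(candidates)
    while remaining:
        scores = {}
        for name, col in remaining.items():
            cols = [candidates[s] for s in selected] + [col]
            scores[name] = adj_r2_from_rss(ols_rss(cols, y), tss, n, len(cols) + 1)
        name = max(scores, key=scores.get)
        if scores[name] <= best:
            break
        selected.append(name)
        best = scores[name]
        del remaining[name]
    return selected, best


# Made-up data: y depends strongly on x1; x2 is an unrelated candidate.
x1 = [float(v) for v in range(1, 13)]
x2 = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0, 5.0, 3.0, 5.0, 8.0]
noise = [0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.4, -0.3, 0.1, -0.2, 0.2]
y = [2.0 + 3.0 * a + e for a, e in zip(x1, noise)]

candidates = {"x1": x1, "x2": x2}
selected, best = forward_select(candidates, y)
print(selected, round(best, 4))
```

A mixed-selection variant would, after each addition, re-test the variables already in the model and drop any that no longer meet the chosen retention criterion.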
A set of best-model criteria is commonly used in conjunction with stepwise regression in order to select the optimal model. As cited by Young and Guess (2000) and Young and Huber (2004), multicollinearity and heteroscedasticity can be significant problems when modeling the IB of MDF using industrial data. Young and Guess (2002) used the following best-model criteria: maximum adjusted R², number of parameters (p) close to Mallows' Cp (Mallows, 1973), minimum Akaike's Information Criterion (AIC) (Akaike, 1974), Variance Inflation Factor (VIF) < 10, significance of independent variables at p-value < 0.10, absence of heteroscedasticity in the residuals, and E(e) = 0.
For this thesis, we focus on the aforementioned criteria. We also use a p-value < 0.05 for significance among the independent variables. The adjusted R² statistic is a better measure of fit than R² for MLR models built with the potential to contain significantly more independent variables than data records. As additional independent variables are added to a regression model, R² never decreases, regardless of whether the added variables improve the fit. The adjusted R² statistic increases only if the residual mean square decreases (Draper and Smith, 1981), so it penalizes the use of too many independent variables. AIC penalizes model complexity and guards against overfitting. VIFs are reported to protect against multicollinearity and redundancy in the model. Models with VIF < 10 can be said to be relatively free of these effects (Kutner et al., 2004).
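As a small worked illustration of two of these criteria, the sketch below computes the adjusted R² and the VIF from their standard formulas. The numbers are hypothetical and are not results from this study; `p` counts all estimated parameters, including the intercept.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n observations and p parameters (intercept included)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p)


def vif(r2_j):
    """VIF of predictor j, given the R^2 from regressing it on the other predictors."""
    return 1.0 / (1.0 - r2_j)


# Hypothetical case: a fourth parameter raises R^2 only from 0.900 to 0.901.
before = adjusted_r2(0.900, n=30, p=3)
after = adjusted_r2(0.901, n=30, p=4)
print(round(before, 4), round(after, 4))  # the adjusted value falls although R^2 rose

# The VIF < 10 rule corresponds to R^2_j < 0.9 for the auxiliary regression.
print(vif(0.5), round(vif(0.9), 6))
```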
CHAPTER THREE
METHODOLOGY
Introduction
In this chapter we introduce the process by which we analyze data to provide insight into the phenomenon under investigation rather than a prescription for a final decision, which depends on the aim and objectives of the research.
Research methodology is the process or set of methods used to carry out a study, including the method used to collect the data or information on which the research is based. Two statistical models are employed in this study. The first is ordinary least squares (OLS) estimation and the second is quantile regression.
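The difference between the two estimators can be illustrated on a single sample with no regressors. OLS minimizes the sum of squared residuals, whose minimizer is the sample mean, while quantile regression minimizes an asymmetrically weighted absolute loss (the check loss), whose minimizer is a sample quantile. The sketch below uses made-up data and hypothetical function names.

```python
def check_loss(u, tau):
    """Check loss rho_tau(u) = u * (tau - 1[u < 0])."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def best_constant(data, tau):
    """Constant minimizing the total check loss; a minimizer always lies on an observation."""
    return min(data, key=lambda c: sum(check_loss(y - c, tau) for y in data))


data = [1.0, 2.0, 3.0, 4.0, 100.0]  # made-up sample with one outlier

mean = sum(data) / len(data)              # least squares solution
median = best_constant(data, 0.5)         # quantile solution at tau = 0.5
lower_quartile = best_constant(data, 0.25)

print(mean, median, lower_quartile)
```

The outlier pulls the mean far from the bulk of the data, while the quantile solutions are unaffected by how large the outlier is.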
Data Collection
The data for any research come from primary sources, secondary sources, or both. Primary data are collected by the investigator for the purpose of a specific inquiry or study. Such data are original in character and are mostly generated by surveys conducted by individuals or research institutions. Secondary data are data that have already been collected by others. Such data are primary data for the agency that collected them, and become secondary for someone else who uses them for his own purposes. For this research, the secondary method of data collection is used. The data used in this research come from http://www.csus.edu/indiv/v/velianitis/ds101/schedule.htm
CHAPTER FOUR
RESULTS AND DISCUSSION
Introduction
This chapter presents the results obtained from regression analysis using the OLS and QR techniques. Correlation and stepwise regression were also examined. The criterion used for the goodness of fit of the model is the coefficient of determination. All tests of significance were conducted at the 5% level using the statistical software packages EViews, R and Statgraphics.
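For OLS, the coefficient of determination compares residual sums of squares. For QR, an analogous pseudo R² compares sums of check losses: one minus the ratio of the fitted model's total check loss to that of an intercept-only model at the same quantile. The sketch below assumes this definition, since the text does not state which one is used; the data and function names are hypothetical.

```python
def check_loss(u, tau):
    """Check loss rho_tau(u) = u * (tau - 1[u < 0])."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def pseudo_r2(ys, fitted, tau):
    """1 - V_full / V_restricted, with V the total check loss at quantile tau."""
    # Restricted model: a single constant, chosen to minimize the check loss.
    const = min(ys, key=lambda c: sum(check_loss(y - c, tau) for y in ys))
    v_restricted = sum(check_loss(y - const, tau) for y in ys)
    v_full = sum(check_loss(y - f, tau) for y, f in zip(ys, fitted))
    return 1.0 - v_full / v_restricted


ys = [1.0, 2.0, 3.0, 4.0, 5.0]            # made-up responses
fitted = [1.5, 2.0, 3.0, 4.0, 4.5]        # made-up fitted median values

print(round(pseudo_r2(ys, fitted, 0.5), 4))
```

Unlike the OLS R², this measure is local: it describes the fit at one chosen quantile rather than over the whole conditional distribution.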
CHAPTER FIVE
SUMMARY, CONCLUSIONS AND RECOMMENDATIONS
Introduction
In this chapter, we present the summary, conclusions and recommendations based on the results obtained in the preceding chapter.
Summary
Our primary goal in this work, as initially stated in our objectives, is to investigate the robustness of quantile regression as an alternative to least squares regression, especially when the number of regressors increases. This thesis presented a general overview of the quantile regression method, consisting of a non-technical introduction to the basic model and its crucial features and a short review of two major applications. We have seen that quantile regression extends univariate quantile estimation to the estimation of conditional quantile functions, and that it complements the established mean regression methods by adding more flexibility in the estimation and more robustness, particularly in non-Gaussian settings. The covariate effects are allowed to influence the location, scale and shape of the response distribution, unlike conventional techniques, which usually investigate location-shift paradigms. Furthermore, by focusing on local parts of the conditional distribution, quantile regression methods offer a useful deconstruction of conditional mean models.
Efforts were made to model miles per gallon in highway driving using the quantile regression approach, showing that OLS estimation is not always an appropriate method for analyzing miles per gallon in highway driving. The two independent variables were found to influence miles per gallon in highway driving. We suggest that researchers retain their list of independent variables, even if those variables are not significantly associated with the dependent (response) variable at the bivariate level, until they have examined their multiple regression results for any evidence of heteroskedasticity.
QR is an invaluable tool for dealing with heteroskedasticity, and provides a method for modeling the rates of change in the response variable at multiple points of the distribution when such rates of change differ. It is, however, also useful for homogeneous regression models outside the classical normal regression model, and in cases where the error independence assumption is violated, as no parametric assumption is required for the error distribution.
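How the rate of change can differ across quantiles under heteroskedasticity is shown in the sketch below. It uses made-up data whose spread grows with x, not the miles-per-gallon data of this study. For a single regressor, an optimal quantile line can always be found among the lines passing through two observations, so the sketch simply searches all such lines; this brute-force search is only practical for small samples and is not how statistical packages fit the model. All function names are hypothetical.

```python
from itertools import combinations


def check_loss(u, tau):
    """Check loss rho_tau(u) = u * (tau - 1[u < 0])."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def fit_quantile_line(xs, ys, tau):
    """Return (intercept, slope) minimizing total check loss over lines through two points."""
    best = None
    for i, j in combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue  # a vertical line is not a regression line
        slope = (ys[j] - ys[i]) / (xs[j] - xs[i])
        intercept = ys[i] - slope * xs[i]
        loss = sum(check_loss(y - (intercept + slope * x), tau) for x, y in zip(xs, ys))
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    return best[1], best[2]


def ols_slope(xs, ys):
    """Least squares slope for a single regressor."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Made-up heteroskedastic data: at each x the responses are 0.5x, x and 1.5x.
xs = [x for x in range(1, 11) for _ in range(3)]
ys = [x * m for x in range(1, 11) for m in (0.5, 1.0, 1.5)]

slopes = {tau: fit_quantile_line(xs, ys, tau)[1] for tau in (0.1, 0.5, 0.9)}
print(slopes, ols_slope(xs, ys))
```

Here the OLS slope describes only the central trend, while the slopes at the lower and upper quantiles reveal that the effect of x is weaker in the lower tail and stronger in the upper tail.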
Conclusion
Quantile regression offers a comprehensive strategy for completing the regression picture: it goes beyond the primary goal of determining only the conditional mean, and enables one to pose the question of the relationship between the response variable and the covariates at any quantile of the conditional distribution function. Quantile regression overcomes various problems that OLS frequently confronts. The error variance is often not constant across a distribution, thereby violating the assumption of homoscedasticity. Also, by focusing on the mean as a measure of location, information about the tails of a distribution is lost, as indicated by the data on miles per gallon in highway driving.
Recommendations
From the analysis and evaluation of the results in the preceding discussion, the following recommendations are proffered.
- The performance of QR is stable and robust against common deviations from the model.
- The model should trigger reviews rather than automatic disallowances. The researcher used QR as a tool for guiding policymakers toward sound policy decisions rather than as the final determinant of policy.
Contribution to knowledge
- This research brings to light the advantages of quantile regression in data analysis.
- This research has also helped to show how the pseudo R² can be used to identify the presence of outliers in the model.
Further research
Based on a publicly available dataset on fuel consumption in miles per gallon in highway driving, the QR model performs better than the OLS method. QR can also be applied when the rigid assumptions associated with OLS hold.
REFERENCES
- Abrevaya, J., and Dahl, C. (2008). The Effects of Birth Inputs on Birthweight. Journal of Business & Economic Statistics, 5(2), 379-397.
- Akaike, H. (1974). Factor analysis and AIC. Psychometrika, 52, 317-332.
- Buchinsky, M. (1998). Recent advances in quantile regression models: a practical guideline for empirical research. The Journal of Human Resources, 33(1), 88-126.
- Buhai, S. (2004). Quantile regressions: overview and selected applications. Unpublished manuscript, Rotterdam: Tinbergen Institute and Erasmus University.
- Cade, B., and Noon, B. (2003). A gentle introduction to quantile regression for ecologists. Frontiers in Ecology and the Environment, 1(8), 412-420.
- Chen, C. (2004). An introduction to Quantile Regression and the Quantreg Procedure. SUGI, 30, 213-230.
- Chernozhukov, V., Fernandez-Val, I., and Melly, B. (2013). Quantile and Probability Curves. Econometrica, 19(2), 2205-2268.
- Cizek, P. (2003). Quantile regression in XploRe Guide. (Z. H. W. Hardle, Ed.) Berlin: Springer.
- Draper, N. R., and Smith, H. (1981). Applied Regression Analysis (2nd ed.). John Wiley and Sons.
- Efroymson, M. (1960). Multiple Regression. (A. Ralston, Ed.) New York, NY: John Wiley and Sons, Inc.
- Fitzenberger, B., Koenker, R., and Machado, J. (Eds.) (2002). Economic Applications of Quantile Regression. New York, NY: Physica-Verlag Heidelberg.
- Galvao, K., and Montes, R. (2012). Asymptotics for quantile regression model with different effect. Journal of Econometrics, 5, 76-91.