
Here are some fairly common scenarios producing perfect multicollinearity, i.e. situations in which the columns of the design matrix are linearly dependent. Recall from linear algebra that this means there is a linear combination of columns of the design matrix (whose coefficients are not all zero) which equals zero. I have included some practical examples to explain *why* this pitfall strikes so often; I have encountered almost all of them!

*One variable is a multiple of another*, regardless of whether there is an intercept term: perhaps because you have recorded the same variable twice using different units (e.g. "length in centimetres" is precisely 100 times "length in metres") or because you have recorded a variable once as a raw number and once as a proportion or percentage, when the denominator is fixed (e.g. "area of petri dish colonized" and "percentage of petri dish colonized" will be exact multiples of each other if the area of each petri dish is the same). We have collinearity because if $w_i = ax_i$ where $w$ and $x$ are variables (columns of your design matrix) and $a$ is a scalar constant, then $1(\vec w) - a(\vec x)$ is a linear combination of variables that equals zero.

*There is an intercept term and one variable differs from another by a constant*: this will happen if you center a variable ($w_i = x_i - \bar x$) and include both the raw $x$ and the centered $w$ in your regression. It will also happen if your variables are measured in different unit systems that differ by a constant, e.g. if $w$ is "temperature in kelvin" and $x$ is "temperature in °C", then $w_i = x_i + 273.15$. If we regard the intercept term as a variable that is always $1$ (represented as a column of ones, $\vec 1_n$, in the design matrix), then having $w_i = x_i + k$ for some constant $k$ means that $1(\vec w) - 1(\vec x) - k(\vec 1_n)$ is a linear combination of the $w$, $x$ and $1$ columns of the design matrix that equals zero.

*There is an intercept term and one variable is given by an affine transformation of another*: i.e. you have variables $w$ and $x$ related by $w_i = ax_i + b$, where $a$ and $b$ are constants. For instance, this happens if you standardize a variable as $z_i = \frac{x_i - \bar x}{s_x}$ and include both the raw $x$ and the standardized $z$ in your regression: since $z_i = \frac{1}{s_x} x_i - \frac{\bar x}{s_x}$ is an affine transformation of $x_i$, the linear combination $1(\vec z) - \frac{1}{s_x}(\vec x) + \frac{\bar x}{s_x}(\vec 1_n)$ equals zero.

*There is an intercept term and the sum of several variables is fixed (e.g. in the famous "dummy variable trap")*: for example, if you have "percentage of satisfied customers", "percentage of dissatisfied customers" and "percentage of customers neither satisfied nor dissatisfied", then these three variables will always (barring rounding error) sum to 100. One of these variables, or alternatively the intercept term, needs to be dropped from the regression to prevent collinearity. The "dummy variable trap" occurs when you use indicator variables (more commonly but less usefully called "dummies") for every possible level of a categorical variable. For instance, suppose vases are produced in red, green or blue color schemes. If you recorded the categorical variable "color" by three indicator variables (red, green and blue would be binary variables, stored as 1 for "yes" and 0 for "no"), then for each vase only one of the variables would be a one, and hence red + green + blue = 1. Since there is a vector of ones for the intercept term, the linear combination 1(red) + 1(green) + 1(blue) - 1(1) = 0. The usual remedy here is either to drop the intercept, or to drop one of the indicators (e.g. leave out red), which then becomes a baseline or reference level. In this case, the regression coefficient for green would indicate the change in the mean response associated with switching from a red vase to a green one, holding other explanatory variables constant.

*There are at least two subsets of variables, each having a fixed sum*, regardless of whether there is an intercept term: suppose the vases in (4) were produced in three sizes, and the categorical variable for size was stored as three additional indicator variables. We would have large + medium + small = 1. Then we have the linear combination 1(large) + 1(medium) + 1(small) - 1(red) - 1(green) - 1(blue) = 0, even when there is no intercept term. The two subsets need not share the same sum: e.g. if we have explanatory variables $u, v, w, x$ such that every $u_i + v_i = k_1$ and every $w_i + x_i = k_2$, then $k_2(\vec u) + k_2(\vec v) - k_1(\vec w) - k_1(\vec x) = \vec 0$.

*One variable is defined as a linear combination of several other variables*: for instance, if you record the length $l$, width $w$ and perimeter $p$ of each rectangle, then $p_i = 2l_i + 2w_i$, so we have the linear combination $1(\vec p) - 2(\vec l) - 2(\vec w) = \vec 0$. An example with an intercept term: suppose a mail-order business has two product lines, and we record that order $i$ consisted of $u_i$ of the first product at unit cost $\$a$ and $v_i$ of the second at unit cost $\$b$, with fixed delivery charge $\$c$. If we also include the order cost $\$x$ as an explanatory variable, then $x_i = a u_i + b v_i + c$ and so $1(\vec x) - a(\vec u) - b(\vec v) - c(\vec 1_n) = \vec 0$. This is an obvious generalization of (3). It also gives us a different way of thinking about (4): once we know all but one of the subset of variables whose sum is fixed, the remaining one is their complement, so it can be expressed as a linear combination of them and their sum. If we know 50% of customers were satisfied and 20% were dissatisfied, then 100% - 50% - 20% = 30% must be neither satisfied nor dissatisfied; if we know the vase is not red (red = 0) and it is green (green = 1), then we know it is not blue (blue = 1(1) - 1(red) - 1(green) = 1 - 0 - 1 = 0).

*One variable is constant and zero*, regardless of whether there is an intercept term: in an observational study, a variable will be constant if your sample does not exhibit sufficient (any!) variation. There may be variation in the population that is not captured in your sample, e.g. if there is a very common modal value: perhaps your sample size is too small and was therefore unlikely to include any values that differed from the mode, or your measurements were insufficiently accurate to detect small variations from the mode. Alternatively, there may be theoretical reasons for the lack of variation, particularly if you are studying a sub-population. In a study of new-build properties in Los Angeles, it would not be surprising that every data point has AgeOfProperty = 0 and State = California. In an experimental study, you may have measured an independent variable that is under experimental control. Should one of your explanatory variables $x$ be both constant and zero, then we have immediately that the linear combination $1(\vec x)$ (with coefficient zero for any other variables) is $\vec 0$.

*There is an intercept term and at least one variable is constant*: if $x$ is constant so that each $x_i = k \neq 0$, then the linear combination $1(\vec x) - k(\vec 1_n) = \vec 0$.

*At least two variables are constant*, regardless of whether there is an intercept term: if each $w_i = k_1 \neq 0$ and each $x_i = k_2 \neq 0$, then the linear combination $k_2(\vec w) - k_1(\vec x) = \vec 0$.

*Number of columns of the design matrix, $k$, exceeds the number of rows, $n$*: even when there is no conceptual relationship between your variables, the columns of your design matrix are necessarily linearly dependent when $k > n$. It simply isn't possible to have $k$ linearly independent vectors in a space of dimension lower than $k$: for instance, while you can draw two independent vectors on a sheet of paper (a two-dimensional plane, $\mathbb R^2$), any further vector drawn on the page must lie within their span, and hence be a linear combination of them. Note that an intercept term contributes a column of ones to the design matrix, so it counts as one of your $k$ columns. (This scenario is often called the "large $p$, small $n$" problem: see also this related CV question.)

*Data examples with R code*

Each example gives a design matrix $X$, the matrix $X'X$ (note this is always square and symmetric) and $\det(X'X)$. Note that if $X'X$ is singular (zero determinant, hence not invertible) then we cannot estimate $\hat \beta = (X'X)^{-1}X'y$. The condition that $X'X$ be non-singular is equivalent to the condition that $X$ has full rank, so that its columns are linearly independent: see this Math SE question, or this one and its converse.

(1) One column is multiple of another
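The original example listings did not survive extraction, so here is a minimal R reconstruction with illustrative heights; `crossprod(X)` computes $X'X$:

```r
# (1) Height in centimetres is exactly 100 times height in metres
m <- c(1.52, 1.60, 1.68, 1.76)
X <- cbind(cm = 100 * m, m = m)
crossprod(X)        # X'X: square and symmetric
det(crossprod(X))   # ~0: X'X is singular
qr(X)$rank          # 1 < 2 columns: the columns are linearly dependent
```

A rank below the number of columns confirms linear dependence; in floating point the determinant may differ from zero only by rounding error.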

(2) Intercept term and one variable differs from another by constant
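A hedged R sketch with made-up temperatures (kelvin = celsius + 273.15):

```r
# (2) Intercept plus the same temperature in kelvin and in celsius
celsius <- c(10, 15, 20, 25)
X <- cbind(intercept = 1, kelvin = celsius + 273.15, celsius = celsius)
det(crossprod(X))   # ~0
qr(X)$rank          # 2 < 3 columns
```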

(3) Intercept term and one variable is affine transformation of another
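An illustrative R sketch of the standardization case (data values invented):

```r
# (3) Raw x together with its standardized copy, an affine transformation of x
x <- c(2, 4, 6, 8)
z <- (x - mean(x)) / sd(x)
X <- cbind(intercept = 1, x = x, z = z)
det(crossprod(X))   # ~0
qr(X)$rank          # 2 < 3 columns
```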

(4) Intercept term and sum of several variables is fixed
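A possible R illustration with invented customer-satisfaction percentages:

```r
# (4) Three percentages that always sum to 100, plus an intercept
satisfied    <- c(50, 40, 60, 30)
dissatisfied <- c(20, 35, 10, 50)
neither      <- 100 - satisfied - dissatisfied
X <- cbind(intercept = 1, satisfied, dissatisfied, neither)
det(crossprod(X))   # ~0: satisfied + dissatisfied + neither = 100 * intercept
qr(X)$rank          # 3 < 4 columns
```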

(4a) Intercept term with dummy variable trap
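A small R sketch of the trap itself, with hypothetical vase colours:

```r
# (4a) One indicator per colour level plus an intercept
colour <- c("red", "green", "blue", "red", "green")
X <- cbind(intercept = 1,
           red   = as.numeric(colour == "red"),
           green = as.numeric(colour == "green"),
           blue  = as.numeric(colour == "blue"))
det(crossprod(X))   # ~0: red + green + blue = intercept
qr(X)$rank          # 3 < 4 columns
```

Dropping either the intercept or one indicator (the reference level) restores full rank.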

(5) Two subsets of variables with fixed sum
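A sketch in R, assuming invented colour and size labels and no intercept column:

```r
# (5) Colour dummies and size dummies: each subset sums to one
colour <- c("red", "green", "blue", "red", "green", "blue")
size   <- c("small", "large", "medium", "medium", "small", "large")
X <- cbind(red    = colour == "red",  green = colour == "green",
           blue   = colour == "blue", small = size == "small",
           medium = size == "medium", large = size == "large") * 1
det(crossprod(X))   # ~0 even though there is no intercept column
qr(X)$rank          # 5 < 6 columns
```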

(6) One variable is linear combination of others
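One way this could look in R, with made-up rectangle measurements:

```r
# (6) Perimeter is an exact linear combination of length and width
l <- c(3, 5, 2, 7)
w <- c(2, 1, 4, 3)
X <- cbind(l = l, w = w, p = 2 * l + 2 * w)
det(crossprod(X))   # ~0: p - 2l - 2w = 0
qr(X)$rank          # 2 < 3 columns
```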

(7) One variable is constant and zero
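A minimal R illustration (values arbitrary):

```r
# (7) A variable that is constant and zero
x1 <- c(4, 7, 1, 9)
X  <- cbind(x1 = x1, x2 = 0)   # x2 recycles to a column of zeros
det(crossprod(X))   # 0: the zero column alone is a linear dependency
qr(X)$rank          # 1 < 2 columns
```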

(8) Intercept term and one constant variable
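In R this might look like:

```r
# (8) Intercept plus a non-zero constant variable
X <- cbind(intercept = 1, x = 7, z = c(4, 7, 1, 9))
det(crossprod(X))   # 0: x = 7 * intercept
qr(X)$rank          # 2 < 3 columns
```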

(9) Two constant variables
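And a final R sketch, again with arbitrary values:

```r
# (9) Two constant variables, no intercept needed
X <- cbind(w = rep(2, 4), x = rep(5, 4), z = c(4, 7, 1, 9))
det(crossprod(X))   # 0: 5w - 2x = 0
qr(X)$rank          # 2 < 3 columns
```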

- $\mathbf{a}$ is height in centimeters and $\mathbf{b}$ is height in meters. Then $\mathbf{a} = 100\,\mathbf{b}$, and your design matrix $X$ will not have linearly independent columns.
- $\mathbf{x} = \mathbf{1}$ (i.e. you include a constant in your regression), $\mathbf{f}$ is temperature in fahrenheit, and $\mathbf{c}$ is temperature in celsius. Then $\mathbf{f} = \frac{9}{5}\mathbf{c} + 32\,\mathbf{x}$, and your design matrix $X$ will not have linearly independent columns.
- Everyone starts school at age 5, $\mathbf{x} = \mathbf{1}$ (i.e. constant value of 1 across all observations), $\mathbf{s}$ is years of schooling, $\mathbf{d}$ is age, and no one has left school. Then $\mathbf{s} = \mathbf{d} - 5\,\mathbf{x}$, and your design matrix $X$ will not have linearly independent columns.

There are a multitude of ways in which one column of your data can be a linear function of your other data. Some of them are obvious (e.g. meters vs. centimeters) while others can be more subtle (e.g. age and years of schooling for younger children).

Notational notes: boldface letters such as $\mathbf{a}$ denote column vectors containing one entry per observation, and $\mathbf{1}$ denotes a column of ones.

answered Jul 5 at 15:04

Multicollinearity can adversely affect your regression results.

Multicollinearity generally occurs when there are high correlations between two or more predictor variables. In other words, one predictor variable can be used to predict the other. This creates redundant information, skewing the results in a regression model. Examples of correlated predictor variables (also called multicollinear predictors) are: a person’s height and weight, age and sales price of a car, or years of education and annual income.

An easy way to detect multicollinearity is to calculate correlation coefficients for all pairs of predictor variables. If the correlation coefficient, r, is exactly +1 or -1, this is called perfect multicollinearity. If r is close to or exactly -1 or +1, one of the variables should be removed from the model if at all possible.
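As a sketch of this check in R (the data frame `d` and its values are invented for illustration):

```r
# Pairwise correlations among predictors
d <- data.frame(x1 = c(1, 2, 3, 4, 5),
                x2 = c(2, 4, 6, 8, 10),   # exact multiple of x1
                x3 = c(5, 3, 8, 1, 4))
round(cor(d), 2)   # r = 1 for the x1/x2 pair: perfect multicollinearity
```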

It’s more common for multicollinearity to rear its ugly head in observational studies; it’s less common with experimental data. When the condition is present, it can result in unstable and unreliable regression estimates. Several other problems can interfere with analysis of results, including:

- The t-statistics will generally be very small and coefficient confidence intervals will be very wide. This means that it is harder to reject the null hypothesis.
- The partial regression coefficients may be imprecise estimates; standard errors may be very large.
- Partial regression coefficients may change in sign and/or magnitude from one sample to another.
- Multicollinearity makes it difficult to gauge the effect of independent variables on dependent variables.

The two types are:

- *Data-based multicollinearity*: caused by poorly designed experiments, data that is 100% observational, or data collection methods that cannot be manipulated. In some cases, variables may be highly correlated (usually due to collecting data from purely observational studies) and there is no error on the researcher’s part. For this reason, you should conduct experiments whenever possible, setting the level of the predictor variables in advance.
- *Structural multicollinearity*: caused by you, the researcher, creating new predictor variables.

Causes for multicollinearity can also include:

- *Insufficient data*. In some cases, collecting more data can resolve the issue.
- *Dummy variables may be incorrectly used*. For example, the researcher may fail to exclude one category, or add a dummy variable for every category (e.g. spring, summer, autumn, winter).
- *Including a variable in the regression that is actually a combination of two other variables*. For example, including “total investment income” when total investment income = income from stocks and bonds + income from savings interest.
- *Including two identical (or almost identical) variables*. For example, weight in pounds and weight in kilos, or investment income and savings/bond income.

Multicollinearity: Definition, Causes, Examples was last modified: June 27th, 2016 by Andale

In the REGRESSION procedure for linear regression analysis, I can request statistics that are diagnostic for multicollinearity (or, simply, collinearity). How can I detect collinearity with the LOGISTIC REGRESSION, Nominal Regression (NOMREG), or Ordinal Regression (PLUM) procedures?

The regression procedures for categorical dependent variables do not have collinearity diagnostics. However, you can use the linear Regression procedure for this purpose. Collinearity statistics in regression concern the relationships among the predictors, ignoring the dependent variable. So, you can run REGRESSION with the same list of predictors and dependent variable as you wish to use in LOGISTIC REGRESSION (for example) and request the collinearity diagnostics. Run Logistic Regression to get the proper coefficients, predicted probabilities, etc. after you've made any necessary decisions (dropping predictors, etc.) that result from the collinearity analysis.

If you have categorical predictors in your model, you will need to transform these to sets of dummy variables to run collinearity analysis in REGRESSION, which does not have a facility for declaring a predictor to be categorical. Technote #1476169, which is titled "Recoding a categorical SPSS variable into indicator (dummy) variables", discusses how to do this.

An enhancement request has been filed to request that collinearity diagnostics be added as options to other procedures, including Logistic Regression, NOMREG, and PLUM.

Collinearity and Orthogonalization in Regression

Collinearity happens to many inexperienced researchers. A common mistake is to put too many regressors into the model. As I explained in my example of "fifty ways to improve your grade," inevitably many of those independent variables will be too correlated. In addition, when there are too many variables in a regression model, i.e. the number of parameters to be estimated is larger than the number of observations, the model is said to lack degrees of freedom and is thus *over-fitting*. The following cases are extreme, but you will get the idea. When there is one subject only, the regression line can be fitted in any way (left figure). When there are two observations, the regression line is a perfect fit (right figure). When things are perfect, they are indeed imperfect!

One common approach to select a subset of variables from a complex model is stepwise regression. A stepwise regression is a procedure to examine the impact of each variable on the model step by step. The variable that cannot contribute much to the variance explained is thrown out. There are several versions of stepwise regression, such as *forward selection*, *backward elimination*, and *stepwise*. Many researchers have employed these techniques to determine the order of predictors by the magnitude of their influence on the outcome variable (e.g. June, 1997; Leigh, 1996).

However, the above interpretation is valid if and only if all predictors are independent (but if you write a dissertation, it doesn't matter; follow what your committee advises). Collinear regressors, or regressors with some degree of correlation, would return inaccurate results. Assume that there is an outcome variable Y and four regressors X_{1}-X_{4}. In the left panel X_{1}-X_{4} are correlated (non-orthogonal). We cannot tell which variable individually contributes the most to the variance explained. If X_{1} enters the model first, it seems to contribute the largest amount of variance explained. X_{2} seems to be less influential because its contribution to the variance explained has been overlapped by the first variable, and X_{3} and X_{4} are even worse.

Indeed, the more correlated the regressors are, the more their ranked "importance" depends on the selection order (Bring, 1996). However, we can interpret the result of stepwise regression as an indication of the importance of the independent variables if all predictors are orthogonal. In the right panel we have a "clean" model. The individual contribution to the variance explained by each variable in the model is clearly seen. Thus, we can assert that X_{1} and X_{4} are more influential on the dependent variable than X_{2} and X_{3}.

There are other, better ways to perform variable selection, such as maximum R-square, root mean square error (RMSE), and Mallow's Cp. Maximum R-square selects the best of the candidate models based upon the largest variance explained. The other two work in the opposite direction to maximum R-square: RMSE is a measure of lack of fit, while Mallow's Cp reflects the total squared error, as opposed to the best fit sought by maximum R-square. Thus, the higher the R-square, the better the model; the lower the RMSE and Cp, the better the model.

For clarity of illustration, I use only three regressors: X_{1}, X_{2}, X_{3}. The principle illustrated here applies equally well to the situation of many regressors. The following output is based on a hypothetical dataset:

At first, each regressor enters the model one by one. Among all one-variable models, the best variable is X_{3} according to the maximum R-square criterion (R^2 = .31). (For now we temporarily ignore RMSE and Cp.) Then, all combinations of two-variable models are computed. This time the best two predictors are X_{2} and X_{3} (R^2 = .60). Last, all three variables are used for a full model (R^2 = .62). From the one-variable model to the two-variable model, the variance explained gains a substantial improvement (.60 - .31 = .29). However, from the two-variable model to the full model, the gain is trivial (.62 - .60 = .02).

If you cannot follow the above explanation, this figure may help. The x-axis represents the number of variables and the y-axis represents the R-square. It clearly indicates a sharp jump from one variable to two, but the curve flattens out from two to three (see the red arrow).

Now, let's examine RMSE and Cp. Interestingly enough, in terms of both RMSE and Cp, the full model is worse than the two-variable model. The RMSE of the best two-variable is 1.81 but that of the full model is 1.83 (see the red arrow in the right panel)! The Cp of the best two is 2.70 whereas that of the full model is 4.00 (see the red arrow in the following figure)!

Nevertheless, although the approaches of maximum R-square, root mean square error, and Mallow's Cp are different, the conclusion is the same: one variable is too few and three are too many. To perform a variable selection in SAS, the syntax is "PROC REG; MODEL Y=X1-X3 /SELECTION=MAXR;". To plot maximum R-square, RMSE, and Cp together, use NCSS (NCSS Statistical Software, 1999).

Although the result of stepwise regression depends on the order of entering predictors, JMP (SAS Institute, 2010) allows the user to select or deselect variables in any order. The process is so interactive that the analyst can easily determine whether certain variables should be kept or dropped. In addition to Mallow's Cp, JMP shows the corrected Akaike's information criterion (AICc) to indicate the balance between fitness and simplicity of the model.

The original Akaike's information criterion (AIC) without correction, developed by Hirotugu Akaike (1973), is in alignment with Ockham’s razor: all things being equal, the simplest model tends to be the best one, and simplicity is a function of the number of adjustable parameters. Thus, a smaller AIC suggests a "better" model. Specifically, AIC is a fitness index for trading off the complexity of a model against how well the model fits the data. The general form of AIC is AIC = 2k - 2 ln(L), where k is the number of parameters and L is the likelihood function of the estimated parameters. Increasing the number of free parameters to be estimated improves the model's fit; however, the model might become unnecessarily complex. To reach a balance between fitness and parsimony, AIC not only rewards goodness of fit, but also includes a penalty that is an increasing function of the number of estimated parameters. This penalty discourages over-fitting and complexity. Hence, the “best” model is the one with the lowest AIC value. Since AIC attempts to find the model that best explains the data with a minimum of free parameters, it is considered an approach favoring simplicity. In this sense, AIC is better than R-squared, which always goes up as additional variables enter the model; needless to say, that approach favors complexity. AIC, however, does not necessarily change by adding variables. Rather, it varies based upon the composition of the predictors, and thus it is a better indicator of model quality (Faraway, 2005).

AICc is a further step beyond AIC in the sense that AICc imposes a greater penalty for additional parameters. The formula for AICc is:

AICc = AIC + 2k(k + 1)/(n - k - 1)

where n = sample size and k = the number of parameters to be estimated.

Burnham and Anderson (2002) recommend replacing AIC with AICc, especially when the sample size is small or the number of parameters is large. AICc converges to AIC as the sample size gets larger, so AICc can be used regardless of sample size and the number of parameters.

The Bayesian information criterion (BIC) is similar to AIC, but its penalty is heavier than that of AIC. However, some authors believe that AIC and AICc are superior to BIC for a number of reasons. First, AIC and AICc are based on the principle of information gain. Second, the Bayesian approach requires a prior input, which is usually debatable. Third, AIC is asymptotically optimal in model selection in terms of mean squared error, but BIC is not asymptotically optimal (Burnham & Anderson, 2004; Yang, 2005).
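Both criteria are available directly in base R for fitted models; a minimal sketch with simulated data (the variable names and data are illustrative, not from the text):

```r
set.seed(1)
x1 <- rnorm(30); x2 <- rnorm(30)
y  <- 1 + 2 * x1 + rnorm(30)   # x2 is irrelevant by construction
small <- lm(y ~ x1)            # the true model
big   <- lm(y ~ x1 + x2)       # adds an irrelevant predictor
AIC(small); AIC(big)           # the model with the smaller AIC is preferred
BIC(small); BIC(big)           # BIC penalizes the extra parameter more heavily
```

Note that for the same model BIC exceeds AIC whenever ln(n) > 2, i.e. n > 7, reflecting its heavier penalty.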

JMP provides the users with the options of AICc and BIC for model refinement. To start running stepwise regression with AICc or BIC, use *Fit models* and then choose *Stepwise* from *Personality*. These short movie clips show the first and the second steps of constructing an optimal regression model with AICc (Special thanks to Michelle Miller for her help in recording the movie clips).

Besides regression, AIC and BIC are also used in many other statistical procedures for model selection (e.g. structural equation modeling). While degree of model fitness is a continuum, the cutoff points of conventional fitness indices force researchers to make a dichotomous decision. To rectify the situation, Suzanne and Preston (2015) suggested replacing arbitrary cutoffs with the Akaike information criterion (Akaike, 1973) and the Bayesian information criterion (BIC). It is important to emphasize that, unlike conventional fitness indices, there is no cutoff for AIC or BIC. Rather, the researcher explores different alternative models and then selects the best fit based on the lowest AIC or BIC.

There are other ways to reduce the number of variables, such as factor analysis, principal component analysis and partial least squares. The philosophy behind these methods is very different from that of variable selection: "redundant" variables are not excluded but are retained and combined to form latent factors. It is believed that a construct should be an "open concept" that is triangulated by multiple indicators instead of a single measure (Salvucci, Walter, Conley, Fink, & Saba, 1997). In this sense, redundancy enhances reliability and yields a better model.

However, factor analysis and principal component analysis do not distinguish between dependent and independent variables and thus may not be applicable to research whose purpose is regression analysis. One way to reduce the number of variables in the context of regression is to employ the partial least squares (PLS) procedure. PLS is a method for constructing predictive models when the variables are many and highly collinear (Tobias, 1999). Besides collinearity, PLS is also robust against other data structural problems such as skewed distributions and omission of regressors (Cassel, Westlund, & Hackl, 1999). It is important to note that in PLS the emphasis is on prediction rather than on explaining the underlying relationships between the variables. Thus, although some programs (e.g. JMP) name the variables "factors," they are in fact a-theoretical principal components.

Like principal component analysis, the basic idea of PLS is to extract several latent factors and responses from a large number of observed variables. For this reason, the acronym PLS is also taken to mean *projection to latent structure*. (The original page illustrated factor extraction with a slide show made by Gregory Van Eekhout.)

The following is an example of SAS code for PLS: PROC PLS; MODEL y1-y5 = x1-x100; RUN;. Note that unlike an ordinary least squares regression, PLS can accept multiple dependent variables. The output shows the percent of variation accounted for by each extracted latent variable:

In addition to the partial least-squares method, a modeler can also use generalized regression modeling (GRM) as a remedy to the threat of multicollinearity. GRM, which is available in JMP, offers four options, namely maximum likelihood, Lasso, Ridge, and Adaptive Elastic Net, to perform variable selection. The basic idea of GRM is very simple: use a penalty to avoid model complexity. Among the preceding four options, the adaptive elastic net is considered the best in most situations because it combines the strengths of Lasso and Ridge. The following is a typical GRM output.

*Update: July 2016*

I hope that this one is not going to be an "ask-and-answer" question; here goes: (multi)collinearity refers to extremely high correlations between predictors in the regression model. How to cure them? Well, sometimes you don't need to "cure" collinearity, since it doesn't affect the regression model itself, but rather the interpretation of the effect of individual predictors.

One way to spot collinearity is to treat each predictor in turn as a dependent variable, with the other predictors as independent variables, and determine R^2: if it's larger than .9 (or .95), we can consider the predictor redundant. This is one "method"; what about other approaches? Some of them are time consuming, like excluding predictors from the model and watching for changes in the b-coefficients: they should be noticeably different.

Of course, we must always bear in mind the specific context/goal of the analysis. Sometimes the only remedy is to repeat the research, but right now I'm interested in various ways of screening redundant predictors when (multi)collinearity occurs in a regression model.

asked Jun 15 '10 at 2:10

Just to add to what Dirk said about the Condition Number method, a rule of thumb is that values of CN > 30 indicate severe collinearity. Other methods, apart from condition number, include:

1) the determinant of the correlation matrix of the predictors, which ranges from 0 (perfect collinearity) to 1 (no collinearity)

2) using the fact that the determinant of a matrix is the product of its eigenvalues, so the presence of one or more near-zero eigenvalues of the correlation matrix indicates collinearity

3) the value of the variance inflation factor (VIF). The VIF for predictor i is 1/(1-R_i^2), where R_i^2 is the R^2 from a regression of predictor i against the remaining predictors. Collinearity is present when the VIF for at least one independent variable is large. Rule of thumb: VIF > 10 is of concern. For an implementation in R see here. I would also like to comment that the use of R^2 for determining collinearity should go hand in hand with visual examination of the scatterplots, because a single outlier can "cause" collinearity where it doesn't exist, or can HIDE collinearity where it exists.
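These diagnostics can be sketched in base R; the predictors below are simulated so that x3 is nearly a linear combination of x1 and x2 (all names and values are illustrative):

```r
set.seed(42)
x1 <- rnorm(50); x2 <- rnorm(50)
x3 <- x1 + x2 + rnorm(50, sd = 0.01)   # near-perfect collinearity
X  <- cbind(x1, x2, x3)

kappa(X, exact = TRUE)   # condition number; > 30 suggests severe collinearity
det(cor(X))              # near 0 under strong collinearity
eigen(cor(X))$values     # a near-zero eigenvalue flags collinearity

r2 <- summary(lm(x3 ~ x1 + x2))$r.squared
1 / (1 - r2)             # VIF for x3; > 10 is of concern
```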

answered Jun 15 '10 at 8:23

It succinctly lists many useful regression-related functions in R, including diagnostic functions. In particular, it lists the vif function from the car package, which can assess multicollinearity. http://en.wikipedia.org/wiki/Variance_inflation_factor

Consideration of multicollinearity often goes hand in hand with issues of assessing variable importance. If this applies to you, perhaps check out the relaimpo package: http://prof.beuth-hochschule.de/groemping/relaimpo/

Since there is no mention of VIF so far, I will add my answer. A variance inflation factor > 10 usually indicates serious redundancy between predictor variables. The VIF indicates the factor by which the variance of a variable's coefficient is inflated relative to what it would be if that variable were uncorrelated with the other predictors.

vif() is available in the car package and is applied to an object of class lm. It returns the VIF of x1, x2, ..., xn in the lm() object. It is a good idea to exclude variables with VIF > 10 or to apply transformations to the variables with VIF > 10.

answered Jul 25 '14 at 20:50

actually, this is mentioned in several other answers. – Ben Bolker Jul 25 '14 at 20:52

crap just noticed. glad I didn't get down voted for that! – vagabond Jul 25 '14 at 20:58


1. What is the nature of multicollinearity?

2. Is multicollinearity really a problem?

3. What are the theoretical consequences of multicollinearity?

4. What are the practical consequences of multicollinearity?

5. In practice, how does one detect multicollinearity?

6. If it is desirable to eliminate the problem of multicollinearity, what remedial measures are available?

In cases of perfect linear relationship or perfect multicollinearity among explanatory variables, we cannot obtain unique estimates of all parameters. Since we cannot obtain unique estimates, we cannot draw any statistical inferences about them from a given sample. Estimation and hypothesis testing about individual regression coefficients are therefore not possible. It is a dead end issue.

2. NEAR OR IMPERFECT MULTICOLLINEARITY

Perfect multicollinearity seldom arises with actual data. Its occurrence often results from correctable mistakes, such as the dummy variable trap, or including variables such as ln(x) and ln(x^2) in the same equation. Once spotted, corrections can be made. The real problem is with imperfect multicollinearity.

Multicollinearity is not a condition that either exists or does not exist in economic functions, but rather a phenomenon inherent in most relationships due to the nature of economic magnitudes. It can arise because there is a tendency for economic variables to move together over time. Also, there is an increasing tendency to use lagged variables of some explanatory variables e.g. in investment functions, distributed lags concerning past levels of economic activity are introduced.

We will get results, but are they to be believed? The determinant of $X'X$ will be very close to zero but nonzero, so its inverse will exist. One can see that the inverse will have very large elements, which lead to very large values on the diagonal of the variance-covariance matrix for the estimated coefficients, i.e. $\operatorname{var}(\hat\beta_j)$ will be large.

*Multicollinearity* is a state of very high intercorrelations or inter-associations among the independent variables. It is therefore a type of disturbance in the data, and if present in the data the statistical inferences made about the data may not be reliable.

*There are certain reasons why multicollinearity occurs:*

- It is caused by an inaccurate use of dummy variables.
- It is caused by the inclusion of a variable which is computed from other variables in the data set.
- Multicollinearity can also result from the repetition of the same kind of variable.
- It generally occurs when the variables are highly correlated with each other.

*Multicollinearity can result in several problems. These problems are as follows:*

- The partial regression coefficients may not be estimated precisely; their standard errors are likely to be high.
- Multicollinearity results in changes in the signs as well as the magnitudes of the partial regression coefficients from one sample to another.
- Multicollinearity makes it difficult to assess the relative importance of the independent variables in explaining the variation in the dependent variable.

In the presence of high multicollinearity, the confidence intervals of the coefficients tend to become very wide and the t-statistics tend to be very small. It becomes difficult to reject the null hypothesis for any individual coefficient when multicollinearity is present in the data under study.

*There are certain signals which help the researcher to detect the degree of multicollinearity.*

One such signal is when the individual t-statistics are not significant but the overall F-statistic is significant. In this instance, the researcher might get a mix of significant and insignificant results that indicate the presence of multicollinearity.

Suppose the researcher, after dividing the sample into two parts, finds that the coefficients differ drastically between the two subsamples. This indicates that the coefficients are unstable due to the presence of multicollinearity. Similarly, a drastic change in the model after simply adding or dropping a variable also indicates that multicollinearity is present in the data.
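The split-sample signal can be sketched with simulated data (NumPy; the data and the `ols` helper are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 60
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)       # nearly collinear with x1
y = 1.0 + 2.0 * x1 + 1.0 * x2 + rng.normal(size=n)

def ols(X, y):
    # OLS with an intercept, via least squares
    Z = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return beta

half = n // 2
b_first = ols(np.column_stack([x1[:half], x2[:half]]), y[:half])
b_second = ols(np.column_stack([x1[half:], x2[half:]]), y[half:])
print(b_first.round(2), b_second.round(2))
# The individual slopes are estimated very imprecisely and typically differ
# sharply between halves, while their sum stays close to the true 2 + 1 = 3.
```

Only the combined effect of the collinear pair is well identified, which is why the individual coefficients swing between subsamples.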

Multicollinearity can also be detected with the help of tolerance and its reciprocal, the variance inflation factor (VIF). If the value of tolerance is less than 0.2 or 0.1 and, simultaneously, the value of VIF is 10 or above, then the multicollinearity is problematic.

Multicollinearity means the independent variables are highly correlated with each other. An important assumption of regression analysis is that the model should not suffer from multicollinearity.

*Why is multicollinearity a problem?*

If the purpose of the study is to see how the independent variables impact the dependent variable, then multicollinearity is a big problem.

*If two explanatory variables are highly correlated, it's hard to tell which has an effect on the dependent variable.*

Let's say Y is regressed against X1 and X2, where X1 and X2 are highly correlated. Then the effect of X1 on Y is hard to distinguish from the effect of X2 on Y, because any increase in X1 tends to be associated with an increase in X2.

Another way to look at the multicollinearity problem: individual t-test p-values can be misleading. A p-value can be high, suggesting the variable is not important, even though the variable is important.

*When is multicollinearity not a problem?*

- If your goal is simply to predict Y from a set of X variables, then multicollinearity is not a problem. The predictions will still be accurate, and the overall R2 (or adjusted R2) quantifies how well the model predicts the Y values.
- Multiple dummy (binary) variables that represent a categorical variable with three or more categories are expected to be correlated with one another; this correlation is not a problem.

*How to detect multicollinearity?*

*Variance Inflation Factor (VIF) -* It provides an index that measures how much the variance (the square of the estimate's standard deviation) of an estimated regression coefficient is increased because of collinearity.

$\mathrm{VIF}_j = 1 / (1 - R_j^2)$, where $R_j^2$ is the coefficient of determination from the regression of the jth predictor $X_j$ on all the other independent variables (a regression that does not involve the dependent variable Y).

If VIF > 5, then there is a problem with multicollinearity (some practitioners use a cut-off of 10 instead).

*Interpretation of VIF*

If the variance inflation factor of a predictor variable is 5 this means that variance for the coefficient of that predictor variable is 5 times as large as it would be if that predictor variable were uncorrelated with the other predictor variables.

In other words, if the variance inflation factor of a predictor variable is 5, this means that the standard error for the coefficient of that predictor variable is √5 ≈ 2.24 times as large as it would be if that predictor variable were uncorrelated with the other predictor variables.
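The definition above can be computed directly from the auxiliary regressions; here is a sketch in NumPy (simulated data, with a hypothetical helper named `vif`):

```python
import numpy as np

def vif(X, j):
    # VIF of column j of X: 1 / (1 - R^2), where R^2 comes from regressing
    # X[:, j] on the remaining columns (with an intercept).
    y = X[:, j]
    others = np.column_stack([np.ones(X.shape[0]), np.delete(X, j, axis=1)])
    beta, *_ = np.linalg.lstsq(others, y, rcond=None)
    resid = y - others @ beta
    r2 = 1 - resid.var() / y.var()
    return 1.0 / (1.0 - r2)

rng = np.random.default_rng(1)
x1 = rng.normal(size=500)
x2 = 0.9 * x1 + np.sqrt(1 - 0.9**2) * rng.normal(size=500)  # corr(x1, x2) ≈ 0.9
x3 = rng.normal(size=500)                                   # independent predictor
X = np.column_stack([x1, x2, x3])
print([round(vif(X, j), 2) for j in range(3)])
```

With a correlation of about 0.9 between x1 and x2, their VIFs land near 1/(1 − 0.81) ≈ 5.3, while the independent x3 stays near 1.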

*How to handle multicollinearity?*

- Remove one of the highly correlated independent variables from the model. If you have two or more factors with a high VIF, remove one from the model.
- *Principal Component Analysis (PCA)* - It reduces the number of interdependent variables to a smaller set of uncorrelated components. Instead of using highly correlated variables, use components in the model that have an eigenvalue greater than 1.
- Run *PROC VARCLUS* and choose the variable that has the *minimum (1-R2) ratio* within a cluster.
- *Ridge Regression* - It is a technique for analyzing multiple regression data that suffer from multicollinearity.
- If you include an interaction term (the product of two independent variables), you can also reduce multicollinearity by "centering" the variables. "Centering" means subtracting the mean from the independent variables' values before creating the products.

*First Step:* Center_Height = Height - mean(Height)
*Second Step:* Center_Height2 = Height2 - mean(Height2)
*Third Step:* Center_Height3 = Center_Height * Center_Height2
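A quick NumPy sketch (with simulated, hypothetical height data) of why centering helps: the raw product term is strongly correlated with its component variables, while the centered product is nearly uncorrelated with them:

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical variables with large means, as real measurements often have
height = rng.normal(loc=170, scale=10, size=1000)
height2 = 0.5 * height + rng.normal(scale=8, size=1000)  # correlated second variable

raw_inter = height * height2                              # interaction from raw values
cent_inter = (height - height.mean()) * (height2 - height2.mean())  # from centered values

corr_raw = np.corrcoef(height, raw_inter)[0, 1]
corr_cent = np.corrcoef(height, cent_inter)[0, 1]
print(round(corr_raw, 3), round(corr_cent, 3))  # high vs. near zero
```

Centering does not change the fitted interaction effect; it only removes the artificial correlation between the product term and its components, so the coefficient estimates become more stable.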

*If you want me to keep writing this site, please post your feedback in the comment box below. While I love having friends who agree, I only learn from those who don't. To hire me for services or advertising, you may contact me at deepanshu.bhalla@outlook.com*