Function kappa()
df <- ...                      # your data frame of explanatory variables
df_cor <- cor(df)              # correlation matrix of the predictors
kappa(df_cor, exact = TRUE)    # condition number κ of the correlation matrix
When κ < 100, the degree of collinearity is small;
When 100 < κ < 1000, there is strong multicollinearity;
When κ > 1000, there is severe multicollinearity.
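As a minimal, self-contained sketch (the simulated variables x1, x2, x3 are illustrative, not from the original post), the following builds two nearly collinear predictors and computes κ:

set.seed(1)
x1 <- rnorm(100)
x2 <- x1 + rnorm(100, sd = 0.01)    # x2 is almost a copy of x1
x3 <- rnorm(100)
df <- data.frame(x1, x2, x3)
kappa(cor(df), exact = TRUE)        # very large value: severe multicollinearity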
Function qr()
x <- ...           # the design (model) matrix
qr(x)$rank         # rank of x via QR decomposition
qr(X)$rank calculates the rank of the matrix X. If X is not of full rank, some x_i can be expressed as a linear combination of the other x_j; in that case, stepwise regression can be performed with the step() command.
fm <- lm(...)      # the fitted full model, e.g. lm(y ~ ., data = df)
step(fm)           # stepwise selection by AIC
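A sketch of rank detection followed by stepwise selection, again on simulated data (names are illustrative, not the author's):

set.seed(1)
x1 <- rnorm(50)
x2 <- 2 * x1                         # exact linear combination of x1
X  <- cbind(x1, x2, rnorm(50))
qr(X)$rank                           # 2, not 3: X is rank-deficient

x2b <- 2 * x1 + rnorm(50, sd = 0.1)  # a nearly collinear version instead
x3  <- rnorm(50)
y   <- x1 + x3 + rnorm(50)
fm  <- lm(y ~ x1 + x2b + x3)
step(fm)                             # AIC-based stepwise search drops redundant terms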
Supplementary: Causes, diagnosis, testing, and solutions for multicollinearity
I recently ran a regression analysis and encountered a problem where the signs of the regression coefficients were opposite to the signs of the corresponding correlation coefficients. After some investigation, I confirmed that it was a multicollinearity problem and worked out a solution.
Here I summarize the relevant knowledge about multicollinearity as follows.
A theoretically high correlation between explanatory variables does not necessarily mean the observations are highly correlated. Two explanatory variables may be highly correlated in theory while their observed values are not, and vice versa. So multicollinearity is essentially a data problem.
There are several causes of multicollinearity:
1. All explanatory variables share a common time trend;
2. One explanatory variable is a lag of another, and the two tend to follow the same trend;
3. The data were collected from too narrow a base, so some explanatory variables tend to move together;
4. There is an approximate linear relationship between certain explanatory variables.
Diagnosis:
1. The sign of a coefficient estimate is wrong (opposite to what theory or the correlations suggest);
2. Some important explanatory variables have low t-values while R-squared is not low;
3. When a less important explanatory variable is deleted, the regression results change significantly.
Testing:
1. Correlation analysis: a correlation coefficient above 0.8 indicates multicollinearity, but a low correlation coefficient does not prove its absence;
2. VIF (variance inflation factor) test, as sketched below;
3. Condition number test.
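A minimal base-R sketch of the VIF test on simulated predictors (the car package's vif() computes the same quantity): regress each predictor on all the others and take VIF_j = 1 / (1 - R_j^2).

set.seed(1)
x1 <- rnorm(100)
x2 <- x1 + rnorm(100, sd = 0.1)   # collinear pair
x3 <- rnorm(100)
df <- data.frame(x1, x2, x3)

vif_manual <- function(df) {
  sapply(names(df), function(v) {
    others <- setdiff(names(df), v)
    r2 <- summary(lm(reformulate(others, v), data = df))$r.squared
    1 / (1 - r2)                  # VIF_j = 1 / (1 - R_j^2)
  })
}
vif_manual(df)                    # values above about 10 are a common warning sign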
Solutions:
1. Collect more data;
2. Impose constraints on the model;
3. Delete one or more of the collinear variables;
4. Transform the model appropriately;
5. Principal component regression (see the sketch below).
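A sketch of principal component regression with base R's prcomp(), on simulated, illustrative data (the pls package's pcr() offers a packaged alternative): the correlated predictors are replaced by orthogonal principal components, and y is regressed on the leading components.

set.seed(1)
x1 <- rnorm(100)
x2 <- x1 + rnorm(100, sd = 0.1)                 # collinear with x1
x3 <- rnorm(100)
y  <- x1 + x3 + rnorm(100)
pc <- prcomp(cbind(x1, x2, x3), scale. = TRUE)  # orthogonal components
summary(pc)                                     # decide how many components to keep
pcr_fit <- lm(y ~ pc$x[, 1:2])                  # regress y on the first two components
summary(pcr_fit)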
Principles for dealing with multicollinearity:
1. Multicollinearity is common; minor multicollinearity problems need not be treated;
2. Serious multicollinearity can usually be detected from experience or from analysis of the regression results, for example when coefficient signs are wrong or the t-values of important explanatory variables are very low. Take the appropriate measures for the situation at hand;
3. If the model is used only for prediction, then as long as the fit is good, the multicollinearity need not be addressed; multicollinearity in a model used for prediction often does not affect the prediction results.
The above is my personal experience; I hope it can serve as a reference, and I hope you will continue to support me. If there are any mistakes or things I have not fully considered, please point them out.