Abstract: Large numbers of datasets were simulated via sampling, and the regression modeling results were compared with the known true parameters, an analysis undertaken here for the first time at this scale.
The study demonstrates that the impact of multicollinearity on the quality of parameter estimates is far stronger than commonly assumed, even at low or moderate correlations between predictors.
The standard practice of assessing the significance of regression coefficients using t-statistics is compared with the actual precision of estimates relative to their true values, and the results are critically examined. It is shown that t-statistics for regression parameters can often be misleading.
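The kind of comparison described above can be illustrated with a minimal Monte Carlo sketch. This is not the authors' procedure: the two-predictor design, the true coefficients, the noise level, the sample size, and the 50% error threshold are all assumptions chosen for illustration, and the function name `simulate` is hypothetical. The sketch fits ordinary least squares to many simulated datasets, then reports the mean relative error of the estimates and the share of nominally significant coefficients (|t| > 1.96) that still miss the true value by more than half.

```python
import numpy as np

def simulate(rho, n=100, n_sims=1000, beta=(1.0, 1.0), sigma=3.0, seed=0):
    """Simulate OLS fits with two predictors whose correlation is rho.

    All defaults are illustrative assumptions, not values from the paper.
    Returns per-simulation relative errors and t-statistics for the slopes.
    """
    rng = np.random.default_rng(seed)
    beta = np.asarray(beta, dtype=float)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    rel_err = np.empty((n_sims, 2))
    t_stats = np.empty((n_sims, 2))
    for i in range(n_sims):
        X = rng.multivariate_normal(np.zeros(2), cov, size=n)
        y = X @ beta + rng.normal(0.0, sigma, size=n)
        A = np.column_stack([np.ones(n), X])           # add intercept column
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)   # OLS estimates
        resid = y - A @ coef
        s2 = resid @ resid / (n - A.shape[1])          # residual variance
        se = np.sqrt(s2 * np.diag(np.linalg.inv(A.T @ A)))
        rel_err[i] = np.abs(coef[1:] - beta) / np.abs(beta)
        t_stats[i] = coef[1:] / se[1:]
    return rel_err, t_stats

for rho in (0.0, 0.5, 0.9):
    rel_err, t_stats = simulate(rho)
    significant = np.abs(t_stats) > 1.96
    far_off = rel_err > 0.5                            # estimate misses truth by > 50%
    share = (significant & far_off).sum() / max(significant.sum(), 1)
    print(f"rho={rho:.1f}  mean relative error={rel_err.mean():.2f}  "
          f"significant but >50% off={share:.2f}")
```

Because the true coefficients are fixed in advance, the precision of each estimate can be measured directly instead of being inferred from its t-statistic, which is the contrast the abstract draws.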
Two novel approaches for selecting the most effective variables are proposed: one based on the so-called reference matrix and the other on efficiency indicators. A combined use of these methods, together with an analysis of each variable’s contribution to the coefficient of determination, is recommended.
The practical value of these approaches is confirmed through extensive testing on simulated homogeneous and heterogeneous datasets, as well as on a real-world example. The results contribute to a more accurate understanding of regression properties, model quality characteristics, and effective strategies for identifying the most reliable predictors. They provide practitioners with better analytical tools.
The presentation is based on the paper: Igor Mandel and Stan Lipovetsky. Rethinking Linear Regression: Simulation-Based Insights and Novel Criteria for Modeling. AppliedMath 2025, 5(4), 140. https://doi.org/10.3390/appliedmath5040140