I’m having an odd problem with polynomial regression in Excel. Like many before me, I’m trying to get the exact coefficients Excel uses when it draws a polynomial trendline on a chart. I’ve read how to do this with LINEST, and for 2nd- and 3rd-order regression my answers match the trendline equation. When I try 4th or 5th order, however, the coefficients differ wildly from the trendline equation Excel displays.
Here’s the data with the chart and my attempt at the 5th-order regression (in a linked Excel workbook).
Anyone have any ideas on what might be causing me trouble?
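For reference, the usual LINEST(y_range, x_range^{1,2,...,n}) pattern is an ordinary least-squares fit on columns x, x^2, ..., x^n plus an intercept. Below is a minimal NumPy sketch of that computation, assuming plain least squares; the name `linest_poly` is hypothetical, and it does not reproduce LINEST's collinearity check or its extra statistics. Coefficients come back highest power first, intercept last, mirroring the order LINEST reports them.

```python
import numpy as np

def linest_poly(x, y, order):
    """Least-squares polynomial fit (hypothetical helper).

    Roughly what LINEST(y, x^{1,...,order}) computes: columns
    x^order, ..., x^1 plus an intercept column. Returns coefficients
    highest power first, intercept last.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # np.vander with order + 1 columns gives x^order, ..., x^1, x^0
    X = np.vander(x, order + 1)
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coeffs

# Usage on exact cubic data: should recover [2, -1.5, 0.5, 4]
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x**3 - 1.5 * x**2 + 0.5 * x + 4.0
print(linest_poly(x, y, 3))
```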
The zero values in the output are a consequence of (multi)collinearity. From Microsoft's LINEST function help:
“…The LINEST function checks for collinearity and removes any redundant X columns from the regression model when it identifies them. Removed X columns can be recognized in LINEST output as having 0 coefficients in addition to 0 se values….”
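One way to see why the higher powers look "redundant" is the condition number of the design matrix: when x sits far from zero, x, x^2, ..., x^5 are nearly proportional to each other. The sketch below uses a hypothetical x range of 100 to 200 (not the asker's data) and compares the raw 5th-order design matrix with one built from mean-centered x. A huge condition number means nearly collinear columns, which is presumably the situation in which LINEST's check decides to drop some of them.

```python
import numpy as np

# Hypothetical x values well away from zero (not the original data)
x = np.linspace(100.0, 200.0, 50)

# 5th-order design matrices: columns x^5, ..., x^1, x^0
X_raw = np.vander(x, 6)
X_centered = np.vander(x - x.mean(), 6)

cond_raw = np.linalg.cond(X_raw)
cond_centered = np.linalg.cond(X_centered)

# Centering removes most of the near-collinearity between powers
print(f"raw: {cond_raw:.3g}  centered: {cond_centered:.3g}")
```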
For more accurate estimates, apply LINEST to mean-centered x-values and then post-multiply the resulting coefficients by a matrix of binomial coefficients, which converts them back to powers of the original x. This gives coefficients that agree with the trendline values.
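Here is a rough NumPy sketch of the mean-centering idea, assuming ordinary least squares in place of LINEST (the helper name `centered_poly_fit` is hypothetical). It fits in u = x - m, where m is the mean of x, and then expands each (x - m)^k with the binomial theorem, so the coefficient of x^j is the sum over k ≥ j of b_k · C(k, j) · (-m)^(k-j). The code uses column vectors (a = B b); with LINEST's row-vector output the same matrix appears transposed on the right, which is the "post-multiply" in the answer.

```python
import numpy as np
from math import comb

def centered_poly_fit(x, y, order):
    """Fit a polynomial in (x - mean) and convert back to powers of x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    m = x.mean()
    # Least squares in the centered variable, ascending powers u^0..u^order
    U = np.vander(x - m, order + 1, increasing=True)
    b, *_ = np.linalg.lstsq(U, y, rcond=None)
    # B[j, k] = C(k, j) * (-m)**(k - j) expands (x - m)**k into powers of x
    B = np.zeros((order + 1, order + 1))
    for k in range(order + 1):
        for j in range(k + 1):
            B[j, k] = comb(k, j) * (-m) ** (k - j)
    a = B @ b          # ascending coefficients in powers of x
    return a[::-1]     # highest power first, like LINEST and the trendline label

# Usage: exact 5th-order data should give back the true coefficients
true = [0.5, -2.0, 1.0, 3.0, -1.0, 7.0]
xs = np.linspace(1.0, 3.0, 40)
ys = np.polyval(true, xs)
print(centered_poly_fit(xs, ys, 5))
```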
Also see: https://newtonexcelbach.wordpress.com/2011/02/04/fitting-high-order-polynomials/ (post and comments)
You may be overfitting; in fact, I’d say you probably are if you’re using a 5th-order polynomial. With polynomials higher than order 3 (even higher than order 1 sometimes!) you risk having your “trend line” shoot off into outer space at the end points. Past a certain level of complexity, you are fitting the model to random variations in the data points rather than to the underlying mechanism that generated the data.
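A small illustration of the overfitting point, using synthetic data that is truly linear plus noise (the data, seed, and evaluation point 1.5 are all made up for illustration). Because the models are nested, the 5th-order fit can never have a larger training error than the straight line, so a lower residual on its own says nothing about whether the extra terms are real; evaluating just past the data range shows how differently the two fits can behave there.

```python
import numpy as np

# Synthetic, truly linear data with noise (illustrative only)
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 12)
y = 3.0 * x + 1.0 + rng.normal(scale=0.2, size=x.size)

def fit_sse(order):
    """Return (sum of squared residuals, coefficients) for a polyfit."""
    coeffs = np.polyfit(x, y, order)
    resid = y - np.polyval(coeffs, x)
    return float(resid @ resid), coeffs

sse1, c1 = fit_sse(1)
sse5, c5 = fit_sse(5)

# Training error can only drop as the order rises ...
print(sse1, sse5)
# ... but predictions just outside the data can diverge
print(np.polyval(c1, 1.5), np.polyval(c5, 1.5))
```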
One way to avoid the problem is to use a regularization technique such as ridge regression. Christopher Bishop’s book Pattern Recognition and Machine Learning has an excellent discussion of the topic and gives a good example of how to regularize a polynomial regression.
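A minimal ridge-regression sketch for a polynomial, solving the regularized normal equations (XᵀX + λP)w = Xᵀy. The helper name `ridge_poly_fit` and the choice to leave the intercept unpenalized are assumptions (a common variant, not necessarily Bishop's exact formulation), and λ would in practice be chosen by something like cross-validation. The example data are a noisy sine curve in the spirit of Bishop's example, generated here purely for illustration.

```python
import numpy as np

def ridge_poly_fit(x, y, order, lam):
    """Ridge-regularized polynomial fit; returns ascending coefficients."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.vander(x, order + 1, increasing=True)  # columns 1, x, ..., x^order
    # Penalize every coefficient except the intercept (column 0)
    P = lam * np.eye(order + 1)
    P[0, 0] = 0.0
    # Solve (X^T X + P) w = X^T y
    return np.linalg.solve(X.T @ X + P, X.T @ y)

# Illustrative data: noisy sine on [0, 1]
rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 20)
y = np.sin(2.0 * np.pi * x) + rng.normal(scale=0.1, size=x.size)

w_ols = ridge_poly_fit(x, y, 5, 0.0)     # lam = 0 is plain least squares
w_ridge = ridge_poly_fit(x, y, 5, 1e-3)  # small penalty shrinks the wild terms
print(w_ols)
print(w_ridge)
```

Larger values of `lam` pull the higher-order coefficients toward zero, trading a little training error for a smoother curve that is less likely to run off at the end points.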
BTW, I wouldn’t use Excel for statistical analysis. Here is an article on Excel’s statistical capabilities from the journal Computational Statistics and Data Analysis (I first saw a link to it in another Stack Overflow post but don’t remember which one, so apologies to the original poster for not giving proper credit): http://www.pages.drexel.edu/~bdm25/excel2007.pdf