I’m looking to write VBA code that takes two arrays of x and y points, which form a scatter plot, and analyzes the plot to determine the point at which it goes from linear to quadratic. I’ll discuss the approaches I’ve tried below, but the problem is that I can’t come up with any real way to do this other than “eyeballing it”.
My first approach was to iteratively compare the slope between the first and second points, then the first and third points, and so on, and see when this value starts to deviate. The problem is that this is a scatter plot, so the slopes between the points vary quite a bit. I then tried comparing the y-values to see when a square relation might appear between y_i and y_(i+1). That didn’t really work out, because the y-value of a single point in a scatter plot doesn’t necessarily say anything about the underlying trend. I’ve also thought about iteratively fitting curves to the data, but I can’t see how that would yield results any different from the previous two attempts. Below is my current attempt, which compares the change between consecutive slopes. Avg is the Y-value and P2 is the X-value. tp is the “turning point”, which is where the graph goes from linear to quadratic.
    'Look for turning point where trend becomes nonlinear'
    slp(1) = (Avg(2) - Avg(1)) '/ (P2(2) - P2(1))
    Dim delta As Variant
    ReDim delta(1 To UBound(slp))
    For i = 2 To UBound(LP) 'Looks for turning point by comparing current slopes'
        'slp(i) = WorksheetFunction.Slope(P2(i), Avg(i))
        slp(i) = (Avg(i) - Avg(i - 1)) ' / (P2(i) - P2(i - 1))
        delta(i) = slp(i) - slp(i - 1)
    Next i
    For i = 1 To UBound(delta)
        If delta(i) >= 4000 Then
            tpx(1) = P2(i)
            tpy(1) = Avg(i)
            'tp(1) = slp(i)
            Exit For
        End If
    Next i
    For i = 1 To UBound(LP) 'Collects points before and after turning point'
        If P2(i) < tpx(1) Then
            x1(i) = P2(i)
            y1(i) = Avg(i)
        ElseIf P2(i) >= tpx(1) Then
            x2(i) = P2(i)
            y2(i) = Avg(i)
        End If
    Next i
I’ve attached a screencap of the plot to give a better idea of the problem. The blue region is the eyeballed linear region, and the orange is the eyeballed quadratic region.
For the same data, one can run both a linear and a quadratic least-squares regression and compare them.
The respective root mean square errors (RMSE) differ. RMSE(linear) is higher than RMSE(quadratic), by more or less depending on the scatter and on how much the two shapes differ.
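To make the comparison concrete, the two fitted models and the error measure can be written out as follows. This is only a sketch of the usual definitions: the coefficient names a and b, the sample size n, and the 1/n normalisation are assumptions made here for illustration.

```latex
\begin{aligned}
\text{linear:}\quad \hat y_i &= a + b\,x_i \\
\text{quadratic:}\quad \hat y_i &= a + b\,x_i + c\,x_i^2 \\
\mathrm{RMSE} &= \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat y_i\right)^2}
\end{aligned}
```

Because the linear model is the quadratic model with c = 0, the least-squares quadratic fit cannot have a larger sum of squared residuals on the same data, so with this definition RMSE(linear) / RMSE(quadratic) is at least 1.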
In the case of a quadratic shape (figure 1), RMSE(linear) is much higher than RMSE(quadratic) because a straight line fits a quadratic curve badly. The ratio, 10, is far from 1.
In the case of an almost linear shape (figure 2), the quadratic regression yields a low value of the coefficient c (the coefficient of the x² term). Although the fitted curve is quadratic, it is almost linear, so RMSE(linear) and RMSE(quadratic) remain of the same order of magnitude. The ratio is close to 1.
The ratio RMSE(linear) / RMSE(quadratic) appears to be a good indicator of the shape:
Ratio close to 1: the shape is said to be “of linear kind”.
Ratio significantly higher than 1: the shape is said to be “of quadratic kind”.
One can object that scatter might make this indicator less significant. That is partly true: the larger the scatter, the less the coefficient c stands out as relatively small. But even with high scatter, as in figures 3 and 4, the ratio remains significant.
Again, the shape in figure 3 is said to be “of quadratic kind” and the shape in figure 4 “of linear kind”.
Nevertheless, the limiting ratio (slightly higher than 1) that separates “of linear kind” from “of quadratic kind” remains subjective (or empirical).
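As a rough illustration of how this indicator could be computed in VBA and applied to the turning-point question, here is a minimal sketch. Everything named in it is hypothetical and not part of the original posts: RmseRatio, Det3, FindTurningPoint, the threshold argument and the minPts default. It assumes the x-values are sorted in ascending order, that the linear part comes first, and that both fits are ordinary least squares solved through the normal equations. The 1/n factor cancels in the ratio, so only the sums of squared residuals are needed.

```vba
Option Explicit

' Determinant of a 3x3 matrix, passed row by row.
Private Function Det3(ByVal a11 As Double, ByVal a12 As Double, ByVal a13 As Double, _
                      ByVal a21 As Double, ByVal a22 As Double, ByVal a23 As Double, _
                      ByVal a31 As Double, ByVal a32 As Double, ByVal a33 As Double) As Double
    Det3 = a11 * (a22 * a33 - a23 * a32) _
         - a12 * (a21 * a33 - a23 * a31) _
         + a13 * (a21 * a32 - a22 * a31)
End Function

' Hypothetical helper: RMSE(linear fit) / RMSE(quadratic fit)
' for the points x(first..last), y(first..last).
Public Function RmseRatio(x() As Double, y() As Double, _
                          ByVal first As Long, ByVal last As Long) As Double
    Dim i As Long, n As Double, xm As Double, u As Double, d As Double
    Dim s1 As Double, s2 As Double, s3 As Double, s4 As Double
    Dim t0 As Double, t1 As Double, t2 As Double
    Dim a1 As Double, b1 As Double              ' linear coefficients
    Dim a As Double, b As Double, c As Double   ' quadratic coefficients
    Dim sseLin As Double, sseQuad As Double

    n = last - first + 1
    If n < 4 Then Err.Raise 5, "RmseRatio", "Need at least 4 points"

    ' Centre x on its mean so the power sums stay reasonably scaled.
    For i = first To last
        xm = xm + x(i) / n
    Next i

    ' Power sums used by the normal equations.
    For i = first To last
        u = x(i) - xm
        s1 = s1 + u: s2 = s2 + u ^ 2: s3 = s3 + u ^ 3: s4 = s4 + u ^ 4
        t0 = t0 + y(i): t1 = t1 + u * y(i): t2 = t2 + u ^ 2 * y(i)
    Next i

    ' Linear fit y = a1 + b1*u.
    d = n * s2 - s1 * s1
    If d = 0 Then Err.Raise 5, "RmseRatio", "All x values are equal"
    b1 = (n * t1 - s1 * t0) / d
    a1 = (t0 - b1 * s1) / n

    ' Quadratic fit y = a + b*u + c*u^2 (Cramer's rule).
    d = Det3(n, s1, s2, s1, s2, s3, s2, s3, s4)
    If d = 0 Then Err.Raise 5, "RmseRatio", "Too few distinct x values"
    a = Det3(t0, s1, s2, t1, s2, s3, t2, s3, s4) / d
    b = Det3(n, t0, s2, s1, t1, s3, s2, t2, s4) / d
    c = Det3(n, s1, t0, s1, s2, t1, s2, s3, t2) / d

    ' Sums of squared residuals of both fits.
    For i = first To last
        u = x(i) - xm
        sseLin = sseLin + (y(i) - a1 - b1 * u) ^ 2
        sseQuad = sseQuad + (y(i) - a - b * u - c * u ^ 2) ^ 2
    Next i
    If sseQuad = 0 Then Err.Raise 5, "RmseRatio", "Quadratic fit is exact; ratio undefined"

    RmseRatio = Sqr(sseLin / sseQuad)
End Function

' Hypothetical scan: grows the window from the first point and returns the first
' index k at which the ratio exceeds threshold, or LBound(x) - 1 if it never does.
Public Function FindTurningPoint(x() As Double, y() As Double, _
                                 ByVal threshold As Double, _
                                 Optional ByVal minPts As Long = 6) As Long
    Dim k As Long
    FindTurningPoint = LBound(x) - 1
    For k = LBound(x) + minPts - 1 To UBound(x)
        If RmseRatio(x, y, LBound(x), k) > threshold Then
            FindTurningPoint = k
            Exit Function
        End If
    Next k
End Function
```

With the arrays from the question, P2 and Avg would first have to be copied into Double arrays (say xs and ys, both hypothetical names); a call such as tp = FindTurningPoint(xs, ys, 1.5) would then give an index for which xs(tp) plays the role of tpx(1). The threshold of 1.5 is an arbitrary example of the empirical limit discussed above, and the index returned will usually lag the true transition, since several points of the quadratic region must enter the window before the ratio rises noticeably.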