# Some comments about `GTC` regression functions

## Overview

`GTC` has straight-line regression functions in both the `type_a` and `function` modules.

Functions in `type_a` implement a variety of regression algorithms that provide results in the form of uncertain numbers. When used, the input data (the sequences `x` and `y`) are treated as pure numbers (if sequences of uncertain numbers are provided, only the values are used in calculations).

Functions defined in `function`, on the other hand, expect input sequences of uncertain numbers. These functions estimate the slope and intercept of a line by applying the same type of regression, but uncertainties are propagated through the regression equations. The residuals are not used.

The distinction between functions that evaluate the uncertainty of estimates from residuals (`type_a`) and functions that evaluate uncertainty using uncertain numbers (`function`) is useful. There will be circumstances that require the use of a function in `function`, such as when systematic errors contribute to uncertainty but cannot be estimated properly using conventional regression. Without the methods available in `function`, such components of uncertainty could not be propagated. On the other hand, functions in `type_a` implement conventional regression methods.

Discretion will be needed if it is believed that variability in a sample of data is due, in part, to errors not fully accounted for in an uncertain-number description of the data. The question is then: just how much of that variability can be explained by components of uncertainty already defined as uncertain number influences? If the answer is ‘very little’ then it will be appropriate to use a function from `type_a` to estimate the additional contribution to uncertainty from the sample variability. At the same time, components of uncertainty associated with the uncertain-number data should be propagated using a function from `function` that performs the same type of regression. The two result values will be identical (the estimates of the slope and intercept will be the same) but the uncertainties will differ. `type_a.merge_components` can then be used to merge the results.

This approach could over-estimate the effect of some influences and so inflate the combined uncertainty of results. It is a matter of judgement as to whether to merge type-A and type-B results in a particular procedure.
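The arithmetic behind merging can be sketched in plain Python. This is not the `GTC` merge function: it only shows how two standard uncertainties for the same estimate combine when the type-A and type-B evaluations are assumed to share no influences. The function name and the numerical values are hypothetical.

```python
import math


def combine_independent(u_type_a, u_type_b):
    """Root-sum-square of two standard uncertainties assumed to be independent."""
    return math.hypot(u_type_a, u_type_b)


u_slope_type_a = 0.030  # hypothetical: evaluated from the residuals
u_slope_type_b = 0.040  # hypothetical: propagated from the uncertain-number data
u_slope = combine_independent(u_slope_type_a, u_slope_type_b)
print(u_slope)
```

If part of the observed scatter is already explained by the type-B components, this sum counts that part twice, which is the over-estimation described above.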

## The `type_a` module regression functions

### Ordinary least-squares

`type_a.line_fit` implements a conventional ordinary least-squares straight-line regression. The residuals are used to estimate the underlying variance of the y data. The resulting uncertain numbers for the slope and intercept have finite degrees of freedom and are generally correlated.
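The calculation described here can be sketched with the textbook formulas, in plain Python rather than `GTC`. The function name `ols_type_a`, the returned dictionary and the data are assumptions for the example; the sketch returns plain floats, not uncertain numbers.

```python
import math


def ols_type_a(x, y):
    """Ordinary least-squares line with residual-based standard uncertainties."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    s_xx = sum((xi - x_mean) ** 2 for xi in x)
    s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))

    b = s_xy / s_xx              # slope
    a = y_mean - b * x_mean      # intercept

    # The residuals estimate the variance of the y data, with n - 2 degrees of freedom
    ssr = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    dof = n - 2
    var_y = ssr / dof

    u_b = math.sqrt(var_y / s_xx)
    u_a = math.sqrt(var_y * (1.0 / n + x_mean ** 2 / s_xx))
    cov_ab = -var_y * x_mean / s_xx
    return {"a": a, "b": b, "u_a": u_a, "u_b": u_b,
            "r_ab": cov_ab / (u_a * u_b), "dof": dof}


fit = ols_type_a([1.0, 2.0, 3.0, 4.0, 5.0], [2.1, 3.9, 6.2, 7.8, 10.1])
print(fit)
```

The correlation coefficient `r_ab` depends only on the x values, and is non-zero whenever the mean of x is non-zero, which is why the slope and intercept are generally correlated.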

### Weighted least-squares

`type_a.line_fit_wls` implements a so-called weighted least-squares straight-line regression. This assumes that the sequence of uncertainties provided with the input data is known exactly (i.e., with infinite degrees of freedom). The uncertainty in the slope and intercept is calculated without considering the residuals.

This approach to linear regression is described in two well-known references [1] [2], but it may not be what many statisticians associate with the term ‘weighted least-squares’.
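A minimal sketch of this calculation, using the standard weighted-sum formulas for a straight line with known uncertainties, is given below in plain Python. It is not the `GTC` implementation; the function name `wls_known_u` and the data are assumptions.

```python
import math


def wls_known_u(x, y, u_y):
    """Weighted least-squares line; u_y is taken to be known exactly."""
    w = [1.0 / u ** 2 for u in u_y]
    s = sum(w)
    s_x = sum(wi * xi for wi, xi in zip(w, x))
    s_y = sum(wi * yi for wi, yi in zip(w, y))
    s_xx = sum(wi * xi * xi for wi, xi in zip(w, x))
    s_xy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    delta = s * s_xx - s_x ** 2

    a = (s_xx * s_y - s_x * s_xy) / delta   # intercept
    b = (s * s_xy - s_x * s_y) / delta      # slope

    # The uncertainties depend only on x and u_y; the residuals are never used
    u_a = math.sqrt(s_xx / delta)
    u_b = math.sqrt(s / delta)
    return a, b, u_a, u_b


x = [1.0, 2.0, 3.0, 4.0, 5.0]
u_y = [0.1] * 5
fit_1 = wls_known_u(x, [2.1, 3.9, 6.2, 7.8, 10.1], u_y)
fit_2 = wls_known_u(x, [2.5, 3.0, 7.0, 7.1, 11.0], u_y)  # much more scatter
print(fit_1[2:], fit_2[2:])
```

The two data sets have very different scatter about the fitted line, yet the reported uncertainties are the same, because only the stated `u_y` values enter the uncertainty calculation.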

### Relative weighted least-squares

`type_a.line_fit_rwls` implements a form of weighted least-squares straight-line regression that we refer to here as ‘relative weighted least-squares’. (Statisticians may regard this as conventional weighted least-squares.)

`type_a.line_fit_rwls` accepts a sequence of scale factors associated with the observations y, which are used as weighting factors. For an observation \(y\), it is assumed that the uncertainty \(u(y) = \sigma s_y\), where \(\sigma\) is an unknown factor common to all the y data and \(s_y\) is the weight factor provided.

The procedure estimates \(\sigma\) from the residuals, so the uncertain numbers returned for the slope and intercept have finite degrees of freedom.

Note that, because the scale factors describe the relative weighting of different observations, the ordinary least-squares function `type_a.line_fit` and `type_a.line_fit_rwls` return equivalent results if all y observations are given the same weighting.
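The role of the common factor \(\sigma\) can be sketched in plain Python (again, not the `GTC` implementation; the function name `rwls` and the data are assumptions). The weights fix the relative importance of the observations, and the weighted residuals then set the overall scale.

```python
import math


def rwls(x, y, s_y):
    """Relative weighted least-squares line: u(y_i) = sigma * s_y[i], sigma unknown."""
    w = [1.0 / s ** 2 for s in s_y]
    s = sum(w)
    s_x = sum(wi * xi for wi, xi in zip(w, x))
    s_yy = sum(wi * yi for wi, yi in zip(w, y))
    s_xx = sum(wi * xi * xi for wi, xi in zip(w, x))
    s_xy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    delta = s * s_xx - s_x ** 2

    a = (s_xx * s_yy - s_x * s_xy) / delta   # intercept
    b = (s * s_xy - s_x * s_yy) / delta      # slope

    # Estimate sigma**2 from the weighted residuals (n - 2 degrees of freedom)
    dof = len(x) - 2
    sigma2 = sum(wi * (yi - a - b * xi) ** 2
                 for wi, xi, yi in zip(w, x, y)) / dof

    u_a = math.sqrt(sigma2 * s_xx / delta)
    u_b = math.sqrt(sigma2 * s / delta)
    return a, b, u_a, u_b, dof


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
fit_unit = rwls(x, y, [1.0] * 5)
fit_seven = rwls(x, y, [7.0] * 5)   # same relative weighting, different scale
print(fit_unit)
print(fit_seven)
```

Multiplying every scale factor by the same constant leaves the results unchanged, because the constant is absorbed into the estimate of \(\sigma\). With equal scale factors the formulas reduce to those of ordinary least-squares.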

### Weighted total least-squares

`type_a.line_fit_wtls` implements a form of least-squares straight-line regression that takes account of errors in both the x and y data [3].

As in the case of `type_a.line_fit_wls`, the uncertainties provided for the x and y data are assumed exact. When calculating the uncertainty in the slope and intercept, the residuals are ignored and the uncertain numbers returned have infinite degrees of freedom.
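To show how uncertainty in x enters such a fit, the sketch below evaluates the sum of squares that is commonly minimised when the x and y errors of each point are independent. This is an assumption made for illustration: it is only an objective function, not the minimisation algorithm of [3] and not `GTC` code. The function names, data and trial line are hypothetical.

```python
def wtls_chi2(a, b, x, y, u_x, u_y):
    """Sum of squared deviations from y = a + b*x, each weighted by an
    effective variance u_y**2 + (b*u_x)**2 (independent x and y errors assumed)."""
    return sum(
        (yi - a - b * xi) ** 2 / (uyi ** 2 + (b * uxi) ** 2)
        for xi, yi, uxi, uyi in zip(x, y, u_x, u_y)
    )


def wls_chi2(a, b, x, y, u_y):
    """Weighted least-squares sum of squares, with errors in y only."""
    return sum(
        (yi - a - b * xi) ** 2 / uyi ** 2
        for xi, yi, uyi in zip(x, y, u_y)
    )


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
u_y = [0.1] * 5
a_trial, b_trial = 0.0, 2.0   # hypothetical trial line

chi2_no_x_error = wtls_chi2(a_trial, b_trial, x, y, [0.0] * 5, u_y)
chi2_with_x_error = wtls_chi2(a_trial, b_trial, x, y, [0.05] * 5, u_y)
print(chi2_no_x_error, chi2_with_x_error)
```

When every x uncertainty is zero the expression reduces to the weighted least-squares sum, so in this sketch weighted least-squares appears as a special case.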

## The `function` module regression functions

### Ordinary least-squares

`function.line_fit` implements the conventional ordinary least-squares straight-line regression to obtain estimates of the slope and intercept of a line through the data. The y data is a sequence of uncertain numbers. The uncertainty of the slope and intercept is found by propagating uncertainty from the input data. The residuals are ignored.
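Propagation through the regression equations can be sketched in plain Python by writing the slope and intercept as linear combinations of the y values. This is not `GTC` code: it assumes the y uncertainties are mutually independent, and the function name `ols_propagated` and the data are hypothetical.

```python
import math


def ols_propagated(x, y, u_y):
    """Ordinary least-squares estimates, with uncertainty propagated from u_y."""
    n = len(x)
    x_mean = sum(x) / n
    s_xx = sum((xi - x_mean) ** 2 for xi in x)

    # slope = sum(c_i * y_i) and intercept = sum(d_i * y_i)
    c = [(xi - x_mean) / s_xx for xi in x]
    d = [1.0 / n - x_mean * ci for ci in c]

    b = sum(ci * yi for ci, yi in zip(c, y))
    a = sum(di * yi for di, yi in zip(d, y))

    # Law of propagation of uncertainty for independent inputs
    u_b = math.sqrt(sum((ci * ui) ** 2 for ci, ui in zip(c, u_y)))
    u_a = math.sqrt(sum((di * ui) ** 2 for di, ui in zip(d, u_y)))
    return a, b, u_a, u_b


x = [1.0, 2.0, 3.0, 4.0, 5.0]
u_y = [0.1] * 5
fit_1 = ols_propagated(x, [2.1, 3.9, 6.2, 7.8, 10.1], u_y)
fit_2 = ols_propagated(x, [2.5, 3.0, 7.0, 7.1, 11.0], u_y)  # much more scatter
print(fit_1)
print(fit_2)
```

The coefficients `c` and `d` depend only on x, so the propagated uncertainties are the same for both data sets even though the scatter differs: the residuals play no part.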

### Weighted least-squares

`function.line_fit_wls` implements a weighted least-squares straight-line regression to estimate the slope and intercept of a line through the data. The y data is a sequence of uncertain numbers. An explicit sequence of uncertainties for the data points may also be provided. If so, these uncertainties are used as weights in the algorithm when estimating the slope and intercept. Otherwise, the uncertainty of each uncertain number for y is used. In either case, uncertainty in the estimates of slope and intercept is obtained by propagating the uncertainty associated with the input data through the estimate equations (the residuals are ignored).

Note

`type_a.line_fit_wls` and `function.line_fit_wls` will yield the same results when a sequence of elementary uncertain numbers is defined for y and used with `function.line_fit_wls` and the values and uncertainties of that sequence are used with `type_a.line_fit_wls`.
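The equivalence stated in this note can be checked numerically with a plain-Python sketch (not `GTC` code). It assumes independent y uncertainties and weights equal to \(1/u(y)^2\). Propagating the uncertainties through the weighted estimator then reproduces the closed-form weighted least-squares uncertainties. The helper name `wls_coefficients` and the data are hypothetical.

```python
import math


def wls_coefficients(x, w):
    """Return (d, c, s, s_xx, delta), where intercept = sum(d_i * y_i)
    and slope = sum(c_i * y_i) for weighted least-squares with weights w."""
    s = sum(w)
    s_x = sum(wi * xi for wi, xi in zip(w, x))
    s_xx = sum(wi * xi * xi for wi, xi in zip(w, x))
    delta = s * s_xx - s_x ** 2
    c = [wi * (s * xi - s_x) / delta for wi, xi in zip(w, x)]
    d = [wi * (s_xx - s_x * xi) / delta for wi, xi in zip(w, x)]
    return d, c, s, s_xx, delta


x = [1.0, 2.0, 3.0, 4.0, 5.0]
u_y = [0.10, 0.20, 0.10, 0.30, 0.15]   # hypothetical, unequal uncertainties
w = [1.0 / u ** 2 for u in u_y]
d, c, s, s_xx, delta = wls_coefficients(x, w)

# Uncertainty propagated through the estimate equations
u_a_propagated = math.sqrt(sum((di * ui) ** 2 for di, ui in zip(d, u_y)))
u_b_propagated = math.sqrt(sum((ci * ui) ** 2 for ci, ui in zip(c, u_y)))

# Closed-form uncertainties of weighted least-squares with known u_y
u_a_closed = math.sqrt(s_xx / delta)
u_b_closed = math.sqrt(s / delta)

print(u_a_propagated, u_a_closed)
print(u_b_propagated, u_b_closed)
```

If the weights were taken from a separate sequence that differs from the uncertainties of the y data, the two routes would no longer agree, because the weights would set the estimates while the y uncertainties would still set the propagated uncertainty.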

Note

There is no need for a ‘relative weighted least-squares’ function in the `function` module. Using a sequence of `u_y` values with `function.line_fit_wls` will perform the required calculation.

### Weighted total least-squares

`function.line_fit_wtls` implements a form of least-squares straight-line regression that takes account of errors in both the x and y data [3].

As with `function.line_fit_wls`, sequences of uncertainties for the x and y data may be supplied in addition to sequences of the x and y data. When the optional uncertainty sequences are provided, estimates of the slope and intercept use those uncertainties as weights in the regression process. Otherwise, the input data uncertainties are used as weights in the regression process. In either case, uncertainty in the estimates of slope and intercept is calculated by propagating uncertainty from the input data through the regression equations (residuals are ignored).

Footnotes

[1] Philip Bevington and D. Keith Robinson, Data Reduction and Error Analysis for the Physical Sciences

[2] William H. Press, Saul A. Teukolsky, William T. Vetterling, Brian P. Flannery, Numerical Recipes: The Art of Scientific Computing

[3] M Krystek and M Anton, Meas. Sci. Technol. 22 (2011) 035101 (9pp)