Some comments about GTC regression functions
Functions in type_a implement a variety of regression algorithms that provide results in the form of uncertain numbers. The input data (the sequences x and y) are treated as pure numbers: if sequences of uncertain numbers are provided, only their values are used in the calculations.
Functions defined in function, on the other hand, expect input sequences of uncertain numbers. These functions estimate the slope and intercept of a line by applying the same types of regression, but uncertainty is propagated through the regression equations and the residuals are not used.
The distinction between functions that evaluate the uncertainty of estimates from the residuals (type_a) and functions that evaluate uncertainty by propagating the uncertainty of uncertain-number data (function) is useful. Some circumstances require a function from function, for instance when systematic errors contribute to the uncertainty but cannot be estimated properly by conventional regression. Without the methods available in function, such components of uncertainty could not be propagated. Functions in type_a, on the other hand, implement conventional regression methods.
Discretion is needed when it is believed that the variability in a sample of data is due, in part, to errors not fully accounted for in an uncertain-number description of the data. The question is then: how much of that variability can be explained by components of uncertainty already defined as uncertain-number influences? If the answer is ‘very little’, it is appropriate to use a function from type_a to estimate the additional contribution to uncertainty from the sample variability. At the same time, the components of uncertainty associated with the uncertain-number data should be propagated using a function from function that performs the same type of regression. The two results will have identical values (the estimates of the slope and intercept will be the same), but their uncertainties will differ. The function type_a.merge_components can then be used to merge the two results.
This approach could, however, over-estimate the effect of some influences and so inflate the combined uncertainty of the results. Whether to merge type-A and type-B results in a particular procedure is a matter of judgement.
type_a.line_fit implements a conventional ordinary least-squares straight-line regression. The residuals are used to estimate the underlying variance of the y data. The resulting uncertain numbers for the slope and intercept have finite degrees of freedom and are generally correlated.
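The type-A evaluation described above can be sketched in plain Python. This is a minimal illustration, not GTC's code: the helper name ols_line_fit is hypothetical, and it returns floats (estimates, standard uncertainties, correlation coefficient and degrees of freedom) rather than uncertain numbers. The residual variance s² = Σr²/(n - 2) stands in for the unknown variance of the y data.

```python
import math

def ols_line_fit(x, y):
    """Ordinary least-squares fit of y = a + b*x (hypothetical helper).

    Returns (a, b, u_a, u_b, r_ab, dof); the standard uncertainties are
    estimated from the residuals, as in a type-A evaluation.
    """
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 points to estimate the residual variance")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar

    # Residual variance, with n - 2 degrees of freedom (two parameters fitted)
    dof = n - 2
    s2 = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y)) / dof

    u_b = math.sqrt(s2 / sxx)
    u_a = math.sqrt(s2 * (1.0 / n + x_bar ** 2 / sxx))
    # cov(a, b) = -x_bar * s2 / sxx, so a and b are correlated unless x_bar == 0
    r_ab = -x_bar / math.sqrt(sxx / n + x_bar ** 2)
    return a, b, u_a, u_b, r_ab, dof
```

Because s² comes from the scatter about the fitted line, data lying exactly on a line give zero uncertainty here, which is the behaviour the function module avoids by propagating input uncertainties instead.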
type_a.line_fit_wls implements a so-called weighted least-squares straight-line regression. This assumes that the sequence of uncertainties provided with the input data is known exactly (i.e., with infinite degrees of freedom). The uncertainties in the slope and intercept are calculated without considering the residuals.
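A minimal sketch of this weighted fit, assuming weights w = 1/u(y)² and returning plain floats; the name wls_line_fit is hypothetical and this is not GTC's implementation. The uncertainty formulas use only the weights, so two data sets with the same x and u(y) but different scatter give the same uncertainties.

```python
import math

def wls_line_fit(x, y, u_y):
    """Weighted least-squares fit of y = a + b*x with weights 1/u_y**2.

    The u_y values are treated as exactly known, so the returned standard
    uncertainties do not depend on the residuals (infinite degrees of freedom).
    Returns (a, b, u_a, u_b, dof).
    """
    w = [1.0 / ui ** 2 for ui in u_y]
    sw = sum(w)
    x_w = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_w = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - x_w) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_w) * (yi - y_w) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    a = y_w - b * x_w
    # Uncertainties depend only on the weights, not on the scatter of y
    u_b = math.sqrt(1.0 / sxx)
    u_a = math.sqrt(1.0 / sw + x_w ** 2 / sxx)
    return a, b, u_a, u_b, math.inf
```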
type_a.line_fit_rwls implements a form of weighted least-squares straight-line regression that we refer to here as ‘relative weighted least-squares’. (Statisticians may regard this as conventional weighted least-squares.)
type_a.line_fit_rwls accepts a sequence of scale factors associated with the observations y, which are used as weighting factors. For an observation \(y\), it is assumed that the uncertainty \(u(y) = \sigma s_y\), where \(\sigma\) is an unknown factor common to all the y data and \(s_y\) is the weight factor provided.
The procedure estimates \(\sigma\) from the residuals, so the uncertain numbers returned for the slope and intercept have finite degrees of freedom.
Note that, because the scale factors describe only the relative weighting of different observations, the ordinary least-squares function type_a.line_fit would return equivalent results if all y observations were given the same weighting.
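The relative weighting scheme can be sketched as follows, assuming u(y) = σ s_y with weights w = 1/s_y² and σ² estimated from the weighted residuals as Σw r²/(n - 2). The name rwls_line_fit is hypothetical, the results are plain floats rather than uncertain numbers, and this is not GTC's code.

```python
import math

def rwls_line_fit(x, y, s_y):
    """'Relative' weighted least-squares fit of y = a + b*x.

    Assumes u(y_i) = sigma * s_y[i], where sigma is an unknown factor
    common to all observations and is estimated from the weighted residuals.
    Returns (a, b, u_a, u_b, dof).
    """
    n = len(x)
    w = [1.0 / si ** 2 for si in s_y]
    sw = sum(w)
    x_w = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_w = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - x_w) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_w) * (yi - y_w) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    a = y_w - b * x_w

    # Estimate sigma**2 from the weighted residuals (finite degrees of freedom)
    dof = n - 2
    sigma2 = sum(wi * (yi - a - b * xi) ** 2 for wi, xi, yi in zip(w, x, y)) / dof
    u_b = math.sqrt(sigma2 / sxx)
    u_a = math.sqrt(sigma2 * (1.0 / sw + x_w ** 2 / sxx))
    return a, b, u_a, u_b, dof
```

Since any common factor in s_y cancels between σ² and the weights, scale factors that are all equal (whatever their value) reproduce the ordinary least-squares results.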
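type_a.line_fit_wtls implements a weighted total least-squares straight-line regression, which takes account of uncertainty in both the x and y data (Krystek and Anton 2011). As in the case of type_a.line_fit_wls, the uncertainties provided (here for both the x and y data) are assumed exact. When calculating the uncertainty in the slope and intercept, the residuals are ignored and the uncertain numbers returned have infinite degrees of freedom.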
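Total least-squares fitting needs an iterative solution. The sketch below does not reproduce the Krystek and Anton algorithm: it substitutes York's iterative method (York et al., Am. J. Phys. 72 (2004) 367) for the simpler case of uncorrelated x and y errors, where it should converge to the same weighted total least-squares estimates. The name wtls_line_fit, the starting slope (ordinary least squares) and the convergence tolerance are all illustrative assumptions, and the results are plain floats.

```python
import math

def wtls_line_fit(x, y, u_x, u_y, tol=1e-15, max_iter=200):
    """Weighted total least-squares fit of y = a + b*x (York's iteration).

    u_x and u_y are treated as exact, uncorrelated standard uncertainties.
    The residuals are not used, so the uncertainties returned correspond to
    infinite degrees of freedom. Returns (a, b, u_a, u_b).
    """
    def york_terms(b):
        # Effective weights depend on the current slope estimate
        W = [1.0 / (uy ** 2 + b ** 2 * ux ** 2) for ux, uy in zip(u_x, u_y)]
        sw = sum(W)
        X = sum(Wi * xi for Wi, xi in zip(W, x)) / sw
        Y = sum(Wi * yi for Wi, yi in zip(W, y)) / sw
        U = [xi - X for xi in x]
        V = [yi - Y for yi in y]
        beta = [Wi * (Ui * uy ** 2 + b * Vi * ux ** 2)
                for Wi, Ui, Vi, ux, uy in zip(W, U, V, u_x, u_y)]
        return W, sw, X, Y, U, V, beta

    # Start from the ordinary least-squares slope
    n = len(x)
    x_m, y_m = sum(x) / n, sum(y) / n
    b = (sum((xi - x_m) * (yi - y_m) for xi, yi in zip(x, y))
         / sum((xi - x_m) ** 2 for xi in x))

    for _ in range(max_iter):
        W, sw, X, Y, U, V, beta = york_terms(b)
        b_new = (sum(Wi * be * Vi for Wi, be, Vi in zip(W, beta, V))
                 / sum(Wi * be * Ui for Wi, be, Ui in zip(W, beta, U)))
        converged = abs(b_new - b) <= tol * max(1.0, abs(b_new))
        b = b_new
        if converged:
            break

    # Final quantities evaluated at the converged slope
    W, sw, X, Y, U, V, beta = york_terms(b)
    a = Y - b * X
    # Standard uncertainties from the adjusted x values (York et al. 2004)
    x_adj = [X + be for be in beta]
    x_adj_m = sum(Wi * xa for Wi, xa in zip(W, x_adj)) / sw
    u_b = math.sqrt(1.0 / sum(Wi * (xa - x_adj_m) ** 2 for Wi, xa in zip(W, x_adj)))
    u_a = math.sqrt(1.0 / sw + x_adj_m ** 2 * u_b ** 2)
    return a, b, u_a, u_b
```

When every u_x is zero the effective weights reduce to 1/u_y² and the fit coincides with the weighted least-squares case, which is a convenient check on any implementation.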
function.line_fit implements the conventional ordinary least-squares straight-line regression to obtain estimates of the slope and intercept of a line through the data. The y data is a sequence of uncertain numbers. The uncertainty of the slope and intercept is found by propagating uncertainty from the input data. The residuals are ignored.
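To show what "propagating uncertainty from the input data" means for ordinary least squares, here is a minimal sketch with plain floats. It assumes the y values are independent with known standard uncertainties u_y (GTC's uncertain numbers can also carry correlations, which this ignores), and the name ols_propagate is hypothetical. Because both estimates are linear combinations of the y data, first-order propagation is exact.

```python
import math

def ols_propagate(x, y, u_y):
    """OLS estimates of a and b, with uncertainty propagated from u(y_i).

    Assumes independent y_i with standard uncertainties u_y; the residuals
    play no part in the uncertainty calculation.
    Returns (a, b, u_a, u_b, cov_ab).
    """
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    # Sensitivity coefficients: b = sum(c_i * y_i), a = sum(d_i * y_i)
    c = [(xi - x_bar) / sxx for xi in x]
    d = [1.0 / n - x_bar * ci for ci in c]
    b = sum(ci * yi for ci, yi in zip(c, y))
    a = sum(di * yi for di, yi in zip(d, y))
    # Propagate the input uncertainties through the linear estimate equations
    u_b = math.sqrt(sum((ci * ui) ** 2 for ci, ui in zip(c, u_y)))
    u_a = math.sqrt(sum((di * ui) ** 2 for di, ui in zip(d, u_y)))
    cov_ab = sum(ci * di * ui ** 2 for ci, di, ui in zip(c, d, u_y))
    return a, b, u_a, u_b, cov_ab
```

For points lying exactly on a line the residuals are zero, yet u_b and u_a remain non-zero because they come from u_y alone.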
function.line_fit_wls implements a weighted least-squares straight-line regression to estimate the slope and intercept of a line through the data. The y data is a sequence of uncertain numbers. An explicit sequence of uncertainties for the data points may also be provided. If so, these uncertainties are used as weights in the algorithm when estimating the slope and intercept. Otherwise, the uncertainty of each uncertain number for y is used. In either case, uncertainty in the estimates of slope and intercept is obtained by propagating the uncertainty associated with the input data through the estimate equations (the residuals are ignored).
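The weighted case follows the same pattern. In this sketch (plain floats, independent y values assumed, hypothetical names wls_propagate and u_w), the optional u_w plays the role of an explicit sequence of weighting uncertainties; when it is omitted the uncertainties of the y data are used as weights. With those default weights, the propagated u(b) reduces algebraically to 1/√(Σw(x - x̄_w)²), the same value a type-A weighted fit with exact uncertainties would report.

```python
import math

def wls_propagate(x, y, u_y, u_w=None):
    """WLS estimates of a and b, with uncertainty propagated from u(y_i).

    u_w optionally supplies the uncertainties used as weights; otherwise the
    uncertainties of the y data are used. The y_i are assumed independent
    and the residuals are ignored. Returns (a, b, u_a, u_b, cov_ab).
    """
    if u_w is None:
        u_w = u_y
    w = [1.0 / ui ** 2 for ui in u_w]
    sw = sum(w)
    x_w = sum(wi * xi for wi, xi in zip(w, x)) / sw
    sxx = sum(wi * (xi - x_w) ** 2 for wi, xi in zip(w, x))
    # Sensitivity coefficients: b = sum(c_i * y_i), a = sum(d_i * y_i)
    c = [wi * (xi - x_w) / sxx for wi, xi in zip(w, x)]
    d = [wi / sw - x_w * ci for wi, ci in zip(w, c)]
    b = sum(ci * yi for ci, yi in zip(c, y))
    a = sum(di * yi for di, yi in zip(d, y))
    # Propagate u_y (not u_w) through the estimate equations
    u_b = math.sqrt(sum((ci * ui) ** 2 for ci, ui in zip(c, u_y)))
    u_a = math.sqrt(sum((di * ui) ** 2 for di, ui in zip(d, u_y)))
    cov_ab = sum(ci * di * ui ** 2 for ci, di, ui in zip(c, d, u_y))
    return a, b, u_a, u_b, cov_ab
```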
type_a.line_fit_wls and function.line_fit_wls will yield the same results when a sequence of elementary uncertain numbers is defined for y and used with function.line_fit_wls, and the values and uncertainties of that sequence are used with type_a.line_fit_wls.
function.line_fit_wtls implements a weighted total least-squares straight-line regression (Krystek and Anton 2011). Sequences of uncertainties for the x and y data may be supplied in addition to the sequences of x and y data. When these optional uncertainty sequences are provided, the estimates of the slope and intercept use them as weights in the regression. Otherwise, the uncertainties of the input data are used as weights. In either case, the uncertainty in the estimates of slope and intercept is calculated by propagating uncertainty from the input data through the regression equations (the residuals are ignored).
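For total least squares the estimates are no longer linear in the data, so one way to sketch the propagation is with numerical sensitivity coefficients. The code below is an illustration under several assumptions, not GTC's method: it uses York's iteration (York et al. 2004) in place of the Krystek and Anton algorithm, treats all x and y values as independent, and obtains the partial derivatives of a and b by central differences with a hypothetical relative step h. The optional w_x and w_y (hypothetical names) are the weighting uncertainties; by default u_x and u_y serve as weights.

```python
import math

def _york_slope_intercept(x, y, w_x, w_y, tol=1e-15, max_iter=200):
    """WTLS estimates (a, b) by York's iteration, using w_x and w_y as weights."""
    def centroid(b):
        W = [1.0 / (wy ** 2 + b ** 2 * wx ** 2) for wx, wy in zip(w_x, w_y)]
        sw = sum(W)
        X = sum(Wi * xi for Wi, xi in zip(W, x)) / sw
        Y = sum(Wi * yi for Wi, yi in zip(W, y)) / sw
        return W, X, Y

    n = len(x)
    x_m, y_m = sum(x) / n, sum(y) / n
    b = (sum((xi - x_m) * (yi - y_m) for xi, yi in zip(x, y))
         / sum((xi - x_m) ** 2 for xi in x))
    for _ in range(max_iter):
        W, X, Y = centroid(b)
        beta = [Wi * ((xi - X) * wy ** 2 + b * (yi - Y) * wx ** 2)
                for Wi, xi, yi, wx, wy in zip(W, x, y, w_x, w_y)]
        b_new = (sum(Wi * be * (yi - Y) for Wi, be, yi in zip(W, beta, y))
                 / sum(Wi * be * (xi - X) for Wi, be, xi in zip(W, beta, x)))
        converged = abs(b_new - b) <= tol * max(1.0, abs(b_new))
        b = b_new
        if converged:
            break
    _, X, Y = centroid(b)
    return Y - b * X, b


def wtls_propagate(x, y, u_x, u_y, w_x=None, w_y=None, h=1e-6):
    """WTLS estimates of a and b, with uncertainty propagated from x and y.

    Returns (a, b, u_a, u_b, cov_ab); the residuals are ignored.
    """
    w_x = u_x if w_x is None else w_x
    w_y = u_y if w_y is None else w_y
    a, b = _york_slope_intercept(x, y, w_x, w_y)

    var_a = var_b = cov_ab = 0.0
    for is_x, data, u_data in ((True, x, u_x), (False, y, u_y)):
        for i, u_i in enumerate(u_data):
            if u_i == 0:
                continue  # an exact input value contributes nothing
            step = h * max(1.0, abs(data[i]))
            fits = []
            for sign in (1, -1):
                moved = list(data)
                moved[i] += sign * step
                xs, ys = (moved, y) if is_x else (x, moved)
                fits.append(_york_slope_intercept(xs, ys, w_x, w_y))
            # Central-difference sensitivity coefficients da/dz_i and db/dz_i
            da = (fits[0][0] - fits[1][0]) / (2 * step)
            db = (fits[0][1] - fits[1][1]) / (2 * step)
            var_a += (da * u_i) ** 2
            var_b += (db * u_i) ** 2
            cov_ab += da * db * u_i ** 2
    return a, b, math.sqrt(var_a), math.sqrt(var_b), cov_ab
```

Keeping the weights fixed while the data are perturbed mirrors the description above: the weights select the estimator, while the input uncertainties alone determine how much the estimates can vary.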
Philip Bevington and D. Keith Robinson, Data Reduction and Error Analysis for the Physical Sciences.
William H. Press, Saul A. Teukolsky, William T. Vetterling and Brian P. Flannery, Numerical Recipes: The Art of Scientific Computing.
M. Krystek and M. Anton, Meas. Sci. Technol. 22 (2011) 035101 (9pp).