# Evaluating type-A uncertainty

A type-A evaluation of uncertainty involves statistical analysis of data. In contrast, type-B uncertainty is obtained without statistical analysis.

The prefix type_a (or the alias ta) is needed to resolve the names of objects defined in this module.

## Correcting indications

• BiasedIndication is a class of objects that can be used to correct future indications for bias, using a type-A estimate of the required correction term.

## Merging uncertain components

Note

Many functions in type_a treat the data as pure numbers. Sequences of uncertain numbers can be passed to these functions, but only the values of the uncertain numbers will be used. This allows type-B uncertainty components to be associated with observational data (e.g., the type-B uncertainty due to a systematic error) before a type-A analysis is performed, which is often convenient.

merge_components is provided so that the results of type-A and type-B analyses on the same data sequence can be combined. Note, however, that doing so may over-emphasize uncertainty components that contribute to variability in the observations.

## Module contents

estimate(seq, label=None)

Obtain an uncertain number by type-A evaluation

Parameters:
• seq – a sequence representing a sample of data
• label – a label for the returned uncertain number

Returns: an uncertain real number, or an uncertain complex number

The elements of seq may be real numbers, complex numbers, or uncertain real or complex numbers. Note that if uncertain numbers are used, only the value attribute is used.

The sample mean is an estimate of the quantity of interest. The uncertainty in this estimate is the standard deviation of the sample mean (or the sample covariance of the mean, for the complex case).

Returns an uncertain real number when the mean of seq is real, or an uncertain complex number when the mean is complex.
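The underlying calculation is simple: the estimate is the sample mean, its standard uncertainty is the sample standard deviation divided by the square root of the number of observations, and the degrees of freedom is one less than the number of observations. A minimal stdlib sketch (illustrative only, not the library implementation):

```python
import math
import statistics

def type_a_estimate(seq):
    """Return (sample mean, standard uncertainty of the mean, degrees of freedom)."""
    n = len(seq)
    mean = statistics.fmean(seq)            # sample mean
    u = statistics.stdev(seq) / math.sqrt(n)  # s / sqrt(N)
    return mean, u, n - 1

print(type_a_estimate(range(15)))  # (7.0, 1.1547005..., 14)
```

These three numbers are exactly the value, uncertainty, and degrees of freedom shown in the ureal output below.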

Examples:

>>> data = range(15)
>>> type_a.estimate(data)
ureal(7,1.15470053837925,14)

>>> data = [(0.91518731126816899+1.5213442955575518j),
... (0.96572684493613492-0.18547192979059401j),
... (0.23216598132006649+1.6951311687588568j),
... (2.1642786101267397+2.2024333895672563j),
... (1.1812532664590505+0.59062101107787357j),
... (1.2259264339405165+1.1499373179910186j),
... (-0.99422341300318684+1.7359338393131392j),
... (1.2122867690240853+0.32535154897909946j),
... (2.0122536479379196-0.23283009302603963j),
... (1.6770229536619197+0.77195994890476838j)]

>>> type_a.estimate(data)
ucomplex(
(1.059187840567141+0.9574410497332931j),
u=[0.2888166531024181,0.2655555630050262],
r=-0.314,
df=9
)

multi_estimate_real(seq_of_seq, labels=None)

Return a sequence of related uncertain real numbers

Parameters:
• seq_of_seq – a sequence of real-valued sequences
• labels – a sequence of labels

Returns: a sequence of uncertain real numbers

The sequences in seq_of_seq must all be the same length.

Defines uncertain numbers using the sample statistics from the data sequences, including the sample covariance.

The uncertain numbers returned are considered related, so that a degrees-of-freedom calculation can be performed even if there is correlation between them.
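The correlation assigned between the returned estimates comes from the sample covariance of the input sequences. As an illustrative stdlib check (not the library code), the sample correlation coefficient between the V and I sequences used in the example below can be computed directly:

```python
import math

def sample_correlation(xs, ys):
    """Sample correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / (n - 1))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / (n - 1))
    return cov / (sx * sy)

V = [5.007, 4.994, 5.005, 4.990, 4.999]
I = [19.663E-3, 19.639E-3, 19.640E-3, 19.685E-3, 19.678E-3]
print(sample_correlation(V, I))  # ≈ -0.3553
```

This is the value that get_correlation would report for the corresponding pair of estimates.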

Example:

# From Appendix H2 in the GUM

>>> V = [5.007,4.994,5.005,4.990,4.999]
>>> I = [19.663E-3,19.639E-3,19.640E-3,19.685E-3,19.678E-3]
>>> phi = [1.0456,1.0438,1.0468,1.0428,1.0433]
>>> v,i,p = type_a.multi_estimate_real((V,I,phi),labels=('V','I','phi'))
>>> v
ureal(4.99899999999999967,0.00320936130717617944,4,label='V')
>>> i
ureal(0.019661000000000001392,9.47100839404133456689e-06,4,label='I')
>>> p
ureal(1.044459999999999944,0.0007520638270785368149,4,label='phi')

>>> r = v/i*cos(p)
>>> r
ureal(127.73216992810208,0.071071407396995398,4)

multi_estimate_complex(seq_of_seq, labels=None)

Return a sequence of related uncertain complex numbers

Parameters:
• seq_of_seq – a sequence of complex-number sequences
• labels – a sequence of labels for the uncertain numbers

Returns: a sequence of uncertain complex numbers

The sequences in seq_of_seq must all be the same length.

Defines uncertain numbers using the sample statistics, including the sample covariance.

The uncertain complex numbers returned are considered related, so they may be used in a degrees-of-freedom calculation even if there is correlation between them.

Example:

# From Appendix H2 in the GUM

>>> I = [ complex(x) for x in (19.663E-3,19.639E-3,19.640E-3,19.685E-3,19.678E-3) ]
>>> V = [ complex(x) for x in (5.007,4.994,5.005,4.990,4.999)]
>>> P = [ complex(0,p) for p in (1.0456,1.0438,1.0468,1.0428,1.0433) ]

>>> v,i,p = type_a.multi_estimate_complex( (V,I,P) )

>>> get_correlation(v.real,i.real)
-0.355311219817512

>>> z = v/i*exp(p)
>>> z.real
ureal(127.73216992810208,0.071071407396995398,4)
>>> get_correlation(z.real,z.imag)
-0.5884297844235157

estimate_digitized(seq, delta, label=None, truncate=False)

Return an uncertain number for the mean of a sample of digitized data

Parameters:
• seq – a sequence of real numbers or uncertain real numbers
• delta – a real number for the digitization step size
• label – a label for the returned uncertain number
• truncate – if True, truncation is assumed

When an instrument rounds, or truncates, readings to a finite resolution delta, the uncertainty in an estimate of the mean of a sequence of readings depends on the amount of scatter in the data and on the number of points in the sample.

The argument truncate should be set True if an instrument truncates readings instead of rounding them.

See reference: R Willink, Metrologia, 44 (2007) 73-81

Examples:

# LSD = 0.0001, data varies between -0.0055 and -0.0057
>>> seq = (-0.0056,-0.0055,-0.0056,-0.0056,-0.0056,
...      -0.0057,-0.0057,-0.0056,-0.0056,-0.0057,-0.0057)
>>> type_a.estimate_digitized(seq,0.0001)
ureal(-0.0056272727272727272874,1.9497827808661157478e-05,10)

# LSD = 0.0001, data varies between -0.0056 and -0.0057
>>> seq = (-0.0056,-0.0056,-0.0056,-0.0056,-0.0056,
... -0.0057,-0.0057,-0.0056,-0.0056,-0.0057,-0.0057)
>>> type_a.estimate_digitized(seq,0.0001)
ureal(-0.0056363636363636364021,1.5212000482437778871e-05,10)

# LSD = 0.0001, no spread in data values
>>> seq = (-0.0056,-0.0056,-0.0056,-0.0056,-0.0056,
... -0.0056,-0.0056,-0.0056,-0.0056,-0.0056,-0.0056)
>>> type_a.estimate_digitized(seq,0.0001)
ureal(-0.0055999999999999999431,2.8867513459481289171e-05,10)

# LSD = 0.0001, no spread in data values, fewer points
>>> seq = (-0.0056,-0.0056,-0.0056)
>>> type_a.estimate_digitized(seq,0.0001)
ureal(-0.0055999999999999999431,3.2914029430219170322e-05,2)

mean(seq)

Return the arithmetic mean of data in seq

If seq contains real or uncertain real numbers, a real number is returned.

If seq contains complex or uncertain complex numbers, a complex number is returned.

Example:

>>> data = range(15)
>>> type_a.mean(data)
7.0

standard_deviation(seq, mu=None)

Return the sample standard deviation

Parameters:
• seq – a sequence of numbers
• mu – the arithmetic mean of seq

If seq contains complex or uncertain complex numbers, the standard deviation in the real and imaginary components is evaluated, as well as the sample correlation coefficient.

Otherwise the sample standard deviation is returned.

The calculation only uses the value attribute of uncertain numbers.

Examples:

>>> data = range(15)
>>> type_a.standard_deviation(data)
4.47213595499958

>>> data = [(0.91518731126816899+1.5213442955575518j),
... (0.96572684493613492-0.18547192979059401j),
... (0.23216598132006649+1.6951311687588568j),
... (2.1642786101267397+2.2024333895672563j),
... (1.1812532664590505+0.59062101107787357j),
... (1.2259264339405165+1.1499373179910186j),
... (-0.99422341300318684+1.7359338393131392j),
... (1.2122867690240853+0.32535154897909946j),
... (2.0122536479379196-0.23283009302603963j),
... (1.6770229536619197+0.77195994890476838j)]
>>> sd,r = type_a.standard_deviation(data)
>>> sd
standard_deviation(real=0.913318449990377, imag=0.8397604244242309)
>>> r
-0.31374045124595246
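For complex data, the quantities reported are the sample standard deviations of the real and imaginary components, together with the sample correlation coefficient between them. A stdlib sketch of that calculation (illustrative, not the library code), using toy data in which the imaginary part is exactly twice the real part:

```python
import math

def complex_sd(seq):
    """Component standard deviations and correlation for complex data."""
    n = len(seq)
    mr = sum(z.real for z in seq) / n
    mi = sum(z.imag for z in seq) / n
    vr = sum((z.real - mr) ** 2 for z in seq) / (n - 1)
    vi = sum((z.imag - mi) ** 2 for z in seq) / (n - 1)
    cov = sum((z.real - mr) * (z.imag - mi) for z in seq) / (n - 1)
    return math.sqrt(vr), math.sqrt(vi), cov / math.sqrt(vr * vi)

# Perfectly correlated toy data: imag = 2 * real
sd_re, sd_im, r = complex_sd([1 + 2j, 2 + 4j, 3 + 6j])
# sd_re = 1.0, sd_im = 2.0, r = 1.0
```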

standard_uncertainty(seq, mu=None)

Return the standard uncertainty of the sample mean

Parameters:
• seq – a sequence of numbers
• mu – the arithmetic mean of seq

Returns: a float

If seq contains complex or uncertain complex numbers, the standard uncertainties of the real and imaginary components are evaluated, as well as the sample correlation coefficient.

Otherwise the standard uncertainty of the sample mean is returned.

The calculation only uses the value attribute of uncertain numbers.

Example:

>>> data = range(15)
>>> type_a.standard_uncertainty(data)
1.1547005383792515

>>> data = [(0.91518731126816899+1.5213442955575518j),
... (0.96572684493613492-0.18547192979059401j),
... (0.23216598132006649+1.6951311687588568j),
... (2.1642786101267397+2.2024333895672563j),
... (1.1812532664590505+0.59062101107787357j),
... (1.2259264339405165+1.1499373179910186j),
... (-0.99422341300318684+1.7359338393131392j),
... (1.2122867690240853+0.32535154897909946j),
... (2.0122536479379196-0.23283009302603963j),
... (1.6770229536619197+0.77195994890476838j)]
>>> u,r = type_a.standard_uncertainty(data)
>>> u
standard_uncertainty(real=0.28881665310241805, imag=0.2655555630050262)
>>> u.real
0.28881665310241805
>>> r
-0.31374045124595246

variance_covariance_complex(seq, mu=None)

Return the sample variance-covariance matrix

Parameters:
• seq – a sequence of numbers
• mu – the arithmetic mean of seq

Returns: a 4-element sequence

If mu is not provided it will be evaluated (see mean).

seq may contain numbers or uncertain numbers. However, the calculation only uses the value of uncertain numbers.

Example:

>>> data = [(0.91518731126816899+1.5213442955575518j),
... (0.96572684493613492-0.18547192979059401j),
... (0.23216598132006649+1.6951311687588568j),
... (2.1642786101267397+2.2024333895672563j),
... (1.1812532664590505+0.59062101107787357j),
... (1.2259264339405165+1.1499373179910186j),
... (-0.99422341300318684+1.7359338393131392j),
... (1.2122867690240853+0.32535154897909946j),
... (2.0122536479379196-0.23283009302603963j),
... (1.6770229536619197+0.77195994890476838j)]
>>> type_a.variance_covariance_complex(data)
variance_covariance(rr=0.8341505910928249, ri=-0.24062910264062262,
ir=-0.24062910264062262, ii=0.7051975704291644)

>>> v = type_a.variance_covariance_complex(data)
>>> v[0]
0.8341505910928249
>>> v.rr
0.8341505910928249
>>> v.ii
0.7051975704291644
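The four elements are the sample variances of the real and imaginary components (rr, ii) and their sample covariance (ri = ir), all with an N−1 divisor. A stdlib sketch (illustrative, not the library code), again using toy data in which the imaginary part is exactly twice the real part:

```python
def vcv_complex(seq):
    """Sample variance-covariance of the real/imag components: (rr, ri, ir, ii)."""
    n = len(seq)
    mr = sum(z.real for z in seq) / n
    mi = sum(z.imag for z in seq) / n
    rr = sum((z.real - mr) ** 2 for z in seq) / (n - 1)
    ii = sum((z.imag - mi) ** 2 for z in seq) / (n - 1)
    ri = sum((z.real - mr) * (z.imag - mi) for z in seq) / (n - 1)
    return rr, ri, ri, ii

rr, ri, ir, ii = vcv_complex([1 + 2j, 2 + 4j, 3 + 6j])
# rr = 1.0, ri = ir = 2.0, ii = 4.0
```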

line_fit(x, y, label=None)

Return a least-squares straight-line fit to the data

Parameters:
• x – a sequence of stimulus data (independent variable)
• y – a sequence of response data (dependent variable)
• label – a suffix used to label the uncertain numbers a and b

Returns: an object containing the regression results (LineFitOLS)

Performs an ordinary least-squares regression of y on x.

Example:

>>> x = [1,2,3,4,5,6,7,8,9]
>>> y = [15.6,17.5,36.6,43.8,58.2,61.6,64.2,70.4,98.8]
>>> result = type_a.line_fit(x,y)
>>> a,b = result.a_b
>>> a
ureal(4.81388888888888,4.88620631218336,7)
>>> b
ureal(9.408333333333335,0.868301647656361,7)

>>> y_p = a + b*5.5
>>> dof(y_p)
7.0
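The point estimates above follow from the ordinary least-squares normal equations: the slope is the ratio of the covariance term to the sum of squared x-deviations, and the intercept follows from the means. A stdlib sketch (illustrative only; the uncertainties and degrees of freedom require the full regression analysis):

```python
def ols_line(x, y):
    """Ordinary least-squares intercept and slope."""
    n = len(x)
    xb = sum(x) / n
    yb = sum(y) / n
    sxx = sum((xi - xb) ** 2 for xi in x)
    sxy = sum((xi - xb) * (yi - yb) for xi, yi in zip(x, y))
    b = sxy / sxx      # slope
    a = yb - b * xb    # intercept
    return a, b

a, b = ols_line([1, 2, 3, 4, 5, 6, 7, 8, 9],
                [15.6, 17.5, 36.6, 43.8, 58.2, 61.6, 64.2, 70.4, 98.8])
# a ≈ 4.8139, b ≈ 9.4083
```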

line_fit_wls(x, y, u_y, label=None)

Return a weighted least-squares straight-line fit

Parameters:
• x – a sequence of stimulus data (independent variable)
• y – a sequence of response data (dependent variable)
• u_y – a sequence of uncertainties in the response data
• label – a suffix used to label the uncertain numbers a and b

Returns: an object containing the regression results (LineFitWLS)

Example:

>>> x = [1,2,3,4,5,6]
>>> y = [3.2, 4.3, 7.6, 8.6, 11.7, 12.8]
>>> u_y = [0.5,0.5,0.5,1.0,1.0,1.0]

>>> fit = type_a.line_fit_wls(x,y,u_y)
>>> fit.a_b
intercept_slope(
a=ureal(0.8852320675105488,0.5297081435088364,inf),
b=ureal(2.056962025316456,0.177892016741205,inf)
)
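With weights w = 1/u_y², the fitted parameters and their standard uncertainties follow from the weighted normal equations. A stdlib sketch (illustrative, not the library implementation) reproduces the values shown above:

```python
import math

def wls_line(x, y, u_y):
    """Weighted least-squares line fit; returns (a, u_a), (b, u_b)."""
    w = [1.0 / u ** 2 for u in u_y]
    W = sum(w)
    xb = sum(wi * xi for wi, xi in zip(w, x)) / W
    yb = sum(wi * yi for wi, yi in zip(w, y)) / W
    sxx = sum(wi * (xi - xb) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xb) * (yi - yb) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    a = yb - b * xb
    u_b = math.sqrt(1.0 / sxx)                 # slope uncertainty
    u_a = math.sqrt(1.0 / W + xb ** 2 / sxx)   # intercept uncertainty
    return (a, u_a), (b, u_b)

(a, u_a), (b, u_b) = wls_line([1, 2, 3, 4, 5, 6],
                              [3.2, 4.3, 7.6, 8.6, 11.7, 12.8],
                              [0.5, 0.5, 0.5, 1.0, 1.0, 1.0])
# a ≈ 0.88523 (u ≈ 0.52971), b ≈ 2.05696 (u ≈ 0.17789)
```

Because the u_y values are taken as known standard uncertainties, the degrees of freedom are infinite, as in the output above.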

line_fit_rwls(x, y, s_y, label=None)

Return a relative weighted least-squares straight-line fit

The s_y values are used to scale variability in the y data. It is assumed that the standard deviation of each y value is proportional to the corresponding s_y scale factor. The unknown common factor in the uncertainties is estimated from the residuals.

Parameters:
• x – a sequence of stimulus data (independent variable)
• y – a sequence of response data (dependent variable)
• s_y – a sequence of scale factors
• label – a suffix used to label the uncertain numbers a and b

Returns: an object containing the regression results (LineFitRWLS)

Example:

>>> x = [1,2,3,4,5,6]
>>> y = [3.014,5.225,7.004,9.061,11.201,12.762]
>>> s_y = [0.2,0.2,0.2,0.4,0.4,0.4]
>>> fit = type_a.line_fit_rwls(x,y,s_y)
>>> a, b = fit.a_b
>>>
>>> print(fit)

Relative Weighted Least-Squares Results:

Number of points: 6
Intercept: 1.14, u=0.12, df=4
Slope: 1.973, u=0.041, df=4
Correlation: -0.87
Sum of the squared residuals: 1.33952
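The scheme above can be sketched with stdlib code: fit with weights 1/s_y², then estimate the unknown common variance factor from the residuals as ssr/(N−2) and scale the parameter uncertainties by it (illustrative only, not the library implementation):

```python
import math

def rwls_line(x, y, s_y):
    """Relative-weighted LS fit; returns (a, u_a), (b, u_b), weighted ssr."""
    w = [1.0 / s ** 2 for s in s_y]
    W = sum(w)
    xb = sum(wi * xi for wi, xi in zip(w, x)) / W
    yb = sum(wi * yi for wi, yi in zip(w, y)) / W
    sxx = sum(wi * (xi - xb) ** 2 for wi, xi in zip(w, x))
    b = sum(wi * (xi - xb) * (yi - yb) for wi, xi, yi in zip(w, x, y)) / sxx
    a = yb - b * xb
    ssr = sum(wi * (yi - a - b * xi) ** 2 for wi, xi, yi in zip(w, x, y))
    sigma2 = ssr / (len(x) - 2)   # common factor estimated from the residuals
    u_b = math.sqrt(sigma2 / sxx)
    u_a = math.sqrt(sigma2 * (1.0 / W + xb ** 2 / sxx))
    return (a, u_a), (b, u_b), ssr

(a, u_a), (b, u_b), ssr = rwls_line([1, 2, 3, 4, 5, 6],
                                    [3.014, 5.225, 7.004, 9.061, 11.201, 12.762],
                                    [0.2, 0.2, 0.2, 0.4, 0.4, 0.4])
# a ≈ 1.14 (u ≈ 0.12), b ≈ 1.973 (u ≈ 0.041), ssr ≈ 1.33952
```

Estimating the common factor from the residuals is also why the results carry N−2 = 4 degrees of freedom, unlike the infinite degrees of freedom of a weighted fit with known uncertainties.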

line_fit_wtls(a0_b0, x, y, u_x, u_y, r_xy=None, label=None)

Return a total least-squares straight-line fit

Parameters:
• a0_b0 – the initial intercept and slope of the line
• x – a sequence of independent-variable data
• y – a sequence of dependent-variable data
• u_x – a sequence of uncertainties in x
• u_y – a sequence of uncertainties in y
• r_xy – the correlation between x-y pairs
• label – a suffix used to label the uncertain numbers a and b

Returns: an object containing the fitting results (LineFitWTLS)

Based on the paper by M Krystek and M Anton, Meas. Sci. Technol. 22 (2011) 035101 (9pp).

Example:

# Pearson-York test data see, e.g.,
# Lybanon, M. in Am. J. Phys 52 (1) 1984
>>> x=[0.0,0.9,1.8,2.6,3.3,4.4,5.2,6.1,6.5,7.4]
>>> wx=[1000.0,1000.0,500.0,800.0,200.0,80.0,60.0,20.0,1.8,1.0]

>>> y=[5.9,5.4,4.4,4.6,3.5,3.7,2.8,2.8,2.4,1.5]
>>> wy=[1.0,1.8,4.0,8.0,20.0,20.0,70.0,70.0,100.0,500.0]

# initial estimates are needed
>>> a0_b0 = type_a.line_fit(x,y).a_b

# standard uncertainties required for weighting
>>> ux=[1./math.sqrt(wx_i) for wx_i in wx ]
>>> uy=[1./math.sqrt(wy_i) for wy_i in wy ]

>>> result = type_a.line_fit_wtls(a0_b0,x,y,ux,uy)
>>> result.a_b
intercept_slope(
a=ureal(5.479910183283027,0.2919334989452106,8),
b=ureal(-0.48053339910867754,0.057616740759399841,8)
)

class LineFitOLS(a, b, ssr, N)

This object holds the results of an ordinary least-squares regression to data.

It can also be used to apply the results of the regression: x_from_y estimates a stimulus from further observations of the response, and y_from_x predicts the response to a given stimulus.

N

The number of points in the sample

a_b

Return the intercept and slope as uncertain numbers

ssr

Sum of the squared residuals

The sum of the squared deviations between values predicted by the model and the actual data.

If weights are used during the fit, the squares of weighted deviations are summed.

x_from_y(yseq, label=None)

Estimate the stimulus x that caused the response yseq.

Parameters:
• yseq – a sequence of further observations of y
• label – a label for the estimate of x based on yseq

Example

>>> x_data = [0.1, 0.1, 0.1, 0.3, 0.3, 0.3, 0.5, 0.5, 0.5,
...                 0.7, 0.7, 0.7, 0.9, 0.9, 0.9]
>>> y_data = [0.028, 0.029, 0.029, 0.084, 0.083, 0.081, 0.135, 0.131,
...                 0.133, 0.180, 0.181, 0.183, 0.215, 0.230, 0.216]

>>> fit = type_a.line_fit(x_data,y_data)

>>> x0 = fit.x_from_y( [0.0712, 0.0716] )
>>> summary(x0)
'0.260, u=0.018, df=13'
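The value of this inverse prediction can be checked directly: fit the line, average the new observations, and invert y = a + b·x. A value-only stdlib sketch (the uncertainty and degrees of freedom shown above require the full regression machinery):

```python
def ols_x_from_y(x, y, y_new):
    """Stimulus value estimated from new response observations, via OLS."""
    n = len(x)
    xb, yb = sum(x) / n, sum(y) / n
    b = sum((xi - xb) * (yi - yb) for xi, yi in zip(x, y)) \
        / sum((xi - xb) ** 2 for xi in x)
    a = yb - b * xb
    return (sum(y_new) / len(y_new) - a) / b   # invert y = a + b*x

x_data = [0.1] * 3 + [0.3] * 3 + [0.5] * 3 + [0.7] * 3 + [0.9] * 3
y_data = [0.028, 0.029, 0.029, 0.084, 0.083, 0.081, 0.135, 0.131,
          0.133, 0.180, 0.181, 0.183, 0.215, 0.230, 0.216]
print(round(ols_x_from_y(x_data, y_data, [0.0712, 0.0716]), 3))  # 0.26
```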

y_from_x(x, label=None)

Return an uncertain number y for the response to x

Parameters: x – a real number, or an uncertain real number

Estimates the response y that might be observed for a stimulus x.

An uncertain real number can be used for x, in which case the associated uncertainty is also propagated into y.

class LineFitWLS(a, b, ssr, N)

This object holds results from a weighted LS linear regression to data.

N

The number of points in the sample

a_b

Return the intercept and slope as uncertain numbers

ssr

Sum of the squared residuals

The sum of the squared deviations between values predicted by the model and the actual data.

If weights are used during the fit, the squares of weighted deviations are summed.

class LineFitRWLS(a, b, ssr, N)

This object holds the results of a relative weighted least-squares regression. The weight factors provided normalise the variability of observations.

N

The number of points in the sample

a_b

Return the intercept and slope as uncertain numbers

ssr

Sum of the squared residuals

The sum of the squared deviations between values predicted by the model and the actual data.

If weights are used during the fit, the squares of weighted deviations are summed.

x_from_y(yseq, s_y, label=None)

Estimates the stimulus x that generated the response sequence yseq

Parameters:
• yseq – a sequence of further observations of y
• s_y – a scale factor for the uncertainty of yseq
• label – a label for the estimate of x based on yseq

y_from_x(x, s_y, label=None)

Return an uncertain number y for the response to x

Parameters:
• x – a real number, or an uncertain real number
• s_y – a scale factor for the response uncertainty

Estimates the response y that might be generated by a stimulus x.

Because there is different variability in the response to different stimuli, the scale factor s_y is required. It is assumed that the standard deviation in the y value is proportional to s_y.

An uncertain real number can be used for x, in which case the associated uncertainty is also propagated into y.

class LineFitWTLS(a, b, ssr, N)

This object holds results from a TLS linear regression to data.

N

The number of points in the sample

a_b

Return the intercept and slope as uncertain numbers

ssr

Sum of the squared residuals

The sum of the squared deviations between values predicted by the model and the actual data.

If weights are used during the fit, the squares of weighted deviations are summed.

merge_components(a, b)

Combine the uncertainty components of a and b

Parameters:
• a – an uncertain real or complex number
• b – an uncertain real or complex number

Returns: an uncertain number that combines the uncertainty components of a and b

The values of a and b must be equal and the components of uncertainty associated with a and b must be distinct, otherwise a RuntimeError will be raised.

Use this function to combine results from type-A and type-B uncertainty analyses performed on a common sequence of data.

Note

Some judgement will be required as to when it is appropriate to merge uncertainty components.

There is a risk of ‘double-counting’ uncertainty if type-B components are contributing to the variability observed in the data, and therefore assessed in a type-A analysis.

Example:

# From Appendix H3 in the GUM

# Thermometer readings (degrees C)
t = (21.521,22.012,22.512,23.003,23.507,
23.999,24.513,25.002,25.503,26.010,26.511)

# Observed differences with calibration standard (degrees C)
b = (-0.171,-0.169,-0.166,-0.159,-0.164,
-0.165,-0.156,-0.157,-0.159,-0.161,-0.160)

# Arbitrary offset temperature (degrees C)
t_0 = 20.0

# Calculate the temperature relative to t_0
t_rel = [ t_k - t_0 for t_k in t ]

# A common systematic error in all differences
e_sys = ureal(0,0.01)

b_type_b = [ b_k + e_sys for b_k in b ]

# Type-A least-squares regression
y_1_a, y_2_a = type_a.line_fit(t_rel,b_type_b).a_b

# Type-B least-squares regression
y_1_b, y_2_b = function.line_fit(t_rel,b_type_b)

# y_1 and y_2 have uncertainty components
# related to the type-A analysis as well as the
# type-B systematic error
y_1 = type_a.merge_components(y_1_a,y_1_b)
y_2 = type_a.merge_components(y_2_a,y_2_b)

chisq_p(nu, x)

Return the probability that chi-squared could be less than x

Parameters:
• nu – the number of degrees of freedom
• x – the sum of squared residuals

Returns: the incomplete gamma function P(nu, x)

P(nu,x) is the probability that any random set of N points would give a value of chi-squared less than the observed value x

See: P R Bevington, Data reduction and error analysis for the physical sciences (McGraw-Hill)

Example:

>>> type_a.chisq_p(10,3.94)
0.049986909209909315

chisq_q(nu, x)

Return the probability that chi-squared could exceed x

Parameters:
• nu – the number of degrees of freedom
• x – the sum of squared residuals

Returns: the incomplete gamma function Q(nu, x) = 1 - P(nu, x)

Q(nu,x) = 1 - P(nu,x) is the probability that any random set of N points would give a value of chi-squared greater than or equal to the observed value x

See: P R Bevington, Data reduction and error analysis for the physical sciences (McGraw-Hill)

Example:

>>> type_a.chisq_q(10,3.94)
0.9500130907900907


This function can be used after a weighted least-squares regression to assess the goodness of fit: chisq_q gives the probability that the observed sum of squared residuals could be exceeded by chance. (See Numerical Recipes in C: The Art of Scientific Computing, Section 15.1.)
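These probabilities are regularized incomplete gamma functions evaluated at nu/2 and x/2. A stdlib sketch using the standard series expansion (illustrative only; the series converges well for x/2 < nu/2 + 1, which covers the examples above):

```python
import math

def chisq_p(nu, x):
    """P(nu, x): probability that chi-squared with nu degrees of freedom
    is less than x, via the series for the lower regularized incomplete
    gamma function (suitable when x/2 < nu/2 + 1)."""
    a, z = 0.5 * nu, 0.5 * x
    term = 1.0 / a
    total = term
    k = a
    while abs(term) > abs(total) * 1e-16:
        k += 1.0
        term *= z / k      # next series term: z^n / (a (a+1) ... (a+n))
        total += term
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a))

def chisq_q(nu, x):
    return 1.0 - chisq_p(nu, x)

print(chisq_p(10, 3.94))  # ≈ 0.0499869
```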

class BiasedIndication(correction)

An object to correct single observations using a type-A estimate of bias. The corrected result is an uncertain number with uncertainty components that depend on the variability of observations and the uncertainty of the bias estimate.

In statistical parlance, the uncertain numbers produced by this object can be used to calculate a prediction interval for future observations.

offset(x, label=None)

Return an uncertain number for the offset-corrected indication

Parameters:
• x – a real number
• label – a label for the indication uncertainty

Returns: an uncertain number

The uncertainty of a corrected indication has a component due to the reading variability and a component due to the uncertainty of the estimated additive correction.

In statistical terminology, the uncertain number can be used to calculate a ‘prediction interval’ for a future indication.

Example:

# sample is a sequence of N indications
# First estimate the bias (offset)
x_bar = type_a.estimate(sample)

# Then create an object to process
# other indications
processor = type_a.BiasedIndication(x_bar)

# x_i is another indication and
# x_corr_i is an uncertain number for
# the offset-corrected indication.
x_corr_i = processor.offset(x_i)