How Do Economists Use Population To Compare Data Sets And Draw Conclusions
6 BASIC STATISTICAL TOOLS
There are lies, damn lies, and statistics......
(Anon.)
6.1 Introduction
6.2 Definitions
6.3 Basic Statistics
6.4 Statistical tests
6.1 Introduction
In the preceding chapters basic elements for the proper execution of analytical work such as personnel, laboratory facilities, equipment, and reagents were discussed. Before embarking upon the actual analytical work, however, one more tool for the quality assurance of the work must be dealt with: the statistical operations necessary to control and verify the analytical procedures (Chapter 7) as well as the resulting data (Chapter 8).
It was stated earlier that making mistakes in analytical work is unavoidable. This is the reason why a complex system of precautions to prevent errors and traps to detect them has to be set up. An important aspect of the quality control is the detection of both random and systematic errors. This can be done by critically looking at the performance of the analysis as a whole and also of the instruments and operators involved in the job. For the detection itself as well as for the quantification of the errors, statistical treatment of data is indispensable.
A multitude of different statistical tools is available, some of them simple, some complicated, and often very specific for certain purposes. In analytical work, the most important common operation is the comparison of data, or sets of data, to quantify accuracy (bias) and precision. Fortunately, with a few simple convenient statistical tools most of the information needed in regular laboratory work can be obtained: the "t-test", the "F-test", and regression analysis. Therefore, examples of these will be given in the ensuing pages.
Clearly, statistics are a tool, not an aim. Simple inspection of data, without statistical treatment, by an experienced and dedicated analyst may be just as useful as statistical figures on the desk of the disinterested. The value of statistics lies with organizing and simplifying data, to permit some objective estimate showing that an analysis is under control or that a change has occurred. Equally important is that the results of these statistical procedures are recorded and can be retrieved.
6.2 Definitions
6.2.1 Error
6.2.2 Accuracy
6.2.3 Precision
6.2.4 Bias
Discussing Quality Control implies the use of several terms and concepts with a specific (and sometimes confusing) meaning. Therefore, some of the most important concepts will be defined first.
6.2.1 Error
Error is the collective noun for any departure of the result from the "true" value*. Analytical errors can be:
1. Random or unpredictable deviations between replicates, quantified with the "standard deviation".
2. Systematic or predictable regular deviation from the "true" value, quantified as "mean difference" (i.e. the difference between the true value and the mean of replicate determinations).
3. Constant, unrelated to the concentration of the substance analyzed (the analyte).
4. Proportional, i.e. related to the concentration of the analyte.
* The "true" value of an attribute is by nature indeterminate and often has only a very relative meaning. Particularly in soil science for several attributes there is no such thing as the true value as any value obtained is method-dependent (e.g. cation exchange capacity). Obviously, this does not mean that no adequate analysis serving a purpose is possible. It does, however, emphasize the need for the establishment of standard reference methods and the importance of external QC (see Chapter 9).
6.2.2 Accuracy
The "trueness" or the closeness of the analytical result to the "true" value. It is constituted by a combination of random and systematic errors (precision and bias) and cannot be quantified directly. The test result may be a mean of several values. An accurate determination produces a "true" quantitative value, i.e. it is precise and free of bias.
6.2.3 Precision
The closeness with which results of replicate analyses of a sample agree. It is a measure of dispersion or scattering around the mean value and usually expressed in terms of standard deviation, standard error or a range (difference between the highest and the lowest result).
6.2.4 Bias
The consistent deviation of analytical results from the "true" value caused by systematic errors in a procedure. Bias is the opposite but most used measure for "trueness" which is the agreement of the mean of analytical results with the true value, i.e. excluding the contribution of randomness represented in precision. There are several components contributing to bias:
1. Method bias
The difference between the (mean) test result obtained from a number of laboratories using the same method and an accepted reference value. The method bias may depend on the analyte level.
2. Laboratory bias
The difference between the (mean) test result from a particular laboratory and the accepted reference value.
3. Sample bias
The difference between the mean of replicate test results of a sample and the ("true") value of the target population from which the sample was taken. In practice, for a laboratory this refers mainly to sample preparation, subsampling and weighing techniques. Whether a sample is representative for the population in the field is an extremely important aspect but usually falls outside the responsibility of the laboratory (in some cases laboratories have their own field sampling personnel).
The relationship between these concepts can be expressed in the following equation:
[Equation given as a figure in the original; not recoverable here]
The types of errors are illustrated in Fig. 6-1.
Fig. 6-1. Accuracy and precision in laboratory measurements. (Note that the qualifications apply to the mean of results: in c the mean is accurate but some individual results are inaccurate)
6.3 Basic Statistics
6.3.1 Mean
6.3.2 Standard deviation
6.3.3 Relative standard deviation. Coefficient of variation
6.3.4 Confidence limits of a measurement
6.3.5 Propagation of errors
In the discussions of Chapters 7 and 8 basic statistical treatment of data will be considered. Therefore, some understanding of these statistics is essential and they will briefly be discussed here.
The basic assumption to be made is that a set of data, obtained by repeated analysis of the same analyte in the same sample under the same conditions, has a normal or Gaussian distribution. (When the distribution is skewed statistical treatment is more complicated). The primary parameters used are the mean (or average) and the standard deviation (see Fig. 6-2) and the main tools the F-test, the t-test, and regression and correlation analysis.
Fig. 6-2. A Gaussian or normal distribution. The figure shows that (approx.) 68% of the data fall in the range x̄ ± s, 95% in the range x̄ ± 2s, and 99.7% in the range x̄ ± 3s.
6.3.1 Mean
The average of a set of n data x_i:
$$\bar{x} = \frac{\sum x_i}{n} \tag{6.1}$$
6.3.2 Standard deviation
This is the most commonly used measure of the spread or dispersion of data around the mean. The standard deviation is defined as the square root of the variance (V). The variance is defined as the sum of the squared deviations from the mean, divided by n-1. Operationally, there are several ways of calculation:
$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} \tag{6.2}$$
or
$$s = \sqrt{\frac{\sum x_i^2 - (\sum x_i)^2 / n}{n - 1}} \tag{6.3}$$
or
$$s = \sqrt{\frac{\sum x_i^2 - n \bar{x}^2}{n - 1}} \tag{6.4}$$
The calculation of the mean and the standard deviation can easily be done on a calculator but most conveniently on a PC with computer programs such as dBASE, Lotus 123, Quattro-Pro, Excel, and others, which have simple ready-to-use functions. (Warning: some programs use n rather than n-1!).
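As a minimal sketch of the mean and standard deviation defined above, and of the warning about n versus n-1, the following Python example uses only the standard library. The data values are hypothetical and serve only as illustration; `statistics.stdev` divides by n-1, whereas `statistics.pstdev` divides by n.

```python
import math
import statistics

# Hypothetical replicate results (illustration only, not taken from the text).
data = [10.2, 10.7, 10.5, 9.9, 9.0]

n = len(data)
mean = sum(data) / n

# Sample standard deviation: sum of squared deviations divided by n - 1.
s_sample = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))

# Population form: divided by n. Some programs return this one by default.
s_population = math.sqrt(sum((x - mean) ** 2 for x in data) / n)

# The standard library offers both conventions under different names.
assert math.isclose(s_sample, statistics.stdev(data))
assert math.isclose(s_population, statistics.pstdev(data))

print(f"mean = {mean:.3f}")
print(f"s (n-1) = {s_sample:.3f}, s (n) = {s_population:.3f}")
```

For small data sets the two conventions differ noticeably, so it is worth checking which one a given program applies.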
6.3.3 Relative standard deviation. Coefficient of variation
Although the standard deviation of analytical data may not vary much over limited ranges of such data, it usually depends on the magnitude of such data: the larger the figures, the larger s. Therefore, for comparison of variations (e.g. precision) it is often more convenient to use the relative standard deviation (RSD) than the standard deviation itself. The RSD is expressed as a fraction, but more usually as a percentage and is then called coefficient of variation (CV). Often, however, these terms are confused.
$$RSD = \frac{s}{\bar{x}} \qquad CV = \frac{s}{\bar{x}} \times 100\% \tag{6.5; 6.6}$$
Note. When needed (e.g. for the F-test, see Eq. 6.11) the variance can, of course, be calculated by squaring the standard deviation:
$$V = s^2 \tag{6.7}$$
6.3.4 Confidence limits of a measurement
The more an analysis or measurement is replicated, the closer the mean x̄ of the results will approach the "true" value μ of the analyte content (assuming absence of bias).
A single analysis of a test sample can be regarded as literally sampling the imaginary set of a multitude of results obtained for that test sample. The uncertainty of such subsampling is expressed by
$$\mu = \bar{x} \pm t \frac{s}{\sqrt{n}} \tag{6.8}$$
where
μ = "true" value (mean of large set of replicates)
x̄ = mean of subsamples
t = a statistical value which depends on the number of data and the required confidence (usually 95%)
s = standard deviation of the subsamples
n = number of subsamples
(The term s/√n is also known as the standard error of the mean.)
The critical values for t are tabulated in Appendix 1 (they are, therefore, here referred to as t_tab). To find the applicable value, the number of degrees of freedom has to be established by: df = n - 1 (see also Section 6.4.2).
Example
For the determination of the clay content in the particle-size analysis, a semi-automatic pipette installation is used with a 20 mL pipette. This volume is approximate and the operation involves the opening and closing of taps. Therefore, the pipette has to be calibrated, i.e. both the accuracy (trueness) and precision have to be established.
A tenfold measurement of the volume yielded the following set of data (in mL):
19.941 | 19.812 | 19.829 | 19.828 | 19.742 |
19.797 | 19.937 | 19.847 | 19.885 | 19.804 |
The mean is 19.842 mL and the standard deviation 0.0627 mL. According to Appendix 1, for n = 10, t_tab = 2.26 (df = 9), and using Eq. (6.8) this calibration yields:
pipette volume = 19.842 ± 2.26 (0.0627/√10) = 19.84 ± 0.04 mL
(Note that the pipette has a systematic deviation from 20 mL as this is outside the found confidence interval. See also bias).
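The pipette calibration can be retraced with a short Python sketch using only the standard library. The critical value t_tab = 2.26 is copied from the text rather than computed; the variable names are illustrative.

```python
import math
import statistics

# Ten volume measurements (mL) from the pipette calibration example.
volumes = [19.941, 19.812, 19.829, 19.828, 19.742,
           19.797, 19.937, 19.847, 19.885, 19.804]

n = len(volumes)
mean = statistics.mean(volumes)
s = statistics.stdev(volumes)          # n - 1 in the denominator

t_tab = 2.26                           # two-sided, 95%, df = n - 1 = 9 (from the text)
half_width = t_tab * s / math.sqrt(n)  # t * standard error of the mean
lower, upper = mean - half_width, mean + half_width

# The nominal volume lies outside the interval, which points to bias.
nominal = 20.0
biased = not (lower <= nominal <= upper)

print(f"pipette volume = {mean:.2f} +/- {half_width:.2f} mL")
print("nominal 20 mL outside confidence interval:", biased)
```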
In routine analytical work, results are usually single values obtained in batches of several test samples. No laboratory will analyse a test sample 50 times to be confident that the result is reliable. Therefore, the statistical parameters have to be obtained in another way. Most usually this is done by method validation (see Chapter 7) and/or by keeping control charts, which is basically the collection of analytical results from one or more control samples in each batch (see Chapter 8). Equation (6.8) is then reduced to
$$\mu = x \pm t \cdot s \tag{6.9}$$
where
μ = "true" value
x = single measurement
t = applicable t_tab (Appendix 1)
s = standard deviation of set of previous measurements.
In Appendix 1 it can be seen that if the set of replicated measurements is large (say > 30), t is close to 2. Therefore, the (95%) confidence of the result x of a single test sample (n = 1 in Eq. 6.8) is approximated by the commonly used and well known expression
$$\mu = x \pm 2s \tag{6.10}$$
where s is the previously determined standard deviation of the large set of replicates (see also Fig. 6-2).
Note: This "method-s" or s of a control sample is not a constant and may vary for different test materials, analyte levels, and with analytical conditions.
Running duplicates will, according to Equation (6.8), increase the confidence of the (mean) result by a factor √2:
$$\mu = \bar{x} \pm \frac{2s}{\sqrt{2}}$$
where
x̄ = mean of duplicates
s = known standard deviation of large set
Similarly, triplicate analysis will increase the confidence by a factor √3, etc. Duplicates are further discussed in Section 8.3.3.
Thus, in summary, Equation (6.8) can be applied in various ways to determine the size of errors (confidence) in analytical work or measurements: single determinations in routine work, determinations for which no previous data exist, certain calibrations, etc.
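The effect of replication on the confidence interval can be sketched as follows. The function name and the method standard deviation of 0.5 are hypothetical and chosen only for illustration; t is taken as 2, which assumes that s comes from a large set of previous measurements.

```python
import math

def confidence_half_width(s, n_replicates, t=2.0):
    """Half-width t*s/sqrt(n) of the confidence interval of a (mean) result.

    s is a previously determined standard deviation from a large set of
    replicates, so t is taken as about 2 (95% confidence).
    """
    return t * s / math.sqrt(n_replicates)

s_method = 0.5  # hypothetical method standard deviation (illustration only)

for n in (1, 2, 3):
    # single determination, duplicates, triplicates
    print(n, round(confidence_half_width(s_method, n), 3))
```

The half-width shrinks with √n, so doubling the number of replicates does not halve the uncertainty.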
6.3.5 Propagation of errors
6.3.5.1 Propagation of random errors
6.3.5.2 Propagation of systematic errors
The final result of an analysis is often calculated from several measurements performed during the procedure (weighing, calibration, dilution, titration, instrument readings, moisture correction, etc.). As was indicated in Section 6.2, the total error in an analytical result is an adding-up of the sub-errors made in the various steps. For daily practice, the bias and precision of the whole method are usually the most relevant parameters (obtained from validation, Chapter 7; or from control charts, Chapter 8). However, sometimes it is useful to get an insight into the contributions of the subprocedures (and then these have to be determined separately), for example if one wants to change (part of) the method.
Because the "adding-up" of errors is usually not a simple summation, this will be discussed. The main distinction to be made is between random errors (precision) and systematic errors (bias).
6.3.5.1 Propagation of random errors
In estimating the total random error from factors in a final calculation, the treatment of summation or subtraction of factors is different from that of multiplication or division.
1. Summation calculations
If the final result x is obtained from the sum (or difference) of (sub)measurements a, b, c, etc.:
x = a + b + c + ...
then the total precision is expressed by the standard deviation obtained by taking the square root of the sum of individual variances (squares of standard deviation):
$$s_x = \sqrt{s_a^2 + s_b^2 + s_c^2 + \ldots}$$
If a (sub)measurement has a constant multiplication factor or coefficient (such as an extra dilution), then this is included to calculate the effect of the variance concerned, e.g. (2s_b)².
Example
The Effective Cation Exchange Capacity of soils (ECEC) is obtained by summation of the exchangeable cations:
ECEC = Exch. (Ca + Mg + Na + K + H + Al)
Standard deviations experimentally obtained for exchangeable Ca, Mg, Na, K and (H + Al) on a certain sample, e.g. a control sample, are: 0.30, 0.25, 0.15, 0.15, and 0.60 cmolc/kg respectively. The total precision is:
$$s_{ECEC} = \sqrt{0.30^2 + 0.25^2 + 0.15^2 + 0.15^2 + 0.60^2} = 0.75 \text{ cmol}_c\text{/kg}$$
It can be seen that the total standard deviation is larger than the highest individual standard deviation, but (much) less than their sum. It is also clear that if one wants to reduce the total standard deviation, qualitatively the best result can be expected from reducing the largest individual contribution, in this case the exchangeable acidity.
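The ECEC example can be checked with a few lines of Python. The helper name `sd_of_sum` is hypothetical, and the root-sum-of-squares rule assumes that the individual errors are independent.

```python
import math

def sd_of_sum(*sds):
    """Standard deviation of a sum or difference of independent terms."""
    return math.sqrt(sum(s ** 2 for s in sds))

# Standard deviations (cmolc/kg) for exchangeable Ca, Mg, Na, K and (H + Al).
sds = [0.30, 0.25, 0.15, 0.15, 0.60]
s_ecec = sd_of_sum(*sds)

# Larger than the largest single term, smaller than the plain sum.
print(f"s(ECEC) = {s_ecec:.2f} cmolc/kg")
print(f"largest term = {max(sds):.2f}, plain sum = {sum(sds):.2f}")
```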
2. Multiplication calculations
If the final result x is obtained from multiplication (or division) of (sub)measurements according to
$$x = a \times b \times c \times \ldots$$
then the total error is expressed by the relative standard deviation obtained by taking the square root of the sum of the squares of the individual relative standard deviations (RSD or CV, as a fraction or as percentage, see Eqs. 6.5 and 6.6):
$$RSD_x = \sqrt{RSD_a^2 + RSD_b^2 + RSD_c^2 + \ldots}$$
If a (sub)measurement has a constant multiplication factor or coefficient, then this is included to calculate the effect of the RSD concerned, e.g. (2RSD_b)².
Example
The calculation of Kjeldahl-nitrogen may be as follows:
$$\%N = \frac{(a - b) \times M \times 1.4 \times mcf}{s}$$
where
a = mL HCl required for titration sample
b = mL HCl required for titration blank
s = air-dry sample weight in gram
M = molarity of HCl
1.4 = 14 × 10^-3 × 100% (14 = atomic weight of N)
mcf = moisture correction factor
Note that in addition to multiplications, this calculation contains a subtraction also (often, calculations contain both summations and multiplications.)
Firstly, the standard deviation of the titration (a - b) is determined as indicated for summation calculations above. This is then transformed to RSD using Equations (6.5) or (6.6). Then the RSDs of the other individual parameters have to be determined experimentally. The found RSDs are, for instance:
distillation: 0.8%,
titration: 0.5%,
molarity: 0.2%,
sample weight: 0.2%,
mcf: 0.2%.
The total calculated precision is:
$$RSD_{total} = \sqrt{0.8^2 + 0.5^2 + 0.2^2 + 0.2^2 + 0.2^2} = 1.0\%$$
Here again, the highest RSD (of distillation) dominates the total precision. In practice, the precision of the Kjeldahl method is usually considerably worse (≈ 2.5%), probably mainly as a result of the heterogeneity of the sample. The present example does not take that into account. It would imply that 2.5% - 1.0% = 1.5% or 3/5 of the total random error is due to sample heterogeneity (or other overlooked cause). This implies that painstaking efforts to improve subprocedures such as the titration or the preparation of standard solutions may not be very rewarding. It would, however, pay to improve the homogeneity of the sample, e.g. by careful grinding and mixing in the preparatory stage.
Note. Sample heterogeneity is also represented in the moisture correction factor. However, the influence of this factor on the final result is usually very small.
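The Kjeldahl example can be retraced in the same way. This is a sketch only: the helper name `rsd_of_product` is hypothetical, the rule assumes independent errors, and the share of each step in the total variance is an added illustration that is not part of the original calculation.

```python
import math

def rsd_of_product(*rsds):
    """Relative standard deviation of a product or quotient of independent terms."""
    return math.sqrt(sum(r ** 2 for r in rsds))

# RSDs in percent, as listed in the example.
rsds = {
    "distillation": 0.8,
    "titration": 0.5,
    "molarity": 0.2,
    "sample weight": 0.2,
    "mcf": 0.2,
}

total = rsd_of_product(*rsds.values())

# Fraction of the total variance contributed by each step.
shares = {step: r ** 2 / total ** 2 for step, r in rsds.items()}

print(f"total RSD = {total:.1f}%")
for step, share in shares.items():
    print(f"{step}: {share:.0%} of total variance")
```

Expressing the contributions as shares of the variance makes it easy to see which subprocedure is worth improving first.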
6.3.5.2 Propagation of systematic errors
Systematic errors of (sub)measurements contribute directly to the total bias of the result since the individual parameters in the calculation of the final result each carry their own bias. For instance, the systematic error in a balance will cause a systematic error in the sample weight (as well as in the moisture determination). Note that some systematic errors may cancel out, e.g. weighings by difference may not be affected by a biased balance.
The only way to detect or avoid systematic errors is by comparison (calibration) with independent standards and outside reference or control samples.
6.4 Statistical tests
6.4.1 Two-sided vs. one-sided test
6.4.2 F-test for precision
6.4.3 t-Tests for bias
6.4.4 Linear correlation and regression
6.4.5 Analysis of variance (ANOVA)
In analytical work a frequently recurring operation is the verification of performance by comparison of data. Some examples of comparisons in practice are:
- performance of two instruments,
- performance of two methods,
- performance of a procedure in different periods,
- performance of two analysts or laboratories,
- results obtained for a reference or command sample with the "true", "target" or "assigned" value of this sample.
Some of the most common and convenient statistical tools to quantify such comparisons are the F-test, the t-tests, and regression analysis.
Because the F-test and the t-tests are the most basic tests they will be discussed first. These tests examine if two sets of normally distributed data are similar or dissimilar (belong or do not belong to the same "population") by comparing their standard deviations and means respectively. This is illustrated in Fig. 6-3.
Fig. 6-3. Three possible cases when comparing two sets of data (n1 = n2). A. Different mean (bias), same precision; B. Same mean (no bias), different precision; C. Both mean and precision are different. (The fourth case, identical sets, has not been drawn).
6.4.1 Two-sided vs. one-sided test
These tests for comparison, for instance between methods A and B, are based on the assumption that there is no significant difference (the "null hypothesis"). In other words, when the difference is so small that a tabulated critical value of F or t is not exceeded, we can be confident (usually at 95% level) that A and B are not different. Two fundamentally different questions can be asked concerning both the comparison of the standard deviations s1 and s2 with the F-test, and of the means x̄1 and x̄2 with the t-test:
1. are A and B different? (two-sided test)
2. is A higher (or lower) than B? (one-sided test).
This distinction has an important practical implication as statistically the probabilities for the two situations are different: the chance that A and B are only different ("it can go two ways") is twice as large as the chance that A is higher (or lower) than B ("it can go only one way"). The most common case is the two-sided (also called two-tailed) test: there are no particular reasons to expect that the means or the standard deviations of two data sets are different. An example is the routine comparison of a control chart with the previous one (see 8.3). However, when it is expected or suspected that the mean and/or the standard deviation will go only one way, e.g. after a change in an analytical procedure, the one-sided (or one-tailed) test is appropriate. In this case the probability that it goes the other way than expected is assumed to be zero and, therefore, the probability that it goes the expected way is doubled. Or, more correctly, the uncertainty in the two-way test of 5% (or the probability of 5% that the critical value is exceeded) is divided over the two tails of the Gaussian curve (see Fig. 6-2), i.e. 2.5% at the end of each tail beyond 2s. If we perform the one-sided test with 5% uncertainty, we actually increase this 2.5% to 5% at the end of one tail. (Note that for the whole Gaussian curve, which is symmetrical, this is then equivalent to an uncertainty of 10% in two ways!)
This difference in probability in the tests is expressed in the use of two tables of critical values for both F and t. In fact, the one-sided table at 95% confidence level is equivalent to the two-sided table at 90% confidence level.
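The equivalence of the one-sided 95% and the two-sided 90% critical values can be illustrated numerically. The sketch below uses the standard normal curve as a stand-in for the t-distribution with many degrees of freedom; this is an assumption made only to stay within the Python standard library, and the function names are hypothetical.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal curve, the large-n limit of the t-distribution

def critical_two_sided(confidence):
    """Critical value with the uncertainty split over both tails."""
    alpha = 1.0 - confidence
    return z.inv_cdf(1.0 - alpha / 2.0)

def critical_one_sided(confidence):
    """Critical value with the whole uncertainty in one tail."""
    alpha = 1.0 - confidence
    return z.inv_cdf(1.0 - alpha)

two_95 = critical_two_sided(0.95)   # 2.5% in each tail
one_95 = critical_one_sided(0.95)   # 5% in one tail
two_90 = critical_two_sided(0.90)   # 5% in each tail

print(f"two-sided 95%: {two_95:.3f}")
print(f"one-sided 95%: {one_95:.3f}")
print(f"two-sided 90%: {two_90:.3f}")
```

The one-sided critical value is the lower of the two at the same confidence level, which is why a result can pass the two-sided test and fail the one-sided one.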
It is emphasized that the one-sided test is only appropriate when a difference in one direction is expected or aimed at. Of course it is tempting to perform this test after the results show a clear (unexpected) effect. In fact, however, then a two times higher probability level was used in retrospect. This is underscored by the observation that in this way even contradictory conclusions may arise: if in an experiment calculated values of F and t are found within the range between the two-sided and one-sided values of F_tab and t_tab, the two-sided test indicates no significant difference, whereas the one-sided test says that the result of A is significantly higher (or lower) than that of B. What actually happens is that in the first case the 2.5% boundary in the tail was just not exceeded, and then, subsequently, this 2.5% boundary is relaxed to 5% which is then obviously more easily exceeded. This illustrates that statistical tests differ in strictness and that for proper interpretation of results in reports, the statistical techniques used, including the confidence limits or probability, should always be specified.
6.4.2 F-test for precision
Because the result of the F-test may be needed to choose between the Student's t-test and the Cochran variant (see next section), the F-test is discussed first.
The F-test (or Fisher's test) is a comparison of the spread of two sets of data to test if the sets belong to the same population, in other words if the precisions are similar or dissimilar.
The test makes use of the ratio of the two variances:
$$F = \frac{s_1^2}{s_2^2} \tag{6.11}$$
where the larger s² must be the numerator by convention. If the performances are not very different, then the estimates s1 and s2 do not differ much and their ratio (and that of their squares) should not deviate much from unity. In practice, the calculated F is compared with the applicable F value in the F-table (also called the critical value, see Appendix 2). To read the table it is necessary to know the applicable number of degrees of freedom for s1 and s2. These are calculated by:
df1 = n1 - 1
df2 = n2 - 1
If F_cal ≤ F_tab one can conclude with 95% confidence that there is no significant difference in precision (the "null hypothesis" that s1 = s2 is accepted). Thus, there is still a 5% chance that we draw the wrong conclusion. In certain cases more confidence may be needed, then a 99% confidence table can be used, which can be found in statistical textbooks.
Example 1 (two-sided test)
Table 6-1 gives the data sets obtained by two analysts for the cation exchange capacity (CEC) of a control sample. Using Equation (6.11) the calculated F value is 1.62. As we had no particular reason to expect that the analysts would perform differently, we use the F-table for the two-sided test and find F_tab = 4.03 (Appendix 2, df1 = df2 = 9). This exceeds the calculated value and the null hypothesis (no difference) is accepted. It can be concluded with 95% confidence that there is no significant difference in precision between the work of Analyst 1 and 2.
Table 6-1. CEC values (in cmolc/kg) of a control sample determined by two analysts.
1 | 2 |
10.2 | 9.7 |
10.7 | 9.0 |
10.5 | 10.2 |
9.9 | 10.3 |
9.0 | 10.8 |
11.2 | 11.1 |
11.5 | 9.4 |
10.9 | 9.2 |
8.9 | 9.8 |
10.6 | 10.2 |
x̄: | 10.34 | 9.97 |
s: | 0.819 | 0.644 |
n: | 10 | 10 |
F_cal = 1.62 | t_cal = 1.12 |
F_tab = 4.03 | t_tab = 2.10 |
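The F-test of Example 1 can be sketched in Python as follows. The function name `f_ratio` is hypothetical; the standard deviations are the summary values printed in Table 6-1, and the critical value is copied from the text rather than computed.

```python
def f_ratio(s_a, s_b):
    """Variance ratio with the larger variance in the numerator."""
    v_a, v_b = s_a ** 2, s_b ** 2
    return max(v_a, v_b) / min(v_a, v_b)

# Summary statistics as printed in Table 6-1.
s1, n1 = 0.819, 10
s2, n2 = 0.644, 10

f_cal = f_ratio(s1, s2)
f_tab = 4.03  # two-sided, 95%, df1 = df2 = 9 (value quoted in the text)

# The null hypothesis (equal precision) is rejected only if F_cal exceeds F_tab.
precision_differs = f_cal > f_tab

print(f"F_cal = {f_cal:.2f}, F_tab = {f_tab}")
print("significant difference in precision:", precision_differs)
```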
Example 2 (one-sided test)
The determination of the calcium carbonate content with the Scheibler standard method is compared with the simple and more rapid "acid-neutralization" method using one and the same sample. The results are given in Table 6-2. Because of the nature of the rapid method we suspect it to produce a lower precision than obtained with the Scheibler method and we can, therefore, perform the one-sided F-test. The applicable F_tab = 3.07 (App. 2, df1 = 12, df2 = 9) which is lower than F_cal (= 18.3) and the null hypothesis (no difference) is rejected. It can be concluded (with 95% confidence) that for this one sample the precision of the rapid titration method is significantly worse than that of the Scheibler method.
Table 6-2. Contents of CaCO3 (in mass/mass %) in a soil sample determined with the Scheibler method (A) and the rapid titration method (B).
A | B |
2.5 | 1.7 |
2.4 | 1.9 |
2.5 | 2.3 |
2.6 | 2.3 |
2.5 | 2.8 |
2.5 | 2.5 |
2.4 | 1.6 |
2.6 | 1.9 |
2.7 | 2.6 |
2.4 | 1.7 |
- | 2.4 |
- | 2.2 |
- | 2.6 |
x̄: | 2.51 | 2.13 |
s: | 0.099 | 0.424 |
n: | 10 | 13 |
F_cal = 18.3 | t_cal = 3.12 |
F_tab = 3.07 | t_tab* = 2.18 |
(t_tab* = Cochran's "alternative" t_tab)
6.4.3 t-Tests for bias
6.4.3.1 Student's t-test
6.4.3.2 Cochran's t-test
6.4.3.3 t-Test for large data sets (n ≥ 30)
6.4.3.4 Paired t-test
Depending on the nature of two sets of data (n, s, sampling nature), the means of the sets can be compared for bias by several variants of the t-test. The following most common types will be discussed:
1. Student's t-test for comparison of two independent sets of data with very similar standard deviations;
2. the Cochran variant of the t-test when the standard deviations of the independent sets differ significantly;
3. the paired t-test for comparison of strongly dependent sets of data.
Basically, for the t-tests Equation (6.8) is used but written in a different way:
$$t = \frac{|\bar{x} - \mu| \sqrt{n}}{s} \tag{6.12}$$
where
x̄ = mean of test results of a sample
μ = "true" or reference value
s = standard deviation of test results
n = number of test results of the sample.
To compare the mean of a data set with a reference value normally the "two-sided t-table of critical values" is used (Appendix 1). The applicable number of degrees of freedom here is:
df = n-1
If a value for t calculated with Equation (6.12) does not exceed the critical value in the table, the data are taken to belong to the same population: there is no difference and the "null hypothesis" is accepted (with the applicable probability, usually 95%).
As with the F-test, when it is expected or suspected that the obtained results are higher or lower than that of the reference value, the one-sided t-test can be performed: if t_cal > t_tab, then the results are significantly higher (or lower) than the reference value.
More commonly, however, the "true" value of proper reference samples is accompanied by the associated standard deviation and number of replicates used to determine these parameters. We can then apply the more general case of comparing the means of two data sets: the "true" value in Equation (6.12) is then replaced by the mean of a second data set. As is shown in Fig. 6-3, to test if two data sets belong to the same population it is tested if the two Gauss curves do sufficiently overlap. In other words, if the difference between the means x̄1 - x̄2 is small. This is discussed next.
Similarity or non-similarity of standard deviations
When using the t-test for two small sets of data (n1 and/or n2 < 30), a choice of the type of test must be made depending on the similarity (or non-similarity) of the standard deviations of the two sets. If the standard deviations are sufficiently similar they can be "pooled" and the Student t-test can be used. When the standard deviations are not sufficiently similar an alternative procedure for the t-test must be followed in which the standard deviations are not pooled. A convenient alternative is the Cochran variant of the t-test. The criterion for the choice is the passing or non-passing of the F-test (see 6.4.2), that is, if the variances do or do not significantly differ. Therefore, for small data sets, the F-test should precede the t-test.
For dealing with large data sets (n1, n2 ≥ 30) the "normal" t-test is used (see Section 6.4.3.3 and App. 3).
6.4.3.1 Student's t-test
(To be applied to small data sets (n1, n2 < 30) where s1 and s2 are similar according to the F-test.)
When comparing two sets of data, Equation (6.12) is rewritten as:
$$t = \frac{|\bar{x}_1 - \bar{x}_2|}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \tag{6.13}$$
where
x̄1 = mean of data set 1
x̄2 = mean of data set 2
s_p = "pooled" standard deviation of the sets
n1 = number of data in set 1
n2 = number of data in set 2.
The pooled standard deviation s_p is calculated by:
$$s_p = \sqrt{\frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}} \tag{6.14}$$
where
s1 = standard deviation of data set 1
s2 = standard deviation of data set 2
n1 = number of data in set 1
n2 = number of data in set 2.
To perform the t-test, the critical t_tab has to be found in the table (Appendix 1); the applicable number of degrees of freedom df is here calculated by:
df = n1 + n2 - 2
Example
The two data sets of Tabular array 6-1 tin can be used: With Equations (half dozen.13) and (6.14) t cal , is calculated as ane.12 which is lower than the critical value t tab of 2.10 (App. 1, df = xviii, two-sided), hence the null hypothesis (no deviation) is accepted and the two data sets are assumed to belong to the aforementioned population: at that place is no significant difference betwixt the mean results of the two analysts (with 95% confidence).
Note. Another illustrative way to perform this test for bias is to calculate whether the difference between the means falls within or outside the range where this difference is still not significantly large. In other words, whether this difference is less than the least significant difference (lsd). This can be derived from Equation (6.13):
$$lsd = t_{tab} \cdot s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \tag{6.15}$$
In the present example of Table 6-1, the calculation yields lsd = 0.69. The measured difference between the means is 10.34 - 9.97 = 0.37, which is smaller than the lsd, indicating that there is no significant difference between the performance of the analysts.
In addition, in this approach the 95% confidence limits of the difference between the means can be calculated (cf. Equation 6.8):
confidence limits = 0.37 ± 0.69 = -0.32 and 1.06
Note that the value 0 for the difference is situated within this confidence interval, which agrees with the null hypothesis of $\bar{x}_1 = \bar{x}_2$ (no difference) having been accepted.
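The Student's t-test with pooled standard deviation, the lsd, and the confidence limits of the difference can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function name `pooled_t_test` and its arguments are introduced here, the critical value $t_{tab}$ is looked up in a t-table by the user and passed in, and the input figures are hypothetical summary statistics, not the data of Table 6-1.

```python
import math


def pooled_t_test(mean1, s1, n1, mean2, s2, n2, t_tab):
    """Student's t-test with pooled standard deviation (Eqs. 6.13 to 6.15)."""
    df = n1 + n2 - 2
    # Pooled standard deviation, Eq. (6.14)
    s_p = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)
    # Standard error of the difference between the two means
    se_diff = s_p * math.sqrt(1 / n1 + 1 / n2)
    diff = mean1 - mean2
    t_cal = abs(diff) / se_diff          # Eq. (6.13)
    lsd = t_tab * se_diff                # Eq. (6.15)
    return {
        "df": df,
        "s_p": s_p,
        "t_cal": t_cal,
        "lsd": lsd,
        "limits": (diff - lsd, diff + lsd),
        "significant": t_cal > t_tab,
    }


# Hypothetical summary statistics (NOT the data of Table 6-1).
# t_tab = 2.145 is the two-sided 95% critical value for df = 14.
result = pooled_t_test(10.0, 1.0, 8, 9.0, 1.0, 8, t_tab=2.145)
for key, value in result.items():
    print(key, value)
```

If `significant` is false, the difference between the means is smaller than the lsd and the value 0 lies inside the returned confidence limits, which is the same conclusion reached by the two routes described above.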
6.4.3.2 Cochran's t-test
To be applied to small data sets ($n_1$, $n_2$ < 30) where $s_1$ and $s_2$ are different according to the F-test.
Calculate t with:
$$t_{cal} = \frac{|\bar{x}_1 - \bar{x}_2|}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} \tag{6.16}$$
Then determine an "alternative" critical t-value:
$$t_{tab}^{*} = \frac{t_1 \dfrac{s_1^2}{n_1} + t_2 \dfrac{s_2^2}{n_2}}{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}} \tag{6.17}$$
where
$t_1$ = $t_{tab}$ at $n_1 - 1$ degrees of freedom
$t_2$ = $t_{tab}$ at $n_2 - 1$ degrees of freedom
Now the t-test can be performed as usual: if $t_{cal}$ < $t_{tab}^{*}$ then the null hypothesis that the means do not significantly differ is accepted.
Example
The two data sets of Table 6-2 can be used.
According to the F-test, the standard deviations differ significantly so that the Cochran variant must be used. Furthermore, in contrast to our expectation that the precision of the rapid test would be inferior, we have no idea about the bias and therefore the two-sided test is appropriate. The calculations yield $t_{cal}$ = 3.12 and $t_{tab}^{*}$ = 2.18, meaning that $t_{cal}$ exceeds $t_{tab}^{*}$, which implies that the null hypothesis (no difference) is rejected and that the mean of the rapid analysis deviates significantly from that of the standard analysis (with 95% confidence, and for this sample only). Further investigation of the rapid method would have to include the use of more different samples, and then comparison with the one-sided t-test would be justified (see 6.4.3.4, Example 1).
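The Cochran variant can be sketched in the same way. The function name `cochran_t_test` and the input figures below are hypothetical (they are not the data of Table 6-2); the two critical values `t1` and `t2` are assumed to have been read from a t-table at $n_1 - 1$ and $n_2 - 1$ degrees of freedom.

```python
import math


def cochran_t_test(mean1, s1, n1, mean2, s2, n2, t1, t2):
    """Cochran's t-test for two small data sets with unequal variances."""
    v1 = s1**2 / n1  # variance of the mean of set 1
    v2 = s2**2 / n2  # variance of the mean of set 2
    t_cal = abs(mean1 - mean2) / math.sqrt(v1 + v2)   # Eq. (6.16)
    t_tab_star = (t1 * v1 + t2 * v2) / (v1 + v2)      # Eq. (6.17)
    return t_cal, t_tab_star, t_cal > t_tab_star


# Hypothetical summary statistics (NOT the data of Table 6-2).
# t1 = 3.18 (df = 3) and t2 = 2.13 (df = 15) are two-sided 95% values.
t_cal, t_tab_star, significant = cochran_t_test(
    10.0, 1.0, 4, 8.0, 2.0, 16, t1=3.18, t2=2.13
)
print(round(t_cal, 2), round(t_tab_star, 3), significant)
```

The alternative critical value is a weighted mean of `t1` and `t2`, with the variances of the two means as weights, so it always lies between the two tabulated values.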
6.4.3.3 t-Test for large data sets (n ≥ 30)
In the example above (6.4.3.2) the conclusion would have been the same if the Student's t-test with pooled standard deviations had been used. This is caused by the fact that the difference in result of the Student and Cochran variants of the t-test is largest when small sets of data are compared, and decreases with increasing number of data. Namely, with increasing number of data a better estimate of the real distribution of the population is obtained (the flatter t-distribution then converges to the standardized normal distribution). When n ≥ 30 for both sets, e.g. when comparing Control Charts (see 8.3), for all practical purposes the difference between the Student and Cochran variant is negligible. The procedure is then reduced to the "normal" t-test by simply calculating $t_{cal}$ with Eq. (6.16) and comparing this with $t_{tab}$ at df = $n_1 + n_2 - 2$. (Note in App. 1 that the two-sided $t_{tab}$ is now close to 2.)
The proper choice of the t-test as discussed above is summarized in a flow diagram in Appendix 3.
6.4.3.4 Paired t-test
When two data sets are not independent, the paired t-test can be a better tool for comparison than the "normal" t-test described in the previous sections. This is for instance the case when two methods are compared by the same analyst using the same sample(s). It could, in fact, also be applied to the example of Table 6-1 if the two analysts used the same analytical method at (about) the same time.
As stated previously, comparison of two methods using different levels of analyte gives more validation information about the methods than using only one level. Comparison of results at each level could be done by the F and t-tests as described above. The paired t-test, however, allows for different levels provided the concentration range is not too wide. As a rule of thumb, the range of results should be within the same order of magnitude. If the analysis covers a longer range, i.e. several powers of ten, regression analysis must be considered (see Section 6.4.4). In intermediate cases, either technique may be chosen.
The null hypothesis is that there is no difference between the data sets, so the test is to see whether the mean of the differences between the data deviates significantly from zero or not (two-sided test). If it is expected that one set is systematically higher (or lower) than the other set, then the one-sided test is appropriate.
Example 1
The "promising" rapid single-extraction method for the determination of the cation exchange capacity of soils using the silver thiourea complex (AgTU, buffered at pH 7) was compared with the traditional ammonium acetate method (NH4OAc, pH 7). Although for certain soil types the difference in results appeared insignificant, for other types differences seemed larger. Such a suspect group were soils with ferralic (oxic) properties (i.e. highly weathered sesquioxide-rich soils). In Table 6-3 the results of ten soils with these properties are grouped to test if the CEC methods give different results. The difference d within each pair and the parameters needed for the paired t-test are given also.
Table 6-3. CEC values (in cmolc/kg) obtained by the NH4OAc and AgTU methods (both at pH 7) for 10 soils with ferralic properties.
Sample | NH4OAc | AgTU | d |
1 | 7.1 | 6.5 | -0.6 |
2 | 4.6 | 5.6 | +1.0 |
3 | 10.6 | 14.5 | +3.9 |
4 | 2.3 | 5.6 | +3.3 |
5 | 25.2 | 23.8 | -1.4 |
6 | 4.4 | 10.4 | +6.0 |
7 | 7.8 | 8.4 | +0.6 |
8 | 2.7 | 5.5 | +2.8 |
9 | 14.3 | 19.2 | +4.9 |
10 | 13.6 | 15.0 | +1.4 |
$\bar{d}$ = +2.19 | $t_{cal}$ = 2.89 |
$s_d$ = 2.395 | $t_{tab}$ = 2.26 |
Using Equation (6.12) and noting that $\mu_d$ = 0 (hypothesis value of the differences, i.e. no difference), the t-value can be calculated as:
$$t_{cal} = \frac{|\bar{d} - \mu_d|}{s_d / \sqrt{n}} = \frac{|2.19 - 0|}{2.395 / \sqrt{10}} = 2.89$$
where
$\bar{d}$ = mean of differences within each pair of data
$s_d$ = standard deviation of the differences
$n$ = number of pairs of data
The calculated t value (= 2.89) exceeds the critical value of 1.83 (App. 1, df = n - 1 = 9, one-sided), hence the null hypothesis that the methods do not differ is rejected and it is concluded that the silver thiourea method gives significantly higher results as compared with the ammonium acetate method when applied to such highly weathered soils.
Note. Since such data sets do not have a normal distribution, the "normal" t-test which compares means of sets cannot be used here (the means do not constitute a fair representation of the sets). For the same reason no information about the precision of the two methods can be obtained, nor can the F-test be applied. For information about precision, replicate determinations are needed.
Example 2
Table 6-4 shows the data of total-P in four plant tissue samples obtained by a laboratory L and the median values obtained by 123 laboratories in a proficiency (round-robin) test.
Table 6-4. Total-P contents (in mmol/kg) of plant tissue as determined by 123 laboratories (Median) and Laboratory L.
Sample | Median | Lab L | d |
1 | 93.0 | 85.2 | -7.8 |
2 | 201 | 224 | 23 |
3 | 78.9 | 84.5 | 5.6 |
4 | 175 | 185 | 10 |
$\bar{d}$ = 7.70 | $t_{cal}$ = 1.21 |
$s_d$ = 12.702 | $t_{tab}$ = 3.18 |
To verify the performance of the laboratory a paired t-test can be performed:
Using Eq. (6.12) and noting that $\mu_d$ = 0 (hypothesis value of the differences, i.e. no difference), the t value can be calculated as:
$$t_{cal} = \frac{|7.70 - 0|}{12.702 / \sqrt{4}} = 1.21$$
The calculated t-value is below the critical value of 3.18 (Appendix 1, df = n - 1 = 3, two-sided), hence the null hypothesis that the laboratory does not significantly differ from the group of laboratories is accepted, and the results of Laboratory L seem to agree with those of "the rest of the world" (this is a so-called third-line control).
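Both paired t-test examples can be checked with a short script. The sketch below uses only the Python standard library; the helper name `paired_t` is introduced here for illustration, and the critical values are simply those quoted in the text (Appendix 1), not computed.

```python
import math
import statistics


def paired_t(set_a, set_b, mu_d=0.0):
    """Return (mean difference, s_d, t_cal) for paired data, d = b - a."""
    d = [b - a for a, b in zip(set_a, set_b)]
    n = len(d)
    d_mean = statistics.mean(d)
    s_d = statistics.stdev(d)  # sample standard deviation (n - 1)
    t_cal = abs(d_mean - mu_d) / (s_d / math.sqrt(n))
    return d_mean, s_d, t_cal


# Example 1: data of Table 6-3 (CEC in cmolc/kg)
nh4oac = [7.1, 4.6, 10.6, 2.3, 25.2, 4.4, 7.8, 2.7, 14.3, 13.6]
agtu = [6.5, 5.6, 14.5, 5.6, 23.8, 10.4, 8.4, 5.5, 19.2, 15.0]
cec = paired_t(nh4oac, agtu)
print("CEC: d_mean=%.2f s_d=%.3f t_cal=%.2f" % cec)
print("reject H0 (one-sided, t_tab = 1.83):", cec[2] > 1.83)

# Example 2: data of Table 6-4 (total-P in mmol/kg)
median = [93.0, 201, 78.9, 175]
lab_l = [85.2, 224, 84.5, 185]
tot_p = paired_t(median, lab_l)
print("P:   d_mean=%.2f s_d=%.3f t_cal=%.2f" % tot_p)
print("reject H0 (two-sided, t_tab = 3.18):", tot_p[2] > 3.18)
```

Keeping the sign convention d = second set minus first set matches the d columns of both tables.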
6.4.4 Linear correlation and regression
6.4.4.1 Construction of calibration graph
6.4.4.2 Comparing two sets of data using many samples at different analyte levels
These also belong to the most common useful statistical tools to compare effects and performances X and Y. Although the technique is in principle the same for both, there is a fundamental difference in concept: correlation analysis is applied to independent factors: if X increases, what will Y do (increase, decrease, or perhaps not change at all)? In regression analysis a unilateral response is assumed: changes in X result in changes in Y, but changes in Y do not result in changes in X.
For example, in analytical work, correlation analysis can be used for comparing methods or laboratories, whereas regression analysis can be used to construct calibration graphs. In practice, however, comparison of laboratories or methods is usually also done by regression analysis. The calculations can be performed on a (programmed) calculator or more conveniently on a PC using a home-made program. Even more convenient are the regression programs included in statistical packages such as Statistix, Mathcad, Eureka, Genstat, Statcal, SPSS, and others. Also, most spreadsheet programs such as Lotus 123, Excel, and Quattro-Pro have functions for this.
Laboratories or methods are in fact independent factors. However, for regression analysis one factor has to be the independent or "constant" factor (e.g. the reference method, or the factor with the smallest standard deviation). This factor is by convention designated X, whereas the other factor is then the dependent factor Y (thus, we speak of "regression of Y on X").
As was discussed in Section 6.4.3, such comparisons can often be done with the Student/Cochran or paired t-tests. However, correlation analysis is indicated:
1. When the concentration range is so wide that the errors, both random and systematic, are not independent (which is the assumption for the t-tests). This is often the case where concentration ranges of several magnitudes are involved.
2. When pairing is inappropriate for other reasons, notably a long time span between the two analyses (sample aging, change in laboratory conditions, etc.).
The principle is to establish a statistical linear relationship between two sets of corresponding data by fitting the data to a straight line by means of the "least squares" technique. Such data are, for example, analytical results of two methods applied to the same samples (correlation), or the response of an instrument to a series of standard solutions (regression).
Note: Naturally, non-linear higher-order relationships are also possible, but since these are less common in analytical work and more complex to handle mathematically, they will not be discussed here. Nevertheless, to avoid misinterpretation, always inspect the kind of relationship by plotting the data, either on paper or on the computer monitor.
The resulting line takes the general form:
$$y = bx + a$$
where
a = intercept of the line with the y-axis
b = slope (tangent)
In laboratory work ideally, when there is perfect positive correlation without bias, the intercept a = 0 and the slope b = 1. This is the so-called "1:1 line" passing through the origin (dashed line in Fig. 6-5).
If the intercept a ≠ 0 then there is a systematic discrepancy (bias, error) between X and Y; when b ≠ 1 then there is a proportional response or difference between X and Y.
The correlation between X and Y is expressed by the correlation coefficient r which can be calculated with the following equation:
$$r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}} \tag{6.19}$$
where
$x_i$ = data X
$\bar{x}$ = mean of data X
$y_i$ = data Y
$\bar{y}$ = mean of data Y
It can be shown that r can vary from 1 to -1:
r = 1 perfect positive linear correlation
r = 0 no linear correlation (maybe other correlation)
r = -1 perfect negative linear correlation
Often, the correlation coefficient r is expressed as $r^2$: the coefficient of determination or coefficient of variance. The advantage of $r^2$ is that, when multiplied by 100, it indicates the percentage of variation in Y associated with variation in X. Thus, for example, when r = 0.71 about 50% ($r^2$ = 0.504) of the variation in Y is due to the variation in X.
The line parameters b and a are calculated with the following equations:
$$b = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} \tag{6.20}$$
and
$$a = \bar{y} - b\bar{x} \tag{6.21}$$
It is worth noting that r is independent of the choice of which factor is the independent factor X and which is the dependent factor Y. However, the regression parameters a and b do depend on this choice, as the regression lines will be different (except when there is ideal 1:1 correlation).
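Why r is unaffected by swapping X and Y while the slope is not can be seen by writing out both slopes. The following is a brief sketch; the symbols $b_{Y|X}$ (regression of Y on X) and $b_{X|Y}$ (regression of X on Y) are notation introduced here for illustration only.

$$b_{Y|X} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}, \qquad b_{X|Y} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (y_i - \bar{y})^2}$$

$$b_{Y|X} \cdot b_{X|Y} = \frac{\left[\sum (x_i - \bar{x})(y_i - \bar{y})\right]^2}{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2} = r^2$$

The expression for r is symmetric in X and Y, whereas each slope divides by the sum of squares of its own independent factor only. Drawn in the same graph, the two regression lines coincide only when $r^2 = 1$.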
6.4.4.1 Construction of calibration graph
As an example, we take a standard series of P (0-1.0 mg/L) for the spectrophotometric determination of phosphate in a Bray-I extract ("available P"), reading in absorbance units. The data and calculated terms needed to determine the parameters of the calibration graph are given in Table 6-5. The line itself is plotted in Fig. 6-4.
Table 6-5 is presented here to give an insight into the steps and terms involved. The calculation of the correlation coefficient r with Equation (6.19) yields a value of 0.997 ($r^2$ = 0.995). Such high values are common for calibration graphs. When the value is not close to 1 (say, below 0.98) this must be taken as a warning and it might then be advisable to repeat or review the procedure. Errors may have been made (e.g. in pipetting) or the used range of the graph may not be linear. On the other hand, a high r may be misleading as it does not necessarily indicate linearity. Therefore, to verify this, the calibration graph should always be plotted, either on paper or on the computer monitor.
Using Equations (6.20) and (6.21) we obtain:
$$b = \frac{0.438}{0.70} = 0.626$$
and
a = 0.350 - 0.313 = 0.037
Thus, the equation of the calibration line is:
$$y = 0.626x + 0.037 \tag{6.22}$$
Table 6-5. Parameters of calibration graph in Fig. 6-4.
$x_i$ | $y_i$ | $x_i - \bar{x}$ | $(x_i - \bar{x})^2$ | $y_i - \bar{y}$ | $(y_i - \bar{y})^2$ | $(x_i - \bar{x})(y_i - \bar{y})$ |
0.0 | 0.05 | -0.5 | 0.25 | -0.30 | 0.090 | 0.150 |
0.2 | 0.14 | -0.3 | 0.09 | -0.21 | 0.044 | 0.063 |
0.4 | 0.29 | -0.1 | 0.01 | -0.06 | 0.004 | 0.006 |
0.6 | 0.43 | 0.1 | 0.01 | 0.08 | 0.006 | 0.008 |
0.8 | 0.52 | 0.3 | 0.09 | 0.17 | 0.029 | 0.051 |
1.0 | 0.67 | 0.5 | 0.25 | 0.32 | 0.102 | 0.160 |
3.0 | 2.10 | 0 | 0.70 | 0 | 0.2754 | 0.438 | Σ |
$\bar{x}$ = 0.5 | $\bar{y}$ = 0.35 |
Fig. 6-4. Calibration graph plotted from data of Table 6-5. The dashed lines delineate the 95% confidence area of the graph. Note that the confidence is highest at the centroid of the graph.
During calculation, the maximum number of decimals is used; rounding off to the last significant figure is done at the end (see instruction for rounding off in Section 8.2).
Once the calibration graph is established, its use is simple: for each y value measured the corresponding concentration x can be determined either by direct reading or by calculation using Equation (6.22). The use of calibration graphs is further discussed in Section 7.2.2.
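The construction and use of the calibration line can be illustrated with the data of Table 6-5. The sketch below is one possible way to do this with the Python standard library; the function names `linear_fit` and `concentration` are introduced here for illustration. Intermediate values are kept unrounded, so results agree with the figures above only after rounding.

```python
import math

# Standard series of Table 6-5: concentration (mg/L) and absorbance
x = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
y = [0.05, 0.14, 0.29, 0.43, 0.52, 0.67]


def linear_fit(x, y):
    """Least-squares fit of y = b*x + a; returns (a, b, r)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    s_xx = sum((xi - x_mean) ** 2 for xi in x)
    s_yy = sum((yi - y_mean) ** 2 for yi in y)
    s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b = s_xy / s_xx                      # slope, Eq. (6.20)
    a = y_mean - b * x_mean              # intercept, Eq. (6.21)
    r = s_xy / math.sqrt(s_xx * s_yy)    # correlation coefficient, Eq. (6.19)
    return a, b, r


def concentration(reading, a, b):
    """Read a concentration x from a measured response y (inverse of the line)."""
    return (reading - a) / b


a, b, r = linear_fit(x, y)
print("a = %.3f, b = %.3f, r^2 = %.3f" % (a, b, r * r))
print("x at absorbance 0.35: %.3f mg/L" % concentration(0.35, a, b))
```

Because the fitted line always passes through the centroid ($\bar{x}$, $\bar{y}$), a reading equal to the mean absorbance 0.35 returns the mean concentration 0.5 mg/L, which is a quick sanity check on any implementation.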
Note. A treatise of the error or uncertainty in the regression line is given.
6.iv.four.ii Comparing 2 sets of information using many samples at different analyte levels
Although regression analysis assumes that one factor (on the x-axis) is constant, when certain conditions are met the technique can also successfully be applied to comparing two variables such as laboratories or methods. These conditions are:
- The most precise data set is plotted on the x-axis
- At least 6, but preferably more than 10 different samples are analyzed
- The samples should rather uniformly cover the analyte level range of interest.
To decide which laboratory or method is the most precise, multi-replicate results have to be used to calculate standard deviations (see 6.4.2). If these are not available then the standard deviations of the present sets could be compared (note that we are now not dealing with normally distributed sets of replicate results). Another convenient way is to run the regression analysis on the computer, reverse the variables and run the analysis again. Observe which variable has the lowest standard deviation (or standard error of the intercept a, both given by the computer) and then use the results of the regression analysis where this variable was plotted on the x-axis.
If the analyte level range is incomplete, one might have to resort to spiking or standard additions, with the inherent drawback that the original analyte-sample combination may not adequately be reflected.
Example
In the framework of a performance verification programme, a large number of soil samples were analyzed by two laboratories X and Y (a form of "third-line control", see Chapter 9) and the data compared by regression. (In this particular case, the paired t-test might have been considered also.) The regression line of a common attribute, the pH, is shown here as an illustration. Figure 6-5 shows the so-called "scatter plot" of 124 soil pH-H2O determinations by the two laboratories. The correlation coefficient r is 0.97 which is very satisfactory. The slope (= 1.03) indicates that the regression line is only slightly steeper than the 1:1 ideal regression line. Very disturbing, however, is the intercept a of -1.18. This implies that laboratory Y measures the pH more than a whole unit lower than laboratory X at the low end of the pH range (the intercept -1.18 is at pHx = 0), which difference decreases to about 0.8 unit at the high end.
Fig. 6-5. Scatter plot of pH data of two laboratories. Drawn line: regression line; dashed line: 1:1 ideal regression line.
The t-test for significance is as follows:
For intercept a: $\mu_a$ = 0 (null hypothesis: no bias; ideal intercept is then zero), standard error = 0.14 (calculated by the computer), and using Equation (6.12) we obtain:
$$t_{cal} = \frac{|-1.18 - 0|}{0.14} = 8.4$$
Here, $t_{tab}$ = 1.98 (App. 1, two-sided, df = n - 2 = 122; n - 2 because an extra degree of freedom is lost as the data are used for both a and b); hence, the laboratories have a significant mutual bias.
For slope: $\mu_b$ = 1 (ideal slope: null hypothesis is no difference), standard error = 0.02 (given by computer), and again using Equation (6.12) we obtain:
$$t_{cal} = \frac{|1.03 - 1|}{0.02} = 1.5$$
Again, $t_{tab}$ = 1.98 (App. 1; two-sided, df = 122); hence, the difference between the laboratories is not significantly proportional (or: the laboratories do not have a significant difference in sensitivity). These results suggest that in spite of the good correlation, the two laboratories would have to look into the cause of the bias.
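The two significance tests on the regression parameters amount to the same small calculation applied twice. A minimal sketch, using the intercept, slope, standard errors and critical value quoted in this example (the helper name `t_value` is introduced here for illustration):

```python
def t_value(estimate, hypothesized, standard_error):
    """t_cal for one regression parameter against its ideal value."""
    return abs(estimate - hypothesized) / standard_error


t_tab = 1.98  # two-sided, df = 122, as quoted from Appendix 1

t_intercept = t_value(-1.18, 0.0, 0.14)  # ideal intercept: 0 (no bias)
t_slope = t_value(1.03, 1.0, 0.02)       # ideal slope: 1 (no proportional difference)

print("intercept: t_cal = %.1f, significant: %s" % (t_intercept, t_intercept > t_tab))
print("slope:     t_cal = %.1f, significant: %s" % (t_slope, t_slope > t_tab))
```

Testing the slope against 1 rather than 0 is the design choice that matters here: the question is whether the laboratories agree, not whether their results are related at all.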
Note. In the present example, the scattering of the points around the regression line does not seem to change much over the whole range. This indicates that the precision of laboratory Y does not change very much over the range with respect to laboratory X. This is not always the case. When the scatter does change over the range, weighted regression (not discussed here) is more appropriate than the unweighted regression used here.
Validation of a method (see Section 7.5) may reveal that precision can change significantly with the level of analyte (and with other factors such as sample matrix).
6.4.5 Analysis of variance (ANOVA)
When results of laboratories or methods are compared where more than one factor can be of influence and must be distinguished from random effects, then ANOVA is a powerful statistical tool to be used. Examples of such factors are: different analysts, samples with different pre-treatments, different analyte levels, different methods within one of the laboratories. Most statistical packages for the PC can perform this analysis.
As a treatise of ANOVA is beyond the scope of the present Guidelines, for further discussion the reader is referred to statistical textbooks, some of which are given in the list of Literature.
Error or uncertainty in the regression line
The "fitting" of the calibration graph is necessary because the response points $y_i$ composing the line do not fall exactly on the line. Hence, random errors are implied. This is expressed by an uncertainty about the slope and intercept b and a defining the line. A quantification can be found in the standard deviation of these parameters. Most computer programmes for regression will automatically produce figures for these. To illustrate the procedure, the example of the calibration graph in Section 6.4.4.1 is elaborated here.
A practical quantification of the uncertainty is obtained by calculating the standard deviation of the points on the line; the "residual standard deviation" or "standard error of the y-estimate", which we assumed to be constant (but which is only approximately so, see Fig. 6-4):
$$s_y = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n - 2}} \tag{6.23}$$
where
$\hat{y}_i$ = "fitted" y-value for each $x_i$ (read from graph or calculated with Eq. 6.22). Thus, $y_i - \hat{y}_i$ is the (vertical) deviation of the found y-values from the line.
n = number of calibration points.
Note: Only the y-deviations of the points from the line are considered. It is assumed that deviations in the x-direction are negligible. This is, of course, only the case if the standards are very accurately prepared.
Now the standard deviations for the intercept a and slope b can be calculated with:
$$s_a = s_y \sqrt{\frac{\sum x_i^2}{n \sum (x_i - \bar{x})^2}} \tag{6.24}$$
and
$$s_b = \frac{s_y}{\sqrt{\sum (x_i - \bar{x})^2}} \tag{6.25}$$
To make this procedure clear, the parameters involved are listed in Table 6-6.
The uncertainty about the regression line is expressed by the confidence limits of a and b according to Eq. (6.9): a ± t·$s_a$ and b ± t·$s_b$
Table 6-6. Parameters for calculating errors due to calibration graph (use also figures of Table 6-5).
$x_i$ | $y_i$ | $\hat{y}_i$ | $y_i - \hat{y}_i$ | $(y_i - \hat{y}_i)^2$ |
0 | 0.05 | 0.037 | 0.013 | 0.0002 |
0.2 | 0.14 | 0.162 | -0.022 | 0.0005 |
0.4 | 0.29 | 0.287 | 0.003 | 0.0000 |
0.6 | 0.43 | 0.413 | 0.017 | 0.0003 |
0.8 | 0.52 | 0.538 | -0.018 | 0.0003 |
1.0 | 0.67 | 0.663 | 0.007 | 0.0001 |
Σ $(y_i - \hat{y}_i)^2$ = 0.001364 |
In the present example, Eq. (6.23) is applied to the sum of squared deviations of Table 6-6 (n = 6 calibration points) to obtain $s_y$. Using Eq. (6.24) and Table 6-5 then gives $s_a$ = 0.0132, and using Eq. (6.25) and Table 6-5 gives $s_b$ = 0.0219.
The applicable $t_{tab}$ is 2.78 (App. 1, two-sided, df = n - 2 = 4); hence, using Eq. (6.9):
a = 0.037 ± 2.78 × 0.0132 = 0.037 ± 0.037
and
b = 0.626 ± 2.78 × 0.0219 = 0.626 ± 0.061
Note that if $s_a$ is large enough, a negative value for a is possible, i.e. a negative reading for the blank or zero-standard. (For a discussion about the error in x resulting from a reading in y, which is particularly relevant for reading a calibration graph, see Section 7.2.3.)
The uncertainty about the line is somewhat decreased by using more calibration points (assuming $s_y$ has not increased): one more point reduces $t_{tab}$ from 2.78 to 2.57 (see Appendix 1).
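The whole procedure of this section can be traced with the calibration data of Table 6-5. The sketch below is an illustration using the Python standard library only; variable names are chosen here, and $t_{tab}$ = 2.78 is the tabulated value quoted above rather than a computed one. Unrounded intermediate values are used throughout, so the sum of squared deviations may differ slightly from the rounded entries of Table 6-6.

```python
import math

# Calibration data of Table 6-5
x = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
y = [0.05, 0.14, 0.29, 0.43, 0.52, 0.67]
n = len(x)

x_mean = sum(x) / n
y_mean = sum(y) / n
s_xx = sum((xi - x_mean) ** 2 for xi in x)
s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))

b = s_xy / s_xx            # slope
a = y_mean - b * x_mean    # intercept

# Residual standard deviation (standard error of the y-estimate)
ss_res = sum((yi - (b * xi + a)) ** 2 for xi, yi in zip(x, y))
s_y = math.sqrt(ss_res / (n - 2))

# Standard deviations of intercept and slope
s_a = s_y * math.sqrt(sum(xi**2 for xi in x) / (n * s_xx))
s_b = s_y / math.sqrt(s_xx)

t_tab = 2.78  # two-sided, df = n - 2 = 4, as quoted from Appendix 1
print("s_y = %.4f" % s_y)
print("a = %.3f +/- %.3f" % (a, t_tab * s_a))
print("b = %.3f +/- %.3f" % (b, t_tab * s_b))
```

Computing the fitted values from the unrounded a and b, rather than from the rounded Equation (6.22), follows the advice given earlier to round off only at the end.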
Source: https://www.fao.org/3/w7295e/w7295e08.htm
Posted by: nelsontardwilis.blogspot.com