Content
Chi-square Tests
Chi-square tests
2 by 2 chi-square test
2 by k chi-square test
r by c contingency table analysis
McNemar chi-square and exact test for matched pairs
Maxwell chi-square for k paired data
Mantel-Haenszel test
Generalised Cochran-Mantel-Haenszel tests
Woolf statistics
Chi-square goodness of fit test
Crosstabs
·2 by 2
·2 by k
·r by c
·Matched pairs (McNemar, Liddell)
·Mantel-Haenszel and odds ratio meta-analysis
·Woolf
·Chi-square goodness of fit
Menu location: Analysis_Chi-Square.
Chi-square tests can be used to test the association between two classifications (classifier variables) of a set of counts or frequencies. This two dimensional arrangement is commonly displayed as a contingency table or cross classification where rows represent one variable and columns represent the other. The null hypothesis is that there is no association between the two variables.
The main assumption of this group of methods is that any observation can belong to only one cell in the contingency table.
Row and column totals (marginal totals) are used to predict what count would be expected for each cell if the null hypothesis were true. A test statistic which is approximately distributed as a chi-square variable is calculated from the observed and expected frequencies. The larger the test statistic (for given degrees of freedom), the smaller the P value and the stronger the evidence for an association between the two variables.
This group of methods is intended for medium to large samples. Cochran's rule states that no expected frequency should be less than 1 and at least 80% of expected frequencies should be greater than 5 (Bland, 2000).
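As a rough illustration of how expected frequencies and Cochran's rule fit together, the following Python sketch computes the expected counts from the marginal totals and checks the rule as stated above. The function names are hypothetical and are not part of StatsDirect; the two tables are the 2 by 2 and r by c worked examples from this document.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def meets_cochran_rule(table):
    """True if no expected count is below 1 and at least 80% are greater than 5."""
    cells = [e for row in expected_counts(table) for e in row]
    none_below_one = min(cells) >= 1
    share_above_five = sum(e > 5 for e in cells) / len(cells)
    return none_below_one and share_above_five >= 0.8


# Worked-example tables from this document (observed counts).
mortality = [[41, 216], [64, 180]]
grief = [[17, 9, 8], [6, 5, 1], [3, 5, 4], [1, 2, 5]]

print(meets_cochran_rule(mortality))
print(meets_cochran_rule(grief))
```

When the rule is not met, an exact test is the safer choice, as the later sections recommend.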
Copyright © 1990-2006 StatsDirect Limited, all rights reserved
Menu location: Analysis_Chi-Square_2 by 2.
The two by two or fourfold contingency table represents two classifications of a set of counts or frequencies. The rows represent two classifications of one variable (e.g. outcome positive/outcome negative) and the columns represent two classifications of another variable (e.g. intervention/no intervention). These classifications must be independent. Paired results (e.g. outcomes for the same group of individuals before and after intervention) should be analysed using a test for matched pairs. The chi-square statistic calculated from the table tests the independence of the two classifications.
Assumptions of the tests of independence:
·the sample is random
·each observation may be classified into one cell (in the table) only
$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$
- where, for r rows and c columns of n observations, O is an observed frequency and E is an estimated expected frequency. The expected frequency for any cell is estimated as the row total times the column total then divided by the grand total (n).
Yates' continuity correction improves the approximation of the discrete sample chi-square statistic to a continuous chi-square distribution (Armitage and Berry, 1994):
$$\chi^2_{Yates} = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{\left( |O_{ij} - E_{ij}| - \tfrac{1}{2} \right)^2}{E_{ij}}$$
The r by c chi-square function can be used to examine two by two tables in greater detail.
Pearson's (C) and Cramér's (V) coefficients of contingency reflect the strength of the association in a contingency table (Agresti, 1996; Fleiss, 1981; Stuart and Ord, 1994):
$$C = \sqrt{\frac{\chi^2}{\chi^2 + n}}, \qquad V = \sqrt{\frac{\chi^2}{n \, \min(r-1,\, c-1)}}$$
You will see another coefficient, phi (φ, correlation), given by the r by c chi-square function. This is equal to an unsigned version of V with 2 by 2 tables.
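The statistics described in this section (Pearson chi-square, Yates' corrected chi-square, Pearson's contingency coefficient and Cramér's V) can be sketched in a few lines of Python. This is a minimal illustration, not StatsDirect's implementation: the function name and dictionary keys are hypothetical, and the P values rely on the fact that a chi-square variable with 1 degree of freedom has upper tail probability erfc(sqrt(x/2)). The signed V uses (ad - bc) divided by the square root of the product of the four marginal totals.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Chi-square statistics and association measures for a fourfold table."""
    observed = [[a, b], [c, d]]
    rows = [a + b, c + d]
    cols = [a + c, b + d]
    n = a + b + c + d
    chi2 = yates = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            diff = abs(observed[i][j] - expected)
            chi2 += diff ** 2 / expected
            yates += (diff - 0.5) ** 2 / expected  # Yates' continuity correction
    return {
        "chi2": chi2,
        "yates": yates,
        # Upper tail of chi-square with 1 degree of freedom.
        "p_chi2": math.erfc(math.sqrt(chi2 / 2)),
        "p_yates": math.erfc(math.sqrt(yates / 2)),
        "pearson_c": math.sqrt(chi2 / (chi2 + n)),
        # For a 2 by 2 table min(r-1, c-1) = 1, so V = sqrt(chi2 / n).
        "cramer_v": math.sqrt(chi2 / n),
        "signed_v": (a * d - b * c) / math.sqrt(rows[0] * rows[1] * cols[0] * cols[1]),
    }


# Mortality example from this section: 41/216 on treatment A, 64/180 on B.
result = chi_square_2x2(41, 216, 64, 180)
for name, value in result.items():
    print(f"{name}: {value:.6f}")
```

Run on the mortality example, the sketch should agree with the output listed in the example below to the precision shown there.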
Fisher's exact test should be used as an alternative to the fourfold chi-square test if the total number of observations is less than twenty or any of the expected frequencies are less than five. In practical terms, however, there is little point in using the fourfold chi-square for testing independence when StatsDirect provides a Fisher's exact test that can cope with large numbers.
If you specify that your results are from a case-control study then StatsDirect adds an odds ratio analysis. With a, b, c and d as the observed frequencies arranged as in the data input table below, the odds ratio (OR) is related to the risk ratio (RR, relative risk):
RR = (a / (a+c)) / (b / (b+d))
When a is small in comparison to c and b is small in comparison to d (i.e. relatively small numbers of outcome positive observations or low prevalence) then c can be substituted for a+c and d can be substituted for b+d in the above. With a little rearrangement this gives the odds ratio (cross ratio, approximate relative risk):
OR = (a*d)/(b*c).
A confidence interval (CI) for the odds ratio is calculated using two different methods. The Woolf (logit) method for large samples is given first, followed by an exact conditional maximum likelihood method (Fleiss, 1979; Gardner and Altman, 1989; Martin and Austin, 1991). Please note that the exact calculations may take an appreciable amount of time with large numbers.
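The following Python sketch illustrates the odds ratio, the risk ratio and the Woolf (logit) interval. It assumes the usual logit form, in which the standard error of ln(OR) is sqrt(1/a + 1/b + 1/c + 1/d); the function names are hypothetical, and the exact conditional interval is not attempted here. The logit interval is undefined if any cell is zero.

```python
import math
from statistics import NormalDist


def odds_ratio_woolf(a, b, c, d, confidence=0.95):
    """Odds ratio with a Woolf (logit) confidence interval; all cells must be > 0."""
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    log_or = math.log(odds_ratio)
    return odds_ratio, math.exp(log_or - z * se_log), math.exp(log_or + z * se_log)


def risk_ratio(a, b, c, d):
    """Risk ratio as defined in the text: (a / (a+c)) / (b / (b+d))."""
    return (a / (a + c)) / (b / (b + d))


# Mortality example: a = dead on A, b = dead on B, c = alive on A, d = alive on B.
estimate, lower, upper = odds_ratio_woolf(41, 64, 216, 180)
print(f"OR = {estimate:.6f}, 95% CI = {lower:.6f} to {upper:.6f}")
print(f"RR = {risk_ratio(41, 64, 216, 180):.4f}")
```

The odds ratio is unchanged if the table is transposed, which is why the orientation of the input table does not affect this part of the analysis.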
If you specify that your results are from a cohort study then StatsDirect adds a relative risk analysis. See risk (prospective) for more details.
DATA INPUT:
Observed frequencies should be entered as a standard fourfold table:
 | feature present | feature absent |
outcome positive | a | b |
outcome negative | c | d |
Example
From Armitage and Berry (1994).
The following represent mortality data for two groups of patients receiving different treatments, A and B.
 | | Outcome | |
 | | Dead | Alive |
Treatment / Exposure | A | 41 | 216 |
 | B | 64 | 180 |
To analyse these data in StatsDirect you must select the 2 by 2 contingency table from the chi-square section of the analysis menu. Select the default 95% confidence interval. Enter the frequencies into the contingency table on screen. Note that the input screen has outcome values from top to bottom and the other classifier (e.g. treatment) from left to right; some books and papers show these the other way around.
For this example:
Observed values and totals:
41 | 216 | 257 |
64 | 180 | 244 |
105 | 396 | 501 |
Expected values:
53.862275 | 203.137725 |
51.137725 | 192.862275 |
Uncorrected Chi² = 7.978869 P = 0.0047
Yates-corrected Chi² = 7.370595 P = 0.0066
Measures of association:
Pearson's contingency = 0.125205
Cramér's V (signed) = -0.126198
Odds ratio analysis
Odds Ratio = 0.533854
Using the Woolf (logit) approximation:
Approximate 95% confidence interval = 0.344118 to 0.828206
Using conditional likelihood estimation:
Fisher exact 95% confidence interval = 0.334788 to 0.846292
Exact Fisher one sided P = 0.0033, two sided P = 0.0059
mid-P exact 95% confidence interval = 0.342636 to 0.828071
Exact mid-P one sided P = 0.0024, two sided P = 0.0049
Here we can see a statistically significant relationship between treatment and mortality. The strength of that relationship is reflected by the coefficient of contingency. The odds ratio tells us that the odds in favour of dying after treatment A are about half of the odds of dying after treatment B. With 95% confidence we put the true population value for this ratio of odds somewhere between 0.33 and 0.85. If you need to phrase the arguments with odds ratios the other way around then just quote the reciprocals, i.e. here we would say that the odds of dying after treatment B are about 1.9 times the odds of dying after treatment A.
Menu location: Analysis_Chi-Square_2 by k.
Several proportions can be compared using a 2 by k chi-square test. For example, a random sample of people can be subdivided into k age groups and counts made of those individuals with and those without a particular attribute. For this sample, a 2 by k chi-square test could be used to test whether or not age has a statistically significant effect on the attribute studied. This is a test of the independence of the row and column variables; it is equivalent to the chi-square independence tests for 2 by 2 and r by c tables.
$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$
- where, for r rows and c columns of n observations, O is an observed frequency and E is an estimated expected frequency. The expected frequency for any cell is estimated as the row total times the column total then divided by the grand total (n).
Assumptions of the tests of independence:
·the sample is random
·each observation may be classified into one cell (in the table) only
Note that an exact test of independence is provided in the r by c table analysis function; you should use this instead of the chi-square statistic when you have small numbers (say expected frequency less than 5) in any of the table cells.
If there is a meaningful order to your k groups (e.g. sequential age bands) then the chi-square test for trend provides a more powerful test than the unordered independence test above. StatsDirect automatically performs a test for linear trend across the k groups. You can enter your own scores for the trend test. For example, if a variable was categorised as mild, moderate or severe pain then scores 1, 2 and 3 are likely to be reasonable, so leave StatsDirect to assign scores. If, instead, the categories were mild, moderate and worst ever pain then you might enter unequally spaced scores such as 1, 2 and 5 respectively (Armitage and Berry, 1994; Altman, 1991).
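$$\chi^2_{trend} = \frac{\left( \sum_{i=1}^{k} r_i v_i - p \sum_{i=1}^{k} n_i v_i \right)^2}{p (1 - p) \left[ \sum_{i=1}^{k} n_i v_i^2 - \frac{1}{N} \left( \sum_{i=1}^{k} n_i v_i \right)^2 \right]}$$
- where each of k groups of observations is denoted as r_i successes out of n_i total with score v_i assigned. R is the sum of all r_i, N is the sum of all n_i and p = R/N.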
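A minimal Python sketch of the chi-square test for linear trend follows, written in terms of the quantities defined above (r_i successes out of n_i with score v_i, R, N and p = R/N). The function name is hypothetical, equally spaced scores 1 to k are assumed when none are supplied, and the P value uses the 1 degree of freedom chi-square tail erfc(sqrt(x/2)).

```python
import math


def chi_square_trend(successes, totals, scores=None):
    """Chi-square for linear trend across k ordered groups (1 degree of freedom)."""
    k = len(successes)
    if scores is None:
        scores = list(range(1, k + 1))  # assumed default: equally spaced scores
    big_r = sum(successes)
    big_n = sum(totals)
    p = big_r / big_n
    sum_rv = sum(r * v for r, v in zip(successes, scores))
    sum_nv = sum(n * v for n, v in zip(totals, scores))
    sum_nv2 = sum(n * v * v for n, v in zip(totals, scores))
    numerator = (sum_rv - p * sum_nv) ** 2
    denominator = p * (1 - p) * (sum_nv2 - sum_nv ** 2 / big_n)
    chi2 = numerator / denominator
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Tonsil example: carriers (successes) out of all children in each size group.
trend, p_value = chi_square_trend([19, 29, 24], [516, 589, 293])
print(f"Chi-square for linear trend = {trend:.5f}, P = {p_value:.4f}")

# Unequally spaced scores, as in the pain example (1, 2 and 5).
print(chi_square_trend([19, 29, 24], [516, 589, 293], scores=[1, 2, 5]))
```

Changing the scores changes the statistic, which is why the choice of scoring system should be justified before the data are analysed.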
Should you wish to investigate your 2 by k table further then the r by c chi-square test provides a more detailed analysis. Please note that the linear trend analysis may differ slightly between the 2 by k and r by c chi-square tests; this is because the r by c linear trend analysis is not calculated as above but instead considers trend in both dimensions of the table (closely related to Pearson's correlation).
Example
From Armitage and Berry (1994).
The following data describe numbers of children with different sized palatine tonsils and their carrier status for Strep. pyogenes.
 | Tonsils | | |
 | Not enlarged | Enlarged | Enlarged greatly |
Carriers | 19 | 29 | 24 |
Non-carriers | 497 | 560 | 269 |
To analyse these data in StatsDirect you must select 2 by k (scores 1 to k) from the chi-square section of the analysis menu. Then select the middle option from the 2 by k chi-square test menu. Choose the default 95% confidence interval. Then select the number of rows as 3. You then enter the above data as directed by the screen. Use carriers as successes and non-carriers as failures.
For this example:
 | Successes | Failures | Total | Per cent |
Observed | 19 | 497 | 516 | 3.682171 |
Expected | 26.57511 | 489.4249 | | |
Observed | 29 | 560 | 589 | 4.923599 |
Expected | 30.33476 | 558.6652 | | |
Observed | 24 | 269 | 293 | 8.191126 |
Expected | 15.09013 | 277.9099 | | |
Total | 72 | 1326 | 1398 | 5.150215 |
Total Chi² = 7.884843 |Chi| = 2.807996, (2 DF), P = .0194
Chi² for linear trend = 7.19275 |Chi| = 2.68193, (1 DF), P = .0073
Remaining Chi² (non-linearity) = 0.692093, (1 DF), P = .4055
Here the total chi-square test shows a statistically significant association between the classifications, i.e. between tonsil size and Strep. pyogenes carrier status. We have also shown a significant linear trend which enables us to refine our conclusions to a suggestion that the proportion of Strep. pyogenes carriers increases with tonsil size.
Menu location: Analysis_Chi-Square_r by c.
The r by c chi-square test in StatsDirect uses a number of methods to investigate two way contingency tables that consist of any number of independent categories forming r rows and c columns.
Tests of independence of the categories in a table are the chi-square test, the G-square (likelihood-ratio chi-square) test and the generalised Fisher exact (Fisher-Freeman-Halton) test. All three tests indicate the degree of independence between the variables that make up the table.
The generalised Fisher exact test is difficult to compute (Mehta and Patel, 1983, 1986a); it may take a long time and it may not be computed for the table that you enter. If the Fisher exact method cannot be computed practically then a hybrid method based upon Cochran rules is used (Mehta and Patel, 1986b); this may also fail with large tables and/or numbers. The Fisher-Freeman-Halton result is quoted with just one P value as it is implicitly two-sided.
Relating the Fisher-Freeman-Halton statistic to the Pearson Chi-square statistic:
·The null hypothesis is independence between row and column categories.
·Let t denote a table from the set of all tables with the same row and column margins.
·Let D(t) be the measure of discrepancy.
·The exact two sided P value = P[D(t) >= D(t observed)] = sum of hypergeometric probabilities of those tables where D(t) is larger than or equal to that of the observed table.
·In large samples the distribution of D(t) conditional on fixed row and column margins converges to the chi-square distribution with (r-1)(c-1) degrees of freedom.
The G-square statistic is less reliable than the chi-square statistic when you have small numbers. In general, you should use the chi-square statistic if the Fisher exact test is not computable. If you consult a statistician then it would be useful to provide the G-square statistic also.
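To make the relationship between the two large-sample statistics concrete, here is a small Python sketch that computes the Pearson chi-square and the G-square (likelihood-ratio chi-square) for any r by c table. The G-square is taken in its standard form, 2 * sum of O * ln(O / E), with empty cells contributing zero; the function name is hypothetical and no exact test is attempted.

```python
import math


def independence_statistics(table):
    """Pearson chi-square, G-square and degrees of freedom for an r by c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = g2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
            if observed > 0:  # O * ln(O / E) tends to 0 as O tends to 0
                g2 += 2 * observed * math.log(observed / expected)
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, g2, df


# Grief by support example from this section (rows I to IV; Good, Adequate, Poor).
grief = [[17, 9, 8], [6, 5, 1], [3, 5, 4], [1, 2, 5]]
chi2, g2, df = independence_statistics(grief)
print(f"Chi-square = {chi2:.4f}, G-square = {g2:.6f}, DF = {df}")
```

The two statistics are close for this table but not identical, which is typical when several expected frequencies are small.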
These tests of independence are suitable for nominal data. If your data are ordinal then you should use the more powerful tests for trend (Armitage and Berry, 1994; Agresti, 2002, 1996).
Assumptions of the tests of independence:
·the sample is random
·each observation may be classified into one cell (in the table) only
$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$
- where, for r rows and c columns of n observations, O is an observed frequency and E is an estimated expected frequency. The expected frequency for any cell is estimated as the row total times the column total then divided by the grand total (n).
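$$P_f = \frac{\prod_{i=1}^{r} f_{i.}! \; \prod_{j=1}^{c} f_{.j}!}{f_{..}! \; \prod_{i=1}^{r} \prod_{j=1}^{c} f_{ij}!}, \qquad P = \sum_{t \,:\, P_t \le P_f} P_t$$
- where P is the two sided Fisher probability, P_f is the conditional probability for the observed table given fixed row and column totals (f_i. and f_.j respectively), P_t is the same probability for another table t with those totals, f_ij is the count in row i and column j, f_.. is the total count and ! represents factorial.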
Analysis of trend in r by c tables indicates how much of the overall departure from independence is accounted for by linear trend. StatsDirect uses equally spaced scores for this purpose unless you specify otherwise. If you wish to experiment with other scoring systems then expert statistical guidance is advisable. Armitage and Berry (1994) quote an example where extent of grief of mothers suffering a perinatal death, graded I to IV, is compared with the degree of support received by these women. In this example the overall statistic is non-significant but a significant trend is demonstrated.
- where, for r rows and c columns of n observations, O is an observed frequency and E is an estimated expected frequency. The expected frequency for any cell is estimated as the row total times the column total then divided by the grand total (n). Row scores are u, column scores are v, row totals are Oj+ and column totals are Oi+.
The sample correlation coefficient r reflects the direction and closeness of linear trend in your table. r may vary between -1 and 1 just like Pearson's product moment correlation coefficient. Total independence of the categories in your table would mean that r = 0. The test for linear trend is related to r by M²=(n-1)r² and this is numerically identical to Armitage's chi-square for linear trend (Armitage and Berry, 1994; Agresti, 1996). If you interchange the rows and columns in your table then the value of M² will be the same.
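The relation M² = (n-1)r² can be illustrated with a short Python sketch. It treats every observation as a (row score, column score) pair and computes the ordinary product moment correlation over the table, using equally spaced scores 1, 2, ... by default. The function name and arguments are hypothetical, not StatsDirect's interface.

```python
import math


def linear_trend(table, row_scores=None, col_scores=None):
    """Sample correlation r, M-square = (n - 1) * r**2 and its 1 DF P value."""
    u = row_scores or list(range(1, len(table) + 1))
    v = col_scores or list(range(1, len(table[0]) + 1))
    n = sum(sum(row) for row in table)
    cells = [(u[i], v[j], count)
             for i, row in enumerate(table) for j, count in enumerate(row)]
    sum_u = sum(ui * c for ui, _, c in cells)
    sum_v = sum(vj * c for _, vj, c in cells)
    s_uv = sum(ui * vj * c for ui, vj, c in cells) - sum_u * sum_v / n
    s_uu = sum(ui * ui * c for ui, _, c in cells) - sum_u ** 2 / n
    s_vv = sum(vj * vj * c for _, vj, c in cells) - sum_v ** 2 / n
    r = s_uv / math.sqrt(s_uu * s_vv)
    m2 = (n - 1) * r * r
    return r, m2, math.erfc(math.sqrt(m2 / 2))


# Grief by support example (rows I to IV; columns Good, Adequate, Poor).
grief = [[17, 9, 8], [6, 5, 1], [3, 5, 4], [1, 2, 5]]
r, m2, p_value = linear_trend(grief)
print(f"r = {r:.6f}, M-square = {m2:.4f}, P = {p_value:.4f}")

# Interchanging rows and columns leaves M-square unchanged.
transposed = [list(col) for col in zip(*grief)]
print(linear_trend(transposed)[1])
```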
The ANOVA output applies techniques similar to analysis of variance to an r by c table. Here the equality of mean column and row scores is tested. StatsDirect uses equally spaced scores for this purpose unless you specify otherwise. See Armitage for more information (Armitage and Berry, 1994).
Pearson's (C) and Cramér's (V) coefficients of contingency and the phi (φ, correlation) coefficient reflect the strength of the association in a contingency table (Agresti, 1996; Fleiss, 1981; Stuart and Ord, 1994):
$$\varphi = \sqrt{\frac{\chi^2}{n}}, \qquad C = \sqrt{\frac{\chi^2}{\chi^2 + n}}, \qquad V = \sqrt{\frac{\chi^2}{n \, \min(r-1,\, c-1)}}$$
For 2 by 2 tables, Cramér's V is calculated alternatively as a signed value, with a, b, c and d as the four cell counts:
$$V = \frac{ad - bc}{\sqrt{(a+b)(c+d)(a+c)(b+d)}}$$
Observed values, expected values and totals are given for the table when c ≤ 8 and r ≤ 10.
If your data categories are both ordered then you will gain more power in tests of independence by using the ordinal methods due to Goodman and Kruskal (gamma) and Kendall (tau-b). Large sample, asymptotically normal variance estimates are used; the simple form is used for independence testing (Agresti, 1984; Conover, 1999; Goodman and Kruskal, 1963, 1972). Tau-b tends to be less sensitive than gamma to the choice of response categories.
Example
From Armitage and Berry (1994, p. 408).
The following data (as above) describe the state of grief of 66 mothers who had suffered a neonatal death. The table relates this to the amount of support given to these women:
 | | Support | | |
 | | Good | Adequate | Poor |
Grief State | I | 17 | 9 | 8 |
 | II | 6 | 5 | 1 |
 | III | 3 | 5 | 4 |
 | IV | 1 | 2 | 5 |
To analyse these data in StatsDirect you must select r by c from the chi-square section of the analysis menu. Choose the default 95% confidence interval. Then enter the above data as directed by the screen.
For this example:
Observed | 17 | 9 | 8 | 34 |
Expected | 13.91 | 10.82 | 9.27 | |
DChi² | 0.69 | 0.31 | 0.17 | |
Observed | 6 | 5 | 1 | 12 |
Expected | 4.91 | 3.82 | 3.27 | |
DChi² | 0.24 | 0.37 | 1.58 | |
Observed | 3 | 5 | 4 | 12 |
Expected | 4.91 | 3.82 | 3.27 | |
DChi² | 0.74 | 0.37 | 0.16 | |
Observed | 1 | 2 | 5 | 8 |
Expected | 3.27 | 2.55 | 2.18 | |
DChi² | 1.58 | 0.12 | 3.64 | |
Totals: | 27 | 21 | 18 | 66 |
TOTAL number of cells = 12
WARNING: 9 out of 12 cells have 1 ≤ EXPECTATION < 5
NOMINAL INDEPENDENCE
Chi-square = 9.9588, DF = 6, P = 0.1264
G-square = 10.186039, DF = 6, P = 0.117
Fisher-Freeman-Halton exact P = 0.1426
ANOVA
Chi-square for equality of mean column scores = 5.696401
DF = 2, P = 0.0579
LINEAR TREND
Sample correlation (r) = 0.295083
Chi-square for linear trend (M²) = 5.6598
DF = 1, P = 0.0174
NOMINAL ASSOCIATION
Phi = 0.388447
Pearson's contingency = 0.362088
Cramér's V = 0.274673
ORDINAL
Goodman-Kruskal gamma = 0.349223
Approximate test of gamma = 0: SE = 0.15333, P = 0.0228, 95% CI = 0.048701 to 0.649744
Approximate test of independence: SE = 0.163609, P = 0.0328, 95% CI = 0.028554 to 0.669891
Kendall tau-b = 0.236078
Approximate test of tau-b = 0: SE = 0.108929, P = 0.0302, 95% CI = 0.02258 to 0.449575
Approximate test of independence: SE = 0.110601, P = 0.0328, 95% CI = 0.019303 to 0.452852
Here we see that although the overall test was not significant we did show a statistically significant trend in mean scores. This suggests that supporting these mothers did help lessen their burden of grief.
Menu location: Analysis_Exact_Matched Pairs.
Paired proportions have traditionally been compared using McNemar's test but an exact alternative due to Liddell (1983) is preferable. StatsDirect gives you both.
The exact test is a special case of the sign test. The b count in the table below is treated as a binomial variable from the sample b+c. Using the ratio R' (R' = b/c) as a point estimate of relative risk, a two sided probability is calculated for the null hypothesis that R' = 1. The test statistic is F = b/(c+1).
Confidence limits for R' are calculated as follows:
$$R'_L = \frac{b}{(c+1) \, F(\alpha/2,\; 2(c+1),\; 2b)}, \qquad R'_U = \frac{(b+1) \, F(\alpha/2,\; 2(b+1),\; 2c)}{c}$$
- where F(P,n,d) is an upper-tail quantile from the F distribution with n and d degrees of freedom, and α = 0.05 for a 95% interval.
You should use the exact test for analysis; McNemar's test is included for interest only.
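The two tests can be sketched in Python using only the discordant counts b and c. Two assumptions are made here: McNemar's statistic is taken in its usual Yates-corrected form (|b - c| - 1)² / (b + c), and the exact two sided P value is taken as twice the binomial tail probability with success probability 0.5, capped at 1. The function name is hypothetical, and the exact confidence limits are not computed because they need F distribution quantiles.

```python
import math


def matched_pairs_tests(b, c):
    """McNemar (Yates-corrected) and exact sign-test results from discordant pairs."""
    n = b + c
    mcnemar = (abs(b - c) - 1) ** 2 / n
    k = max(b, c)
    # Binomial(n, 0.5) probability of k or more discordant pairs in one direction.
    tail = sum(math.comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return {
        "mcnemar_chi2": mcnemar,
        "mcnemar_p": math.erfc(math.sqrt(mcnemar / 2)),  # chi-square, 1 DF
        "ratio": b / c if c else math.inf,               # R' = b / c
        "f_statistic": b / (c + 1),
        "exact_p": min(1.0, 2 * tail),
    }


# Culture media example: b = 12 (growth on A only), c = 2 (growth on B only).
result = matched_pairs_tests(12, 2)
for name, value in result.items():
    print(f"{name}: {value:.6f}")
```

Only the discordant pairs enter either test; the concordant counts a and d do not affect the result.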
If you need the exact confidence interval for the difference between a pair of proportions then please see paired proportions.
DATA INPUT:
Observed frequencies should be entered as a paired fourfold table:
 | | Control/reference category: | |
 | | outcome present | outcome absent |
Case/index category | outcome present | a | b |
 | outcome absent | c | d |
Example
From Armitage and Berry (1994, p. 127).
The data below represent a comparison of two media for culturing Mycobacterium tuberculosis. Fifty suspect sputum specimens were plated up on both media and the following results were obtained:
 | | Medium B | |
 | | Growth | No Growth |
Medium A: | Growth | 20 | 12 |
 | No Growth | 2 | 16 |
To analyse these data in StatsDirect you must select the matched pairs (McNemar, Liddell) from the chi-square section of the analysis menu. Select the default 95% confidence interval. Enter the counts into the table as shown above.
For this example:
McNemar's test:
Yates' continuity corrected Chi² = 5.785714 P = 0.0162
After Liddell (1983):
Point estimate of relative risk (R') = 6
Exact 95% confidence interval = 1.335744 to 55.197091
F = 4 P (two sided) = 0.0129
R' is significantly different from unity
Here we can conclude that the tubercle bacilli in the experiment grew significantly better on medium A than on medium B. With 95% confidence we can state that the chances of a positive culture are between 1.34 and 55.20 times greater on medium A than on medium B.
Menu location: Analysis_Miscellaneous_Kappa & Maxwell.
Agreement Analysis
For the case of two raters, this function gives Cohen's kappa (weighted and unweighted) and Scott's pi as measures of inter-rater agreement for two raters' categorical assessments (Fleiss, 1981; Altman, 1991; Scott 1955). For three or more raters, this function gives extensions of the Cohen kappa method, due to Fleiss and Cuzick (1979) in the case of two possible responses per rater, and Fleiss, Nee and Landis (1979) in the general case of three or more responses per rater.
If you have only two categories then Scott's pi is the statistic of choice (with confidence interval constructed by the Donner-Eliasziw (1992) method) for inter-rater agreement (Zwick, 1988).
Weighted kappa partly compensates for a problem with unweighted kappa, namely that it is not adjusted for the degree of disagreement. Disagreement is weighted in decreasing priority from the top left (origin) of the table. StatsDirect uses the following definitions for weight (1 is the default):
1. w(ij)=1-abs(i-j)/(g-1)
2. w(ij)=1-[(i-j)/(g-1)]²
3. User defined (this is only available via workbook data entry)
g = number of categories
w = weight
i = category for one observer (from 1 to g)
j = category for the other observer (from 1 to g)
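The weighting schemes above can be illustrated with a short Python sketch of Cohen's kappa and Scott's pi for two raters. It assumes the standard definitions: observed and expected agreement are weighted sums over the cells, kappa = (po - pe) / (1 - pe), and Scott's pi uses the average of the two raters' marginal proportions for its expected agreement. Function names are hypothetical and standard errors are not computed. The table is the RAST/MAST example used later in this section.

```python
def weighted_agreement(table, weighting="none"):
    """Observed agreement, expected agreement and kappa for a g by g table."""
    g = len(table)
    n = sum(sum(row) for row in table)
    row_p = [sum(row) / n for row in table]
    col_p = [sum(col) / n for col in zip(*table)]

    def weight(i, j):
        if weighting == "none":       # unweighted kappa: only exact agreement counts
            return 1.0 if i == j else 0.0
        if weighting == "linear":     # definition 1 above
            return 1 - abs(i - j) / (g - 1)
        if weighting == "quadratic":  # definition 2 above
            return 1 - ((i - j) / (g - 1)) ** 2
        raise ValueError(f"unknown weighting: {weighting}")

    cells = [(i, j) for i in range(g) for j in range(g)]
    p_obs = sum(weight(i, j) * table[i][j] / n for i, j in cells)
    p_exp = sum(weight(i, j) * row_p[i] * col_p[j] for i, j in cells)
    return p_obs, p_exp, (p_obs - p_exp) / (1 - p_exp)


def scott_pi(table):
    """Scott's pi: expected agreement from the pooled marginal proportions."""
    g = len(table)
    n = sum(sum(row) for row in table)
    row_p = [sum(row) / n for row in table]
    col_p = [sum(col) / n for col in zip(*table)]
    p_obs = sum(table[i][i] for i in range(g)) / n
    p_exp = sum(((row_p[i] + col_p[i]) / 2) ** 2 for i in range(g))
    return (p_obs - p_exp) / (1 - p_exp)


# Rows: MAST (negative to very high); columns: RAST (negative to very high).
rast_mast = [
    [86, 3, 14, 0, 2],
    [26, 0, 10, 4, 0],
    [20, 2, 22, 4, 1],
    [11, 1, 37, 16, 14],
    [3, 0, 15, 24, 48],
]
print(weighted_agreement(rast_mast))
print(weighted_agreement(rast_mast, "linear"))
print(scott_pi(rast_mast))
```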
In broad terms a kappa below 0.2 indicates poor agreement and a kappa above 0.8 indicates very good agreement beyond chance.
Guide (Landis and Koch, 1977):
Kappa | Strength of agreement |
< 0.2 | Poor |
> 0.2 to ≤ 0.4 | Fair |
> 0.4 to ≤ 0.6 | Moderate |
> 0.6 to ≤ 0.8 | Good |
> 0.8 to ≤ 1 | Very good |
N.B. You cannot reliably compare kappa values from different studies because kappa is sensitive to the prevalence of different categories, i.e. if one category is observed more commonly in one study than another then kappa may indicate a difference in inter-rater agreement which is not due to the raters.
Agreement analysis with more than two raters is a complex and controversial subject; see Fleiss (1981, p. 225).
Disagreement Analysis
StatsDirect uses the methods of Maxwell (1970) to test for differences between the ratings of the two raters (or k nominal responses with paired observations).
Maxwell's chi-square statistic tests for overall disagreement between the two raters. The general McNemar statistic tests for asymmetry in the distribution of subjects about which the raters disagree, i.e. disagreement more over some categories of response than others.
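As an illustration of the asymmetry test, the Python sketch below computes a symmetry chi-square by summing (n_ij - n_ji)² / (n_ij + n_ji) over each pair of off-diagonal cells. This form is an assumption (it is the usual generalisation of McNemar's statistic to g categories), although it does reproduce the symmetry statistic quoted in the example output of this section. The function name is hypothetical, and Maxwell's marginal homogeneity statistic is not attempted because it requires a matrix inverse.

```python
def symmetry_chi_square(table):
    """Generalised McNemar (symmetry) chi-square for a square g by g table."""
    g = len(table)
    statistic = 0.0
    for i in range(g):
        for j in range(i + 1, g):
            total = table[i][j] + table[j][i]
            if total > 0:  # a pair with no discordant ratings contributes nothing
                statistic += (table[i][j] - table[j][i]) ** 2 / total
    df = g * (g - 1) // 2  # one degree of freedom per off-diagonal pair
    return statistic, df


# RAST/MAST example table (rows MAST, columns RAST).
rast_mast = [
    [86, 3, 14, 0, 2],
    [26, 0, 10, 4, 0],
    [20, 2, 22, 4, 1],
    [11, 1, 37, 16, 14],
    [3, 0, 15, 24, 48],
]
statistic, df = symmetry_chi_square(rast_mast)
print(f"Symmetry chi-square = {statistic:.6f}, df = {df}")
```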
Data preparation
You may present your data for the two-rater methods as a fourfold table in the interactive screen data entry menu option. Otherwise, you may present your data as responses/ratings in columns and rows in a worksheet, where the columns represent raters and the rows represent subjects rated. If you have more than two raters then you must present your data in the worksheet column (rater) row (subject) format. Missing data can be used where raters did not rate all subjects.
Technical validation
All formulae for kappa statistics and their tests are as per Fleiss (1981):
For two raters (m=2) and two categories (k=2):
- where n is the number of subjects rated, w is the weight for agreement or disagreement, po is the observed proportion of agreement, pe is the expected proportion of agreement, pij is the fraction of ratings i by the first rater and j by the second rater, and so is the standard error for testing that the kappa statistic equals zero.
For three or more raters (m>2) and two categories (k=2):
- where xi is the number of positive ratings out of mi raters for subject i of n subjects, and so is the standard error for testing that the kappa statistic equals zero.
For three or more raters and categories (m>2, k>2):
- where soj is the standard error for testing kappa equal to zero for each rating category separately, and so bar is the standard error for testing kappa equal to zero for the overall kappa across the k categories. Kappa hat is calculated as for the m>2, k=2 method shown above.
Example
From Altman (1991).
Altman quotes the results of Brostoff et al. in a comparison not of two human observers but of two different methods of assessment. These methods are RAST (radioallergosorbent test) and MAST (multi-RAST) for testing the sera of individuals for specifically reactive IgE in the diagnosis of allergies. Five categories of result were recorded using each method:
 | | RAST | | | | |
 | | negative | weak | moderate | high | very high |
MAST | negative | 86 | 3 | 14 | 0 | 2 |
 | weak | 26 | 0 | 10 | 4 | 0 |
 | moderate | 20 | 2 | 22 | 4 | 1 |
 | high | 11 | 1 | 37 | 16 | 14 |
 | very high | 3 | 0 | 15 | 24 | 48 |
To analyse these data in StatsDirect you may select kappa from the miscellaneous section of the analysis menu. Choose the default 95% confidence interval. Enter the above frequencies as directed on the screen and select the default method for weighting.
For this example:
General agreement over all categories (2 raters)
Cohen's kappa (unweighted)
Observed agreement = 47.38%
Expected agreement = 22.78%
Kappa = 0.318628 (se = 0.026776)
95% confidence interval = 0.266147 to 0.371109
z (for k = 0) = 11.899574
P < 0.0001
Cohen's kappa (weighted by 1-Abs(i-j)/(1 - k))
Observed agreement = 80.51%
Expected agreement = 55.81%
Kappa = 0.558953 (se = 0.038019)
95% confidence interval for kappa = 0.484438 to 0.633469
z (for kw = 0) = 14.701958
P < 0.0001
Scott's pi
Observed agreement = 47.38%
Expected agreement = 24.07%
Pi = 0.30701
Disagreement over any category and asymmetry of disagreement (2 raters)
Marginal homogeneity (Maxwell) chi-square = 73.013451, df = 4, P < 0.0001
Symmetry (generalised McNemar) chi-square = 79.076091, df = 10, P < 0.0001
Note that for calculation of standard errors for the kappa statistics, StatsDirect uses a more accurate method than that which is quoted in most textbooks (e.g. Altman, 1990).
The statistically highly significant z tests indicate that we should reject the null hypothesis that the ratings are independent (i.e. kappa = 0) and accept the alternative that agreement is better than one would expect by chance. Do not put too much emphasis on the kappa statistic test; it makes a lot of assumptions and falls into error with small numbers.
The statistically highly significant Maxwell test statistic above indicates that the raters disagree significantly in at least one category. The generalised McNemar statistic indicates the disagreement is not spread evenly.
Menu locations:
Analysis_Chi-Square_MantelHaenszel;
Analysis_Meta-analysis_Odds Ratio.
Case-control studies of dichotomous outcomes (e.g. healed or not healed) can be represented by arranging the observed counts into fourfold (2 by 2) tables. The separation of data into different tables or strata represents a sub-grouping, e.g. into age bands. Stratification of this kind is sometimes used to reduce confounding.
The Mantel-Haenszel method provides a pooled odds ratio across the strata of fourfold tables. Meta-analysis is used to investigate the combination or interaction of a group of independent studies, for example a series of fourfold tables from similar studies conducted at different centres.
This StatsDirect function examines the odds ratio for each stratum (a single fourfold table) and for the group of studies as a whole. Exact methods are used here in addition to conventional approximations.
For a single stratum the odds ratio is estimated as follows:
 | | Exposed | Non-Exposed |
OUTCOME | Cases | a | b |
 | Non-cases | c | d |
Sample estimate of the odds ratio = (ad)/(bc)
For each table, the observed odds ratio is displayed with an exact confidence interval (Martin and Austin, 1991; Sahai and Khurshid, 1996). With very large numbers these calculations can take an appreciable amount of time. If the ‘try exact’ option is not selected then the logit (Woolf) interval is given instead.
The Mantel-Haenszel method is used to estimate the pooled odds ratio for all strata, assuming a fixed effects model:
$$OR_{MH} = \frac{\sum_i a_i d_i / n_i}{\sum_i b_i c_i / n_i}$$
- where n_i = a_i + b_i + c_i + d_i.
Alternative methods, such as Woolf and inverse variance, can be used to estimate the pooled odds ratio with fixed effects but the Mantel-Haenszel method is generally the most robust. A confidence interval for the Mantel-Haenszel odds ratio in StatsDirect is calculated using the Robins, Breslow and Greenland variance formula (Robins et al., 1986) or by the method of Sato (1990) if the estimate of the odds ratio cannot be determined. A chi-square test statistic is given with its associated probability that the pooled odds ratio is equal to one.
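The pooled estimate and its interval can be sketched in Python as follows. The pooled odds ratio is the ratio of the sums of a*d/n and b*c/n over the strata. The variance of its logarithm is written in the commonly published Robins-Breslow-Greenland form, which is an assumption here rather than a transcription of StatsDirect's code. The function name is hypothetical, no continuity correction is applied, and the data are the ten lung cancer studies from the example in this section.

```python
import math
from statistics import NormalDist


def mantel_haenszel(tables, confidence=0.95):
    """Pooled odds ratio with a Robins-Breslow-Greenland confidence interval.

    Each table is (a, b, c, d): exposed cases, non-exposed cases,
    exposed non-cases, non-exposed non-cases.
    """
    sum_r = sum_s = sum_pr = sum_ps_qr = sum_qs = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        p, q = (a + d) / n, (b + c) / n
        r, s = a * d / n, b * c / n
        sum_r += r
        sum_s += s
        sum_pr += p * r
        sum_ps_qr += p * s + q * r
        sum_qs += q * s
    pooled = sum_r / sum_s
    variance = (sum_pr / (2 * sum_r ** 2)
                + sum_ps_qr / (2 * sum_r * sum_s)
                + sum_qs / (2 * sum_s ** 2))
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * math.sqrt(variance)
    log_or = math.log(pooled)
    return pooled, math.exp(log_or - half_width), math.exp(log_or + half_width)


# (smoker cases, non-smoker cases, smoker controls, non-smoker controls)
studies = [
    (83, 3, 72, 14), (90, 3, 227, 43), (129, 7, 81, 19), (412, 32, 299, 131),
    (1350, 7, 1296, 61), (60, 3, 106, 27), (459, 18, 534, 81),
    (499, 19, 462, 56), (451, 39, 1729, 636), (260, 5, 259, 28),
]
pooled, lower, upper = mantel_haenszel(studies)
print(f"Pooled odds ratio = {pooled:.6f} (95% CI = {lower:.6f} to {upper:.6f})")
```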
If any cell count in a table is zero then a continuity correction is applied to each cell in that table. If you have selected the ‘delay continuity correction’ option then no continuity correction is applied to the Mantel-Haenszel calculation unless all of the ‘a’ cells or all of the ‘d’ cells are zero across the studies. The type of continuity correction used is set in the options.
An exact conditional likelihood method is optionally used to evaluate the pooled odds ratio (Martin and Austin, 2000). The exact method may take an appreciable time to compute with large numbers. The exact results should be used in preference to the Mantel-Haenszel approximation, especially if some categories involve few observations (less than 15 or so).
The inconsistency of results across studies is summarised in the I² statistic, which is the percentage of variation across studies that is due to heterogeneity rather than chance; see the heterogeneity section for more information.
Note that the results from StatsDirect may differ slightly from other software or from those quoted in papers; this is due to differences in the variance formulae. StatsDirect employs the most robust practical approaches to variance according to accepted statistical literature.
DATA INPUT:
Observed frequencies may be entered in a workbook (see example in relative risk meta-analysis) or directly via the screen as multiple fourfold tables:
 | feature present | feature absent |
outcome positive | a | b |
outcome negative | c | d |
Example
From Armitage and Berry (1994, p. 516).
The following data compare the smoking status of lung cancer patients with controls. Ten different studies are combined in an attempt to improve the overall estimate of relative risk. The matching of controls has been ignored because there was not enough information about matching from each study to be sure that the matching was the same in each study.
Lung cancer | | Controls | |
smoker | non-smoker | smoker | non-smoker |
83 | 3 | 72 | 14 |
90 | 3 | 227 | 43 |
129 | 7 | 81 | 19 |
412 | 32 | 299 | 131 |
1350 | 7 | 1296 | 61 |
60 | 3 | 106 | 27 |
459 | 18 | 534 | 81 |
499 | 19 | 462 | 56 |
451 | 39 | 1729 | 636 |
260 | 5 | 259 | 28 |
To analyse these data in StatsDirect you may select the Mantel-Haenszel function from the chi-square section of the analysis menu. Select the default 95% confidence interval. Enter the number of tables as 10. Then enter each row of the table above as a separate 2 by 2 contingency table:
i.e. the first row is entered as:
| Smoker | Non-smoker |
Lung cancer | 83 | 3 |
Control | 72 | 14 |
... this is then repeated for each of the ten rows.
For this example:
Fixed effects (Mantel-Haenszel, Robins-Breslow-Greenland)
Pooled odds ratio = 4.681639 (95% CI = 3.865935 to 5.669455)
Chi² (test odds ratio differs from 1) = 292.379352 P < 0.0001
Fixed effects (conditional maximum likelihood)
Pooled odds ratio = 4.713244
Exact Fisher 95% CI = 3.888241 to 5.747141
Exact Fisher one sided P < 0.0001, two sided P < 0.0001
Exact mid-P 95% CI = 3.904839 to 5.719768
Exact mid-P one sided P < 0.0001, two sided P < 0.0001
Non-combinability of studies
Breslow-Day = 6.766765 (df = 9) P = 0.6614
Cochran Q = 6.641235 (df = 9) P = 0.6744
Moment-based estimate of between studies variance = 0
I² (inconsistency) = 0% (95% CI = 0% to 52.7%)
Random effects (DerSimonian-Laird)
Pooled odds ratio = 4.625084 (95% CI = 3.821652 to 5.597423)
Chi² (test odds ratio differs from 1) = 247.466729 (df = 1) P < 0.0001
Bias indicators
Begg-Mazumdar: Kendall's tau = 0.111111 P = 0.7275 (low power)
Egger: bias = 0.476675 (95% CI = -0.786168 to 1.739517) P = 0.4094
Harbord-Egger: bias = 0.805788 (92.5% CI = -0.686033 to 2.297609) P = 0.3013
Here we can say with 95% confidence that the true population odds in favour of being a smoker were between 3.9 and 5.7 times greater in patients who had lung cancer compared with controls.
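The pooled point estimates in this example can be approximated with a few lines of Python. This is a sketch using the standard textbook formulae (Mantel-Haenszel pooling, and inverse-variance pooling of log odds ratios with Cochran's Q); it is not StatsDirect's code, the function names are hypothetical, and the confidence intervals, exact methods and bias indicators are not reproduced.

```python
import math

# Each tuple is (a, b, c, d) =
# (cases: smokers, cases: non-smokers, controls: smokers, controls: non-smokers)
STUDIES = [
    (83, 3, 72, 14), (90, 3, 227, 43), (129, 7, 81, 19),
    (412, 32, 299, 131), (1350, 7, 1296, 61), (60, 3, 106, 27),
    (459, 18, 534, 81), (499, 19, 462, 56), (451, 39, 1729, 636),
    (260, 5, 259, 28),
]


def mantel_haenszel_or(tables):
    """Mantel-Haenszel pooled odds ratio: sum(a*d/n) / sum(b*c/n)."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den


def inverse_variance_summary(tables):
    """Inverse-variance pooled odds ratio and Cochran's Q.

    Assumes no zero cells, so no continuity correction is applied.
    """
    log_ors = [math.log(a * d / (b * c)) for a, b, c, d in tables]
    weights = [1.0 / (1 / a + 1 / b + 1 / c + 1 / d) for a, b, c, d in tables]
    pooled = sum(w * y for w, y in zip(weights, log_ors)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_ors))
    return math.exp(pooled), q


print(mantel_haenszel_or(STUDIES))
print(inverse_variance_summary(STUDIES))
```

Under these assumptions the Mantel-Haenszel function returns roughly 4.68, in line with the fixed effects result above. The inverse-variance estimate is roughly 4.63 with Q of about 6.6; it coincides with the random effects result here because the estimated between studies variance is zero.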
Copyright © 1990-2006 StatsDirect Limited, all rights reserved
Menu location: Analysis_Crosstabs.
Three generalised tests for association between row and column classes are offered for stratified r by c tables produced in the crosstabs function when you specify a third (stratum, controlling for) classifier (Agresti, 2002; Landis et al., 1978, 1979).
The first test (ordinal association) assumes that there is meaningful order to both the columns and rows of each r by c table.
The second test (ordinal columns vs. nominal rows) assumes that there is meaningful order in the columns of each r by c table.
The third test (nominal association) does not assume any order in rows or columns; it provides a general test of association between the row and column classifiers.
The reliability of the tests increases with sample size, but unlike the Pearson chi-square statistic for single r by c tables, small counts in a few cells are unlikely to invalidate the tests.
You could control for more than one factor by making a stratum variable consisting of several factors (e.g. UK male, US male, UK female, US female to control for gender and country of residence).
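If the data are prepared outside StatsDirect, a combined stratum variable of this kind can be built by joining the factor values for each record. The sketch below is illustrative only; the record layout, the field names `country` and `gender`, and the helper `stratum_label` are assumptions.

```python
# Hypothetical records; each has the two factors to be controlled for.
records = [
    {"country": "UK", "gender": "male"},
    {"country": "US", "gender": "male"},
    {"country": "UK", "gender": "female"},
    {"country": "US", "gender": "female"},
]


def stratum_label(record, factors=("country", "gender")):
    """Join several factor values into a single stratum label."""
    return " ".join(str(record[f]) for f in factors)


strata = [stratum_label(r) for r in records]
print(strata)
```

The resulting column of labels can then be used as the third (stratum) classifier.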
Note that there are other approaches to these analyses, namely ordinal and nominal logistic regression. You should consult with a statistician before using these methods in important studies.
Data entry
Note the potentially confusing terminology over row and column scores: The ‘row scores’ are the scores associated with the column classification; these are applied to the entries (by column) in each row. The ‘column scores’ are the scores associated with the row classification; these are applied to the entries (by row) in each column.
Example
From Agresti (2002).
The data can be found in the 'Other' worksheet of the test workbook. Use the menu item Analysis_Crosstabs to generate a cross tabulation of the job satisfaction, income and gender variables. Use the row (income) scores as 3, 10, 20 and 35. Use the column (job satisfaction) scores as 1, 3, 4 and 5.
For this example:
Generalised Cochran-Mantel-Haenszel tests
Row variable (first classifier): Income
Column variable (second classifier): Job Satisfaction
Stratum variable (third classifier, controlling for): Gender
Income scores: 3, 10, 20, 35
Job Satisfaction scores: 1, 3, 4, 5
Alternative hypothesis | Statistic | DF | Probability |
Ordinal association | 6.156301 | 1 | P = 0.0131 |
Nominal rows vs. ordinal columns association | 9.034222 | 3 | P = 0.0288 |
Nominal association | 10.200089 | 9 | P = 0.3345 |
Sample size = 104
From the results above you can see that the strongest effect detected is ordinal association (i.e. association of greater job satisfaction with greater income), after controlling for gender.
Menu location: Analysis_Chi-Square_Woolf.
In case-control studies observed frequencies can often be represented by a series of two by two tables. Each stratum of this series represents observations taken at different times, different places or another system of sub-grouping within one large study.
A pooled odds ratio for all strata can be calculated by the method of Mantel and Haenszel or that of Woolf. The Mantel-Haenszel method is more robust when some of the strata contain small frequencies.
Results are given for individual tables and for the combined statistics (Haldane corrected), including chi-square for heterogeneity between the tables.
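For a single table, the Haldane-corrected odds ratio and a Woolf-type (logit) confidence interval are usually written as in the sketch below. This is the standard textbook form, assumed here for illustration; how StatsDirect combines values across tables is not spelled out in this text, so the function `woolf_haldane` and the fixed 1.96 multiplier are assumptions and the combined statistics are not reproduced.

```python
import math


def woolf_haldane(a, b, c, d, z=1.96):
    """Odds ratio with approximate CI for one 2 by 2 table.

    Haldane correction: 0.5 is added to every cell before the
    log odds ratio and its standard error are calculated.
    z=1.96 approximates the normal quantile for a 95% interval.
    """
    a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    log_or = math.log(a * d / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return (math.exp(log_or),
            math.exp(log_or - z * se),
            math.exp(log_or + z * se))


# Illustrative counts: 83 and 3 in the first row, 72 and 14 in the second.
odds_ratio, lower, upper = woolf_haldane(83, 3, 72, 14)
print(odds_ratio, lower, upper)
```

The correction keeps the estimate finite when a cell is zero, at the cost of a small shift in the odds ratio for tables without zeros.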
DATA INPUT:
Observed frequencies should be entered as multiple fourfold tables:
| feature present | feature absent |
outcome positive | a | b |
outcome negative | c | d |
Example
From Armitage and Berry (1994, p. 516).
The following data compare the smoking status of lung cancer patients with controls. Ten different studies are combined in an attempt to improve the overall estimate of relative risk. The matching of controls has been ignored because there was not enough information about matching from each study to be sure that the matching was the same in each study.
Lung cancer: smoker | Lung cancer: non-smoker | Controls: smoker | Controls: non-smoker |
83 | 3 | 72 | 14 |
90 | 3 | 227 | 43 |
129 | 7 | 81 | 19 |
412 | 32 | 299 | 131 |
1350 | 7 | 1296 | 61 |
60 | 3 | 106 | 27 |
459 | 18 | 534 | 81 |
499 | 19 | 462 | 56 |
451 | 39 | 1729 | 636 |
260 | 5 | 259 | 28 |
To analyse these data in StatsDirect you must select the Woolf function from the chi-square section of the analysis menu. Then enter each row of the table above as a separate 2 by 2 contingency table:
i.e. the first row is entered as:
| Smoker | Non-smoker |
Lung cancer | 83 | 3 |
Control | 72 | 14 |
... this is then repeated for each of the ten rows.
For this example:
Statistics from combined values without Haldane correction:
Odds ratio = 4.519207
Approximate 95% CI = 3.752994 to 5.441851
Chi² for E(LOR) = 0 is 253.2108, P < 0.0001
Chi² for Heterogeneity = 6.634122, P = 0.6752
Statistics from combined values with Haldane correction:
Odds ratio = 4.510211
Approximate 95% CI = 3.747642 to 5.427948
Chi² for E(LOR) = 0 is 254.0865, P < 0.0001
Chi² for Heterogeneity = 6.532662, P = 0.6856
Here we can say that there was no convincing evidence of heterogeneity between the separate estimates of relative risk from each of the different studies. The pooled estimate suggested with 95% confidence that the true population odds for being a smoker were between 3.7 and 5.4 times greater in lung cancer patients compared with controls.
The equivalent analysis using the Mantel-Haenszel method gave a confidence interval for the pooled odds ratio of 3.9 to 5.7; the difference is partly accounted for by the Haldane correction. You should use the more robust Mantel-Haenszel method for most analyses of this kind. Woolf's method is included for further investigation of inter-table relationships under expert statistical guidance.
Menu location: Analysis_Non-parametric_Chi-Square Goodness of Fit.
This function enables you to compare the distribution of classes of observations with an expected distribution.
Your data must consist of a random sample of independent observations, the expected distribution of which is specified (Armitage and Berry, 1994; Conover, 1999).
Pearson's chi-square goodness of fit test statistic is:
\[ \chi^2 = \sum_{j=1}^{c} \frac{(O_j - E_j)^2}{E_j} \]
- where O_j are the observed counts, E_j are the corresponding expected counts and c is the number of classes for which counts/frequencies are being analysed.
The test statistic is distributed approximately as a chi-square random variable with c-1 degrees of freedom. The test has relatively low power (chance of detecting a real effect) with all but large numbers or big deviations from the null hypothesis (all classes contain observations that could have been in those classes by chance).
The handling of small expected frequencies is controversial. Koehler and Larntz (1980) assert that the chi-square approximation is adequate provided all of the following are true:
· total of observed counts (N) ≥ 10
· number of classes (c) ≥ 3
· all expected values ≥ 0.25
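The three conditions listed above are easy to check before running the test. The helper below is a minimal sketch; the name `koehler_larntz_ok` is hypothetical and it encodes only the conditions as stated in this text.

```python
def koehler_larntz_ok(observed, expected):
    """Check the three conditions listed above for the chi-square approximation."""
    n = sum(observed)   # total of observed counts (N)
    c = len(observed)   # number of classes (c)
    return n >= 10 and c >= 3 and all(e >= 0.25 for e in expected)


# Observed and expected frequencies from the blood group example in this section.
print(koehler_larntz_ok([67, 83, 29, 8], [82.28, 84.15, 14.96, 5.61]))
```

A result of `False` is a prompt to seek statistical advice rather than proof that the test is invalid.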
Some statistical software offers exact methods for dealing with small frequencies but these methods are not appropriate for all expected distributions, hence they can be specious. You can try reducing the number of classes but expert statistical guidance is advisable for this (Conover, 1999).
Example
Suppose we suspected an unusual distribution of blood groups in patients undergoing one type of surgical procedure. We know that the expected distribution for the population served by the hospital which performs this surgery is 44% group O, 45% group A, 8% group B and 3% group AB. We can take a random sample of routine pre-operative blood grouping results and compare these with the expected distribution.
Results for 187 consecutive patients:
Blood group | Number of patients |
O | 67 |
A | 83 |
B | 29 |
AB | 8 |
To analyse these data using StatsDirect you must first enter the observed frequencies into the workbook. You can enter the grouped frequencies, as above, or the individual observations (187 rows coded 1 to 4 in this case). If you enter individual observations, StatsDirect collects them into groups/bins/classes of frequencies which you can inspect before proceeding with the analysis. The next step is to enter the expected frequencies; this is done directly on screen after you have selected the observed frequencies and chosen Chi-square Goodness of Fit from the Non-parametric section of the analysis menu. For this example you can enter the expected proportions; the expected frequencies will be calculated and displayed automatically. You can also alter the number of degrees of freedom but this is intended for expert statistical use, so you would normally accept the default value of number of categories minus one. The results for our example are:
N = 187
Value | Observed frequency | Expected frequency |
1 | 67 | 82.28 |
2 | 83 | 84.15 |
3 | 29 | 14.96 |
4 | 8 | 5.61 |
Chi-square = 17.0481 df = 3
P = 0.0007
Here we may report a statistically highly significant difference between the distribution of blood groups from patients undergoing this surgical procedure and that which would be expected from a random sample of the general population.
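The figures in this example can be checked with the short Python sketch below, which uses only the standard library. The helper `chi2_sf` is a hypothetical name; it evaluates the upper tail of the chi-square distribution for whole-number degrees of freedom using the standard finite-sum forms, and is offered as an illustration rather than as the method StatsDirect uses.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper tail probability of a chi-square variable (integer df >= 1)."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        return math.exp(-half) * sum(
            half ** k / math.factorial(k) for k in range(df // 2))
    # Odd df: erfc(sqrt(x/2)) + exp(-x/2) * sum_{k=1}^{(df-1)/2} (x/2)^(k-1/2) / Gamma(k+1/2)
    series = sum(half ** (k - 0.5) / math.gamma(k + 0.5)
                 for k in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * series


observed = [67, 83, 29, 8]              # groups O, A, B, AB
proportions = [0.44, 0.45, 0.08, 0.03]  # expected population distribution

n = sum(observed)
expected = [n * p for p in proportions]
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1                  # number of categories minus one
p_value = chi2_sf(chi_square, df)

print(expected, chi_square, df, p_value)
```

Most of the statistic comes from group B, where 29 patients were observed against about 15 expected.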