Critical issues with the Pearson's chi-square test
Abstract
Pearson's chi-square tests are among the most commonly applied statistical tools across a wide range of scientific disciplines, including medicine, engineering, biology, sociology, marketing and business. However, in some areas they are applied incorrectly. For example, the chi-square test for homogeneity of proportions (that is, for comparing proportions across groups in a contingency table) is frequently used to check whether the rows of a given nonnegative $m \times n$ (contingency) matrix $A$ are proportional. The null hypothesis $H_0$: "the $m$ rows are proportional" (for the whole population) is rejected with confidence level $1 - \alpha$ if and only if $\chi^2_{stat} > \chi^2_{crit}$, where the first term is given by Pearson's formula, while the second depends only on $m$, $n$, and $\alpha$, but not on the entries of $A$. It is immediately clear that Pearson's formula is not invariant. More precisely, whenever all entries of $A$ are multiplied by a constant $c$, the value $\chi^2_{stat}(A)$ is multiplied by $c$ as well: $\chi^2_{stat}(cA) = c\,\chi^2_{stat}(A)$. Thus, if all rows of $A$ are exactly proportional, then $\chi^2_{stat}(cA) = c\,\chi^2_{stat}(A) = 0$ for any $c$ and any $\alpha$. Otherwise, $\chi^2_{stat}(cA)$ becomes arbitrarily large or small as positive $c$ increases or decreases. Hence, at any fixed significance level $\alpha$, the null hypothesis $H_0$ will be rejected with confidence $1 - \alpha$ when $c$ is sufficiently large, and not rejected when $c$ is sufficiently small. Yet, obviously, the rows of $cA$ are either proportional for every $c > 0$ or for none. Thus, any reasonable formula for the test statistic must be invariant, that is, it must take the same value on the matrices $cA$ for all real $c > 0$.
KEY WORDS: Pearson chi-square test, difference between two proportions, goodness of fit, contingency tables.
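The scaling argument can be checked directly. In Pearson's statistic $\sum_{i,j} (O_{ij} - E_{ij})^2 / E_{ij}$ with expected counts $E_{ij} = R_i C_j / N$ (row sums $R_i$, column sums $C_j$, total $N$), multiplying every entry by $c$ multiplies each $O_{ij}$, $R_i$, $C_j$ and $N$, and hence each $E_{ij}$, by $c$, so every term becomes $c\,(O_{ij} - E_{ij})^2 / E_{ij}$. The Python sketch below illustrates this numerically; it is not code from the paper. The helper names (`pearson_chi2`, `scale`, `chi2_crit_df1`) and the tables `A` and `P` are hypothetical, and to stay within the standard library the sketch assumes 2×2 tables, where the critical value with $(m-1)(n-1) = 1$ degree of freedom equals the square of the two-sided standard normal quantile.

```python
from statistics import NormalDist


def pearson_chi2(table):
    """Pearson's chi-square statistic for homogeneity of the rows of a
    nonnegative matrix (all row and column sums assumed positive)."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    total = sum(row_sums)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under H0: E_ij = R_i * C_j / N
            expected = row_sums[i] * col_sums[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


def scale(table, c):
    """Return the matrix cA: every entry multiplied by c."""
    return [[c * x for x in row] for row in table]


def chi2_crit_df1(alpha):
    """Critical value for 1 degree of freedom (a 2x2 table).

    A chi-square variable with 1 df is the square of a standard normal,
    so its (1 - alpha) quantile is z_{1 - alpha/2} squared.
    """
    return NormalDist().inv_cdf(1 - alpha / 2) ** 2


# Hypothetical 2x2 tables (illustrative numbers, not data from the paper).
A = [[10, 20], [15, 18]]  # rows not proportional
P = [[10, 20], [5, 10]]   # rows exactly proportional

alpha = 0.05
crit = chi2_crit_df1(alpha)  # about 3.841; does not depend on the entries

for c in (0.1, 1, 10, 100):
    stat_a = pearson_chi2(scale(A, c))
    stat_p = pearson_chi2(scale(P, c))
    print(f"c={c:>5}: chi2(cA)={stat_a:8.3f} reject={stat_a > crit}  "
          f"chi2(cP)={stat_p:.3f}")
```

With these numbers the statistic for $A$ is about 0.965 at $c = 1$, below the critical value of about 3.841 at $\alpha = 0.05$, but about 9.646 at $c = 10$, above it, even though $A$ and $10A$ have exactly the same row proportions. The exactly proportional table `P` gives 0 at every scale.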
Authors (2)
Vladimir Gurvich
Mariya Naumova
Quick access
- Publication year: 2025
- Language: English
- Source database: arXiv
- Access: Open Access ✓