C-Square Test
Explanation
The \(C^2\) test is a test for stochastic equivalence. This means that even if the medians of two independent samples are equal, the test can still be significant.
Let's say we have one group A that scored 1, 2, 2, 5, 6, 6, 7, and another group B that scored 4, 4, 4, 5, 10, 10, 12. Each group has the same median (i.e. 5) and has three scores below and three scores above it, but if a high score is positive, most people would rather be in group B than in group A. This is where 'stochastic equality' comes in. It is the probability that, if you pick one random person from group A and one from group B, the one from group A scores lower than the one from group B, plus half the probability that they score the same. In this example that is about 0.68.
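For example, this probability can be computed directly from all pairs of scores; a quick sketch in R (the variable names are my own):

```r
A <- c(1, 2, 2, 5, 6, 6, 7)
B <- c(4, 4, 4, 5, 10, 10, 12)
# P(score from A < score from B) + 0.5 * P(they are equal), over all pairs
theta <- mean(outer(A, B, "<")) + 0.5 * mean(outer(A, B, "=="))
theta  # approx. 0.68
```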
This test is an improvement on the Brunner-Munzel test, but like that test, Schüürhuis et al. (2025, p. 18) still advise using a studentized permutation test if either of the two samples has fewer than 15 scores.
Performing the Test
with Excel
Excel file: To Be Made
with stikpetE
without stikpetE
with R
Jupyter Notebook: To Be Made
with stikpetR
without stikpetR
with SPSS
To Be Made
Formulas
The test statistic is calculated using (Schüürhuis et al., 2025, p. 9, eq. 17):
\(C^2 = \frac{4}{\hat{\sigma}_N^2} \times \hat{\theta}_N\times\left(1 - \hat{\theta}_N\right) \times \left(\hat{\theta}_N - \frac{1}{2}\right)^2\)
The p-value can then be calculated using (Schüürhuis et al., 2025, p. 9, eq. 17):
\(p = 1 - \chi^2\left(C^2, 1\right)\)
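Putting these two formulas together, a minimal sketch in R, assuming \(\hat{\theta}_N\) and \(\hat{\sigma}_N^2\) have already been obtained (the function and variable names are my own, not from stikpetR):

```r
# theta:  the estimated relative effect (theta-hat, see eq. 1 below)
# sigma2: the estimated total variance (sigma-hat-squared, see eq. 5 below)
c_square_test <- function(theta, sigma2) {
  C2 <- 4 / sigma2 * theta * (1 - theta) * (theta - 0.5)^2
  p  <- 1 - pchisq(C2, df = 1)
  c(statistic = C2, p.value = p)
}
```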
The total estimated variance (Schüürhuis et al., 2025, p. 5, eq. 5):
\(\hat{\sigma}_N^2 = \frac{1}{d_n}\times\left(\left(\sum_{k=1}^2 SS_k^*\right) - n_1 \times n_2 \times \left(\hat{\theta}_N\times\left(1 - \hat{\theta}_N\right) - \frac{\hat{\tau}_N}{4}\right)\right)\)
with \(\hat{\theta}_N\) the estimate of the probability used for 'stochastic equality' (Schüürhuis et al., 2025, p. 4, eq. 1):
\(\hat{\theta}_N = \frac{1}{n_1} \times \left(\bar{R}_2 - \frac{n_2 + 1}{2}\right)\)
the mean of the mid-ranks in category \(k\), using all combined scores:
\(\bar{R}_k = \frac{\sum_{i=1}^{n_k} R_{ik}}{n_k}\)
the sum of squared deviations of the placement values from their mean:
\(SS_k^* = \sum_{i=1}^{n_k} \left(R_{ik}^* - \bar{R}_{k}^*\right)^2\)
mean of the placement values (Schüürhuis et al., 2025, p. 5):
\(\bar{R}_{k}^* = \frac{\sum_{i=1}^{n_k} R_{ik}^*}{n_k}\)
the placement values:
\(R_{ik}^* = R_{ik} - R_{ik}^{(k)}\)
means of different ranks:
\(\bar{R}_2 = \frac{\sum_{i=1}^{n_2} R_{i2}}{n_2}, \bar{R}_2^+ = \frac{\sum_{i=1}^{n_2} R_{i2}^+}{n_2}, \bar{R}_2^{(2)+} = \frac{\sum_{i=1}^{n_2} R_{i2}^{(2)+}}{n_2}, \bar{R}_2^{(2)-} = \frac{\sum_{i=1}^{n_2} R_{i2}^{(2)-}}{n_2}\)
Symbols used:
- \(R_{ik}^-\), the min-rank of the i-th score in category k, when using all combined scores
- \(R_{ik}^{(k)-}\), the min-rank of the i-th score in category k, when using only scores from category k
- \(R_{ik}\), the mid-rank of the i-th score in category k, when using all combined scores
- \(R_{ik}^{(k)}\), the mid-rank of the i-th score in category k, when using only scores from category k
- \(R_{ik}^+\), the max-rank of the i-th score in category k, when using all combined scores
- \(R_{ik}^{(k)+}\), the max-rank of the i-th score in category k, when using only scores from category k
- \(N\), the total sample size
- \(n_{k}\), the number of scores in category k
- \(\chi^2\left(\dots\right)\), the cumulative distribution function of the \(\chi^2\)-distribution
- \(\hat{\tau}_N\), the estimate for the probability of ties (Schüürhuis et al., 2025, p. 5, eq. 4)
- \(d_n\), a scaling factor, as defined in Schüürhuis et al. (2025)
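To make these rank quantities concrete, here is a small sketch in base R, using the example data from the explanation above (all variable names are my own):

```r
# Example data from the explanation above
x1 <- c(1, 2, 2, 5, 6, 6, 7)     # group A (category 1)
x2 <- c(4, 4, 4, 5, 10, 10, 12)  # group B (category 2)
n1 <- length(x1); n2 <- length(x2)

# Mid-ranks using all combined scores, and using only the scores of each category
allR  <- rank(c(x1, x2))              # R_i1 and R_i2 (ties get mid-ranks)
R1    <- allR[1:n1]
R2    <- allR[(n1 + 1):(n1 + n2)]
R1own <- rank(x1)                     # R_i1^(1)
R2own <- rank(x2)                     # R_i2^(2)

# Placement values and their sums of squared deviations
P1  <- R1 - R1own                     # R_i1^*
P2  <- R2 - R2own                     # R_i2^*
SS1 <- sum((P1 - mean(P1))^2)         # SS_1^*
SS2 <- sum((P2 - mean(P2))^2)         # SS_2^*

# Estimated probability for 'stochastic equality' (eq. 1 above)
theta <- (mean(R2) - (n2 + 1) / 2) / n1
theta  # approx. 0.68 for these data
```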
Interpreting the Result
The assumption about the population for this test (the null hypothesis) is that the two groups are stochastically equivalent.
The test provides a p-value: the probability of obtaining a test statistic as extreme as, or more extreme than, the one from the sample, if the assumption about the population were true. If this p-value (significance) is below a pre-defined threshold (the significance level \(\alpha\)), the assumption about the population is rejected. We then speak of a (statistically) significant result. The threshold is usually set at 0.05; anything below it is then considered low.
If the assumption is rejected, we conclude that the two groups are not stochastically equal. This indicates that the scores in one of the two groups tend to be 'higher' than in the other.
Note that if we do not reject the assumption, it does not mean we accept it; we simply state that there is insufficient evidence to reject it.
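In R, this decision rule could be sketched as follows (the values of p and alpha here are just placeholders of my own):

```r
alpha <- 0.05    # pre-defined significance level
p     <- 0.035   # p-value obtained from the C-square test (placeholder value)
if (p < alpha) {
  print("reject the assumption: the two groups are not stochastically equal")
} else {
  print("insufficient evidence to reject stochastic equality")
}
```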
Writing the results
Writing up the results of the test uses the format (APA, 2019, p. 182):
\(\chi^2\)(<degrees of freedom>) = <\(C^2\)-value>, p = <p-value>
So for example:
A \(C^2\) test indicated a significant difference between males and females in the distribution of scores, \(\chi^2\)(1) = 4.466, p = .035.
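As an illustration only, a small helper of my own (not part of APA guidance or stikpetR) that formats such a line in R, following the rounding rules in the notes below:

```r
C2 <- 4.466
p  <- 1 - pchisq(C2, df = 1)
# APA style: p with three decimals and no leading zero; very small values as "p < .001"
p_txt <- if (p < 0.001) "p < .001" else paste0("p = ", sub("^0", "", sprintf("%.3f", p)))
cat(sprintf("chi-square(1) = %.3f, %s\n", C2, p_txt))
# prints: chi-square(1) = 4.466, p = .035
```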
A few notes about reporting statistical results with APA:
- The p-value is shown with three decimal places, and no 0 before the decimal sign. If the p-value is below .0005, it can be reported as p < .001.
- \(\chi^2\) is a standard abbreviation from APA for the chi-square distribution (see APA, 2019, table 6.5).
- APA does not require references or formulas to be included for statistical analyses that are in common use (2019, p. 181).
- APA (2019, p. 88) states to also report an effect size measure.
Next...
The next step is to determine an effect size measure. The Vargha-Delaney A, a Rosenthal correlation, or a (Glass) rank biserial correlation (Cliff's delta) could be suitable for this.
Alternatives
alternatives for testing stochastic equivalence:
- the Mann-Whitney U test, although Chung and Romano (2011, p. 5) note that it fails to control the type I error rate
- the Brunner-Munzel test
- the Brunner-Munzel studentized permutation test
- Cliff's delta, which according to Delaney and Vargha (2002) performs similarly to the Brunner-Munzel test
if you only want to test whether the medians are equal:
- the Mann-Whitney U test, assuming the distributions have the same shape
- the Fligner-Policello test, assuming the distributions are symmetric around their medians and the data are continuous
- the Mood median test, although according to Schlag (2015) this actually tests quantiles and can lead to over-rejection
- the Schlag test, which only gives an accept or reject decision and no p-value