C-Square Test
Explanation
The \(C^2\) test is a test for stochastic equivalence. This means that even if the medians of two independent samples are equal, the test can still be significant.
Let's say we have one group A that scored 1, 2, 2, 5, 6, 6, 7, and another group B that scored 4, 4, 4, 5, 10, 10, 12. Each group has the same median (i.e. 5) and has three scores below and three scores above it, but if a high score is positive, most people would rather be in group B than in group A. This is where 'stochastic equality' comes in. It is the probability that, if you pick one random person from group A and one from group B, the one from group A scores lower than the one from group B, plus half the probability that they score the same. In this example that is about 0.68.
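For example, this probability can be computed directly from all pairs of scores; a quick sketch in R (the variable names are my own):

```r
A <- c(1, 2, 2, 5, 6, 6, 7)
B <- c(4, 4, 4, 5, 10, 10, 12)
# P(score from A < score from B) + 0.5 * P(they are equal), over all pairs
theta <- mean(outer(A, B, "<")) + 0.5 * mean(outer(A, B, "=="))
theta  # approx. 0.68
```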
This test is an improvement on the Brunner-Munzel test, but like that test, Schüürhuis et al. (2025, p. 18) still advise using a studentized permutation test if either of the two samples has fewer than 15 scores.
Performing the Test
with Excel
Excel file: To Be Made
with stikpetE
without stikpetE
with R
Jupyter Notebook: To Be Made
with stikpetR
without stikpetR
with SPSS
To Be Made
Formulas
The test statistic is calculated using (Schüürhuis et al., 2025, p. 9, eq. 17):
\(C^2 = \frac{4}{\hat{\sigma}_N^2} \times \hat{\theta}_N\times\left(1 - \hat{\theta}_N\right) \times \left(\hat{\theta}_N - \frac{1}{2}\right)^2\)
The p-value can then be calculated using (Schüürhuis et al., 2025, p. 9, eq. 17):
\(p = 1 - \chi^2\left(C^2, 1\right)\)
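Putting these two formulas together, a minimal sketch in R, assuming \(\hat{\theta}_N\) and \(\hat{\sigma}_N^2\) have already been obtained (the function and variable names are my own, not from stikpetR):

```r
# theta:  the estimated relative effect (theta-hat, see eq. 1 below)
# sigma2: the estimated total variance (sigma-hat-squared, see eq. 5 below)
c_square_test <- function(theta, sigma2) {
  C2 <- 4 / sigma2 * theta * (1 - theta) * (theta - 0.5)^2
  p  <- 1 - pchisq(C2, df = 1)
  c(statistic = C2, p.value = p)
}
```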
The total estimated variance (Schüürhuis et al., 2025, p. 5, eq. 5):
\(\hat{\sigma}_N^2 = \frac{1}{d_n}\times\left(\left(\sum_{k=1}^2 SS_k^*\right) - n_1 \times n_2 \times \left(\hat{\theta}_N\times\left(1 - \hat{\theta}_N\right) - \frac{\hat{\tau}_N}{4}\right)\right)\)
with \(\hat{\theta}_N\) the estimate of the probability used for 'stochastic equality' (Schüürhuis et al., 2025, p. 4, eq. 1):
\(\hat{\theta}_N = \frac{1}{n_1} \times \left(\bar{R}_2 - \frac{n_2 + 1}{2}\right)\)
the mean of the mid-ranks in category \(k\), using all combined scores:
\(\bar{R}_k = \frac{\sum_{i=1}^{n_k} R_{ik}}{n_k}\)
the sum of squared deviations of the placement values from their mean:
\(SS_k^* = \sum_{i=1}^{n_k} \left(R_{ik}^* - \bar{R}_{k}^*\right)^2\)
mean of the placement values (Schüürhuis et al., 2025, p. 5):
\(\bar{R}_{k}^* = \frac{\sum_{i=1}^{n_k} R_{ik}^*}{n_k}\)
the placement values:
\(R_{ik}^* = R_{ik} - R_{ik}^{(k)}\)
means of different ranks:
\(\bar{R}_2 = \frac{\sum_{i=1}^{n_2} R_{i2}}{n_2}, \bar{R}_2^+ = \frac{\sum_{i=1}^{n_2} R_{i2}^+}{n_2}, \bar{R}_2^{(2)+} = \frac{\sum_{i=1}^{n_2} R_{i2}^{(2)+}}{n_2}, \bar{R}_2^{(2)-} = \frac{\sum_{i=1}^{n_2} R_{i2}^{(2)-}}{n_2}\)
Symbols used:
- \(R_{ik}^-\), the min-rank of the i-th score in category k, when using all combined scores
- \(R_{ik}^{(k)-}\), the min-rank of the i-th score in category k, when using only scores from category k
- \(R_{ik}\), the mid-rank of the i-th score in category k, when using all combined scores
- \(R_{ik}^{(k)}\), the mid-rank of the i-th score in category k, when using only scores from category k
- \(R_{ik}^+\), the max-rank of the i-th score in category k, when using all combined scores
- \(R_{ik}^{(k)+}\), the max-rank of the i-th score in category k, when using only scores from category k
- \(N\), the total sample size
- \(n_{k}\), the number of scores in category k
- \(\chi^2\left(\dots\right)\), the cumulative distribution function of the \(\chi^2\)-distribution
- \(\hat{\tau}_N\), the estimate for the probability of ties (Schüürhuis et al., 2025, p. 5, eq. 4)
- \(d_n\), a scaling factor, as defined in Schüürhuis et al. (2025)
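To make these rank quantities concrete, here is a small sketch in base R, using the example data from the explanation above (all variable names are my own):

```r
# Example data from the explanation above
x1 <- c(1, 2, 2, 5, 6, 6, 7)     # group A (category 1)
x2 <- c(4, 4, 4, 5, 10, 10, 12)  # group B (category 2)
n1 <- length(x1); n2 <- length(x2)

# Mid-ranks using all combined scores, and using only the scores of each category
allR  <- rank(c(x1, x2))              # R_i1 and R_i2 (ties get mid-ranks)
R1    <- allR[1:n1]
R2    <- allR[(n1 + 1):(n1 + n2)]
R1own <- rank(x1)                     # R_i1^(1)
R2own <- rank(x2)                     # R_i2^(2)

# Placement values and their sums of squared deviations
P1  <- R1 - R1own                     # R_i1^*
P2  <- R2 - R2own                     # R_i2^*
SS1 <- sum((P1 - mean(P1))^2)         # SS_1^*
SS2 <- sum((P2 - mean(P2))^2)         # SS_2^*

# Estimated probability for 'stochastic equality' (eq. 1 above)
theta <- (mean(R2) - (n2 + 1) / 2) / n1
theta  # approx. 0.68 for these data
```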
Interpreting the Result
The assumption about the population for this test (the null hypothesis) is that the two groups are stochastically equivalent.
The test provides a p-value: the probability of obtaining a test statistic as extreme as, or more extreme than, the one from the sample, if the assumption about the population were true. If this p-value (significance) is below a pre-defined threshold (the significance level \(\alpha\)), the assumption about the population is rejected. We then speak of a (statistically) significant result. The threshold is usually set at 0.05; anything below it is then considered low.
If the assumption is rejected, we conclude that the two groups are not stochastically equal. This indicates that the scores in one of the two groups tend to be 'higher' than in the other.
Note that if we do not reject the assumption, it does not mean we accept it; we simply state that there is insufficient evidence to reject it.
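In R, this decision rule could be sketched as follows (the values of p and alpha here are just placeholders of my own):

```r
alpha <- 0.05    # pre-defined significance level
p     <- 0.035   # p-value obtained from the C-square test (placeholder value)
if (p < alpha) {
  print("reject the assumption: the two groups are not stochastically equal")
} else {
  print("insufficient evidence to reject stochastic equality")
}
```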
Writing the results
Writing up the results of the test uses the format (APA, 2019, p. 182):
\(\chi^2\)(<degrees of freedom>) = <\(C^2\)-value>, p = <p-value>
So for example:
A \(C^2\) test indicated a significant difference between males and females in the distribution of scores, \(\chi^2\)(1) = 4.466, p = .035.
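As an illustration only, a small helper of my own (not part of APA guidance or stikpetR) that formats such a line in R, following the rounding rules in the notes below:

```r
C2 <- 4.466
p  <- 1 - pchisq(C2, df = 1)
# APA style: p with three decimals and no leading zero; very small values as "p < .001"
p_txt <- if (p < 0.001) "p < .001" else paste0("p = ", sub("^0", "", sprintf("%.3f", p)))
cat(sprintf("chi-square(1) = %.3f, %s\n", C2, p_txt))
# prints: chi-square(1) = 4.466, p = .035
```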
A few notes about reporting statistical results with APA:
- The p-value is shown with three decimal places, and no 0 before the decimal sign. If the p-value is below .0005, it can be reported as p < .001.
- \(\chi^2\) is a standard abbreviation from APA for the chi-square distribution (see APA, 2019, table 6.5).
- APA does not require references or formulas to be included for statistical analyses that are in common use (2019, p. 181).
- APA (2019, p. 88) states to also report an effect size measure.
Next...
The next step is to determine an effect size measure. The Vargha-Delaney A, a Rosenthal correlation, or a (Glass) rank biserial correlation (Cliff's delta) could be suitable for this.
Alternatives
alternatives for testing stochastic equivalence:
- the Mann-Whitney U test, although Chung and Romano (2011, p. 5) note that it fails to control the type I error rate
- the Brunner-Munzel test
- the Brunner-Munzel studentized permutation test
- Cliff's delta, which according to Delaney and Vargha (2002) performs similarly to the Brunner-Munzel test
if you only want to test whether the medians are equal:
- the Mann-Whitney U test, assuming the distributions have the same shape
- the Fligner-Policello test, assuming the distributions are symmetric around their medians and the data are continuous
- the Mood median test, although according to Schlag (2015) this actually tests quantiles and can lead to over-rejection
- the Schlag test, which only gives an accept or reject decision and no p-value