Common Language Effect Size / Probability of Superiority / Vargha-Delaney A
Introduction
The Common Language Effect Size is the probability that, when taking a random pair of scores (one from each of two categories), the score from the first category is greater than the score from the second. Note, however, that Wolfe and Hogg (1971) actually defined this in reverse.
It can be used as an effect size measure for an independent-samples test (e.g. a Mann-Whitney U test or a t-test).
Some argue that ties, which often occur with ordinal data, should be split equally between the two categories (Grissom, 1994, p. 282). The Vargha-Delaney A (Vargha & Delaney, 2000) uses the sum of ranks, but turns out to be equal to the CLES with half of the ties added to each category.
McGraw and Wong (1992) also provide a method for continuous data that uses the normal distribution to determine the CLES.
The term Common Language Effect Size can be found in McGraw and Wong (1992), the term Probability of Superiority in Grissom (1994), and the term Stochastic Superiority in Vargha and Delaney (2000).
Although this measure is usually used with independent or sometimes paired samples, others (Ben-Shachar et al., 2020; Tulimieri, 2021) have adapted it for use in one-sample cases. It is then the probability that a random score from the sample is greater than the hypothesized value.
Obtaining the Effect Size
For One-Sample
with Excel
Excel file from video: ES - Common Language (One-Sample) (E).xlsm
with stikpetE
without stikpetE
with Python
Jupyter Notebook: ES - Common Language (One-Sample) (P).ipynb
with stikpetP
without stikpetP
with R (Studio)
Jupyter Notebook: ES - Common Language (One-Sample) (R).ipynb
with stikpetR
without stikpetR
with SPSS
Unfortunately, I'm not aware of a way to do this in SPSS using the GUI, but it is possible to trick SPSS into it.
Formulas
If the CLES is defined as the probability that a random score from the sample is greater than the hypothesized value:
\(CLES_1 = P\left(x \gt \mu\right)\)
with:
\(P\left(x \gt \mu\right) = \frac{\sum_{i=1}^n \begin{cases} 1, & \text{if } x_i \gt \mu \\ 0, & \text{otherwise}\end{cases}}{n}\)
Adding half the ties would give (Tulimieri, 2021):
\(CLES_2 = P\left(x \gt \mu\right) + \frac{1}{2}\times P\left(x = \mu\right)\)
with:
\(P\left(x = \mu\right) = \frac{\sum_{i=1}^n \begin{cases} 1, & \text{if } x_i = \mu \\ 0, & \text{otherwise}\end{cases}}{n}\)
This seems to also produce the same result as what Mangiafico (2016, pp. 223–224) calls a VDA-like measure, where VDA is short for Vargha-Delaney A.
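As an illustration, the two counting formulas above translate into a few lines of Python (a minimal sketch; the function name `cles_one_sample` and its `ties` parameter are my own):

```python
def cles_one_sample(scores, mu, ties="exclude"):
    """Common Language Effect Size for a one-sample case.

    Returns P(x > mu); with ties="half", half of P(x = mu) is added.
    """
    n = len(scores)
    above = sum(1 for x in scores if x > mu)   # numerator of P(x > mu)
    equal = sum(1 for x in scores if x == mu)  # numerator of P(x = mu)
    if ties == "half":
        return (above + equal / 2) / n  # CLES_2
    return above / n                    # CLES_1
```

For example, with scores 1, 4, 5, 6, 8, 10 and a hypothesized value of 5, three scores exceed 5 and one equals it, so \(CLES_1 = 3/6 = 0.5\) and \(CLES_2 = 3.5/6 \approx 0.5833\).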
A third formula uses the rank-biserial correlation coefficient (\(r_{rb}\)) (Ben-Shachar et al., 2020):
\(CLES_3 = \frac{1 + r_{rb}}{2}\)
A fourth converts Cohen's d' to a CLES (Ben-Shachar et al., 2020):
\(CLES_4 = \Phi\left(\frac{d'}{\sqrt{2}}\right)\)
Where \(\Phi\left(\dots\right)\) is the cumulative distribution function from the standard normal distribution.
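The two conversion formulas map directly to Python; the sketch below uses the standard library's `NormalDist` (Python 3.8+) for \(\Phi\) (function names are my own):

```python
from math import sqrt
from statistics import NormalDist

def cles_from_rank_biserial(r_rb):
    # CLES_3 = (1 + r_rb) / 2
    return (1 + r_rb) / 2

def cles_from_cohens_d(d):
    # CLES_4 = Phi(d' / sqrt(2)), Phi = standard normal CDF
    return NormalDist().cdf(d / sqrt(2))
```

Both return 0.5 when there is no effect (\(r_{rb} = 0\) or \(d' = 0\)).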
For Independent Samples
with Excel
Excel file: ES - Common Language (Independent-Samples) (E).xlsm
with stikpetE add-in
Video To Be Made
without stikpetE add-in
Video To Be Made
with Python
Jupyter Notebook: ES - Common Language (Ind Samples) (P).ipynb
with stikpetP library
Video To Be Made
without stikpetP library
Video To Be Made
with R (Studio)
Jupyter Notebook: ES - Common Language (Ind Samples) (R).ipynb
with stikpetR library
Video To Be Made
without stikpetR library
Video To Be Made
with SPSS
Unfortunately, I'm not aware of a way to do this in SPSS using the GUI, but it is possible to trick SPSS into it.
Formulas
The 'brute-force' formula for the CLES would be:
\(CLES = P(X > Y) = \frac{\sum_{i=1}^{n_1} \sum_{j=1}^{n_2} \mathbb{I}(x_i > y_j)}{n_1 \times n_2}\)
with \(\mathbb{I}()\) being the indicator function, i.e.
\(\mathbb{I}(x_i > y_j) = \begin{cases} 1, & \text{if } x_i > y_j \\0, & \text{otherwise}\end{cases}\)
Here \(n_i\) is the number of scores in the i-th category, and \(x_i\) and \(y_j\) are the i-th and j-th scores in category 1 and category 2, respectively.
Adding half the ties, as for the Vargha-Delaney A, and also proposed by Grissom (1994, p. 282):
\(CLES = P(X > Y) + \frac{P(X = Y)}{2} = \frac{\sum_{i=1}^{n_1} \sum_{j=1}^{n_2} \mathbb{I}(x_i > y_j)}{n_1 \times n_2} + \frac{1}{2}\times\frac{\sum_{i=1}^{n_1} \sum_{j=1}^{n_2} \mathbb{I}(x_i = y_j)}{n_1 \times n_2}\)
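The brute-force formula is a straightforward double loop in Python (a minimal sketch; the function name is my own):

```python
def cles_brute_force(x, y, ties="half"):
    """P(X > Y), with ties="half" adding half of P(X = Y)."""
    n1, n2 = len(x), len(y)
    # count every pair where the category-1 score wins, and every tied pair
    greater = sum(1 for xi in x for yj in y if xi > yj)
    equal = sum(1 for xi in x for yj in y if xi == yj)
    if ties == "half":
        return (greater + equal / 2) / (n1 * n2)
    return greater / (n1 * n2)
```

With the example data used later on (X = 1, 4, 5, 6, 8, 10 and Y = 2, 3, 8, 8, 9) this gives \(13/30\) without ties and \(14/30 \approx 0.4667\) with half the ties added.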
Alternatively, the Vargha-Delaney A can be determined using the ranks (Vargha & Delaney, 2000, p. 107):
\(A_i = \frac{1}{n_j}\times\left(\frac{R_i}{n_i} - \frac{n_i + 1}{2}\right)\)
Which is actually the same as:
\(A_i = \frac{U_i}{n_i\times n_j}\)
Where \(R_i\) is the sum of ranks from the i-th category, and \(U_i\) the Mann-Whitney U statistic for category i.
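The rank-based route can be sketched in plain Python; the loop below assigns midranks (tied scores get the average of the ranks they span), then applies \(A = U_1 / (n_1 \times n_2)\). The function name is my own:

```python
def vargha_delaney_a(x, y):
    """Vargha-Delaney A via the rank-sum formula A = U1 / (n1 * n2)."""
    combined = list(x) + list(y)
    order = sorted(range(len(combined)), key=lambda k: combined[k])
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(order):
        # find the run of tied values starting at position i
        j = i
        while j + 1 < len(order) and combined[order[j + 1]] == combined[order[i]]:
            j += 1
        midrank = (i + j) / 2 + 1          # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = midrank
        i = j + 1
    n1, n2 = len(x), len(y)
    r1 = sum(ranks[:n1])                   # sum of ranks for category 1
    u1 = r1 - n1 * (n1 + 1) / 2            # Mann-Whitney U for category 1
    return u1 / (n1 * n2)
```

This reproduces the \(7/15 \approx 0.4667\) found in the worked example below.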
The normal approximation can be obtained using (McGraw & Wong, 1992, p. 361):
\(CLES = \Phi(z)\)
with:
\(z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2 + s_2^2}}\)
\(s_i^2 = \frac{\sum_{j=1}^{n_i} \left(x_{i,j} - \bar{x}_i\right)^2}{n_i - 1}\)
\(\bar{x}_i = \frac{\sum_{j=1}^{n_i} x_{i,j}}{n_i}\)
Symbols used:
- \(x_{i,j}\), the j-th score in category i
- \(\bar{x}_i\), the arithmetic mean (average) of the scores in category i
- \(s_i^2\), the unbiased sample variance of the scores in category i
- \(\Phi(z)\), the cumulative probability function for the standard normal distribution
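The normal approximation fits in a few lines using the standard library's `statistics` module, whose `variance` is the unbiased (n − 1) sample variance used above (a minimal sketch; the function name is my own, and it keeps the sign of the mean difference so values below 0.5 indicate that category 1 tends to score lower):

```python
from math import sqrt
from statistics import NormalDist, mean, variance

def cles_normal(x, y):
    """McGraw-Wong CLES via the normal approximation."""
    z = (mean(x) - mean(y)) / sqrt(variance(x) + variance(y))
    return NormalDist().cdf(z)  # Phi(z), standard normal CDF
```

For the example data below this gives approximately 0.4706.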
Example Brute Force
Let's say we have the scores 1, 4, 5, 6, 8, and 10 for six national students, and 2, 3, 8, 8, and 9 for five international students. If we set national as category 1 (\(X\)) and international as category 2 (\(Y\)), we have:
X = (1, 4, 5, 6, 8, 10), Y = (2, 3, 8, 8, 9)
\(n_1 = 6, n_2 = 5\)
We can fill out the formula:
\(P(X > Y) = \frac{\sum_{i=1}^{n_1} \sum_{j=1}^{n_2} \mathbb{I}(x_i > y_j)}{n_1 \times n_2} = \frac{\sum_{i=1}^{6} \sum_{j=1}^{5} \mathbb{I}(x_i > y_j)}{6 \times 5} = \frac{\sum_{i=1}^{6} \left(\mathbb{I}(x_i > y_1)+\mathbb{I}(x_i > y_2)+\mathbb{I}(x_i > y_3)+\mathbb{I}(x_i > y_4)+\mathbb{I}(x_i > y_5)\right)}{30}\)
The numerator simply checks every possible pair: if the score from the first category is higher than the one from the second it counts a 1, otherwise a 0:
\(\mathbb{I}(x_i > y_j)\) | \(y = 2\) | \(y = 3\) | \(y = 8\) | \(y = 8\) | \(y = 9\)
---|---|---|---|---|---
\(x = 1\) | \(\mathbb{I}(1 > 2) = 0\) | \(\mathbb{I}(1 > 3) = 0\) | \(\mathbb{I}(1 > 8) = 0\) | \(\mathbb{I}(1 > 8) = 0\) | \(\mathbb{I}(1 > 9) = 0\)
\(x = 4\) | \(\mathbb{I}(4 > 2) = 1\) | \(\mathbb{I}(4 > 3) = 1\) | \(\mathbb{I}(4 > 8) = 0\) | \(\mathbb{I}(4 > 8) = 0\) | \(\mathbb{I}(4 > 9) = 0\)
\(x = 5\) | \(\mathbb{I}(5 > 2) = 1\) | \(\mathbb{I}(5 > 3) = 1\) | \(\mathbb{I}(5 > 8) = 0\) | \(\mathbb{I}(5 > 8) = 0\) | \(\mathbb{I}(5 > 9) = 0\)
\(x = 6\) | \(\mathbb{I}(6 > 2) = 1\) | \(\mathbb{I}(6 > 3) = 1\) | \(\mathbb{I}(6 > 8) = 0\) | \(\mathbb{I}(6 > 8) = 0\) | \(\mathbb{I}(6 > 9) = 0\)
\(x = 8\) | \(\mathbb{I}(8 > 2) = 1\) | \(\mathbb{I}(8 > 3) = 1\) | \(\mathbb{I}(8 > 8) = 0\) | \(\mathbb{I}(8 > 8) = 0\) | \(\mathbb{I}(8 > 9) = 0\)
\(x = 10\) | \(\mathbb{I}(10 > 2) = 1\) | \(\mathbb{I}(10 > 3) = 1\) | \(\mathbb{I}(10 > 8) = 1\) | \(\mathbb{I}(10 > 8) = 1\) | \(\mathbb{I}(10 > 9) = 1\)
There are 13 combinations where \(x_i > y_j\), so we get:
\(P(X > Y) = \frac{13}{30}\)
You might notice there are two combinations where \(x_i = y_j\). If we distribute these evenly over the two categories we get:
\(P(X > Y) + \frac{P(X = Y)}{2} = \frac{13}{30} + \frac{1}{2}\times\frac{2}{30} = \frac{13 + 1}{30} = \frac{14}{30} = \frac{7}{15} \approx 0.4667\)
Example Vargha-Delaney A
Let's say we have the scores 1, 4, 5, 6, 8, and 10 for six national students, and 2, 3, 8, 8, and 9 for five international students. If we set national as category 1 (\(X\)) and international as category 2 (\(Y\)), we have:
X = (1, 4, 5, 6, 8, 10), Y = (2, 3, 8, 8, 9)
\(n_1 = 6, n_2 = 5\)
We combine all scores and determine the ranks:
\(Z = X + Y = (1, 4, 5, 6, 8, 10) + (2, 3, 8, 8, 9) = (1, 4, 5, 6, 8, 10, 2, 3, 8, 8, 9)\)
To determine the ranks, let's sort them:
\(Z_s = (1, 2, 3, 4, 5, 6, 8, 8, 8, 9, 10)\)
Ranking these, averaging ties, gives the ranks:
\(r = (1, 2, 3, 4, 5, 6, 8, 8, 8, 10, 11)\)
Note that the three 8's would be ranks 7, 8 and 9, so averaged to 8.
The ranks for the national students are then:
\(r_x = (1, 4, 5, 6, 8, 11)\)
The sum of ranks for the national students:
\(R_x = 1 + 4 + 5 + 6 + 8 + 11 = 35\)
Completing the formula gives:
\(A_i = \frac{1}{n_j}\times\left(\frac{R_i}{n_i} - \frac{n_i + 1}{2}\right) = \frac{1}{5}\times\left(\frac{35}{6} - \frac{6 + 1}{2}\right) = \frac{1}{5}\times\left(\frac{35}{6} - \frac{21}{6}\right) = \frac{1}{5}\times\left(\frac{14}{6}\right) = \frac{14}{30} = \frac{7}{15} \approx 0.4667\)
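The steps above can be verified with a few lines of Python (variable names mirror the formula's symbols):

```python
n_i, n_j = 6, 5                 # sizes of category 1 (national) and 2 (international)
R_i = 1 + 4 + 5 + 6 + 8 + 11    # sum of ranks for the national students = 35
A = (R_i / n_i - (n_i + 1) / 2) / n_j
print(A)  # ≈ 0.4667, i.e. 7/15
```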
Example Using Normal Distribution
Let's say we have the scores 1, 4, 5, 6, 8, and 10 for six national students, and 2, 3, 8, 8, and 9 for five international students. If we set national as category 1 (\(X_1\)) and international as category 2 (\(X_2\)), we have:
X_1 = (1, 4, 5, 6, 8, 10), X_2 = (2, 3, 8, 8, 9)
\(n_1 = 6, n_2 = 5\)
First the means:
\(\bar{x}_1 = \frac{\sum_{j=1}^{6} x_{1,j}}{6} = \frac{x_{1,1} + x_{1,2} + \dots + x_{1,6}}{6} = \frac{1+4+5+6+8+10}{6} = \frac{34}{6} = \frac{17}{3}\)
\(\bar{x}_2 = \frac{\sum_{j=1}^{5} x_{2,j}}{5} = \frac{x_{2,1} + x_{2,2} + \dots + x_{2,5}}{5} = \frac{2+3+8+8+9}{5} = \frac{30}{5} = 6\)
Then the variances:
\(s_1^2 = \frac{\sum_{j=1}^{6} \left(x_{1,j} - \bar{x}_1\right)^2}{6 - 1} = \frac{\sum_{j=1}^{6} \left(x_{1,j} - \frac{17}{3}\right)^2}{5} = \frac{\left(1 - \frac{17}{3}\right)^2 + \left(4 - \frac{17}{3}\right)^2 + \left(5 - \frac{17}{3}\right)^2 + \left(6 - \frac{17}{3}\right)^2 + \left(8 - \frac{17}{3}\right)^2 + \left(10 - \frac{17}{3}\right)^2}{5}\)
\( = \frac{\left(\frac{3}{3} - \frac{17}{3}\right)^2 + \left(\frac{12}{3} - \frac{17}{3}\right)^2 + \left(\frac{15}{3} - \frac{17}{3}\right)^2 + \left(\frac{18}{3} - \frac{17}{3}\right)^2 + \left(\frac{24}{3} - \frac{17}{3}\right)^2 + \left(\frac{30}{3} - \frac{17}{3}\right)^2}{5}\)
\( = \frac{\left(-\frac{14}{3}\right)^2 + \left(-\frac{5}{3}\right)^2 + \left(-\frac{2}{3}\right)^2 + \left(\frac{1}{3}\right)^2 + \left(\frac{7}{3}\right)^2 + \left(\frac{13}{3}\right)^2}{5} = \frac{\frac{(-14)^2}{3^2} + \frac{(-5)^2}{3^2} + \frac{(-2)^2}{3^2} + \frac{1^2}{3^2} + \frac{7^2}{3^2} + \frac{13^2}{3^2}}{5}\)
\( = \frac{\frac{196}{9} + \frac{25}{9} + \frac{4}{9} + \frac{1}{9} + \frac{49}{9} + \frac{169}{9}}{5} = \frac{\frac{196+25+4+1+49+169}{9}}{5} = \frac{\frac{444}{9}}{5} = \frac{\frac{148}{3}}{5} = \frac{148}{15}\)
\(s_2^2 = \frac{\sum_{j=1}^{5} \left(x_{2,j} - \bar{x}_2\right)^2}{5 - 1} = \frac{\sum_{j=1}^{5} \left(x_{2,j} - 6\right)^2}{4} = \frac{\left(2 - 6\right)^2 + \left(3 - 6\right)^2 + \left(8 - 6\right)^2 + \left(8 - 6\right)^2 + \left(9 - 6\right)^2}{4}\)
\( = \frac{\left(-4\right)^2 + \left(-3\right)^2 + \left(2\right)^2 + \left(2\right)^2 + \left(3\right)^2}{4} = \frac{16 + 9 + 4 + 4 + 9}{4} = \frac{42}{4} = \frac{21}{2}\)
Now the z-value:
\(z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2 + s_2^2}} = \frac{\frac{17}{3} - 6}{\sqrt{\frac{148}{15}+ \frac{21}{2}}} = \frac{\frac{17}{3} - \frac{18}{3}}{\sqrt{\frac{296}{30}+ \frac{315}{30}}} = \frac{-\frac{1}{3}}{\sqrt{\frac{296+315}{30}}} = \frac{-\frac{1}{3}}{\sqrt{\frac{611}{30}}} = \frac{-\frac{1}{3}}{\frac{1}{30}\sqrt{18330}} = \frac{-30}{3\sqrt{18330}} = -\frac{10}{\sqrt{18330}} \)
Last we find the corresponding probability for this z-value using the standard normal distribution.
\(CLES = \Phi(-\frac{10}{\sqrt{18330}}) \approx 0.4706\)
See the standard normal distribution page if you are interested in how to determine this last step.
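This last step can also be checked with the standard library's `NormalDist` (Python 3.8+):

```python
from math import sqrt
from statistics import NormalDist

z = -10 / sqrt(18330)        # the z-value derived above
cles = NormalDist().cdf(z)   # Phi(z), standard normal CDF
print(round(cles, 4))  # 0.4706
```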
Interpretation
The CLES can range from 0 to 1. A one indicates that all scores in the first category are higher than those in the second; a zero indicates the opposite. If the two categories are in balance, the CLES is 0.5. We are therefore often interested in how far the CLES is from 0.5.
Vargha and Delaney (2000, p. 106) proposed the rules-of-thumb shown in Table 1:
\(\vert CLES - 0.5\vert\) | Interpretation
---|---
less than 0.06 | Negligible
0.06 to less than 0.14 | Small
0.14 to less than 0.21 | Medium
0.21 or more | Large
Alternatively, the CLES can be converted to a Rank-Biserial Correlation Coefficient, which in turn can be converted to Cohen's d. Separate rules of thumb have been proposed for those measures. The conversion formula is not that complicated:
\(r_b = 2\times CLES - 1\)
See the Rank-Biserial page for rules-of-thumb for this measure, and how it in turn can be converted to Cohen's d.
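For convenience, the rules-of-thumb from Table 1 and the conversion to the rank-biserial correlation can be sketched in Python (function names are my own):

```python
def interpret_vda(cles):
    """Vargha and Delaney (2000) rule-of-thumb label, based on |CLES - 0.5|."""
    d = abs(cles - 0.5)
    if d < 0.06:
        return "negligible"
    if d < 0.14:
        return "small"
    if d < 0.21:
        return "medium"
    return "large"

def rank_biserial_from_cles(cles):
    # r_b = 2 * CLES - 1
    return 2 * cles - 1
```

The example's CLES of \(7/15 \approx 0.4667\) is only 0.0333 away from 0.5, so it would be labelled negligible.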