Hodges-Lehmann Estimate (Independent-Samples)

Introduction

The Hodges-Lehmann estimate is the median of all possible pairwise differences between two sets of data. The authors (Hodges & Lehmann, 1963) describe it as the location shift needed to align two distributions of similar shape as closely as possible (p. 599).

It is sometimes described as the difference between the two medians, but that is incorrect. The Hodges-Lehmann estimate often differs from the difference between the two medians.
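To make the distinction concrete, the sketch below compares both quantities on a small made-up data set. The values and the helper name `hodges_lehmann` are illustrative assumptions, not taken from the source; differences are taken as y minus x, and only the Python standard library is used.

```python
from itertools import product
from statistics import median


def hodges_lehmann(x, y):
    """Median of all pairwise differences y_j - x_i (brute force)."""
    return median(yj - xi for xi, yj in product(x, y))


# Hypothetical data, chosen only to make the contrast visible.
x = [0, 0, 10]
y = [1, 5, 6]

hl = hodges_lehmann(x, y)
median_difference = median(y) - median(x)

print("Hodges-Lehmann estimate:", hl)
print("Difference of the medians:", median_difference)
```

With these values the two results do not coincide, because a single large score in x affects many of the pairwise differences while leaving the median of x untouched.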

This measure is sometimes mentioned as an effect size measure for a Mann-Whitney U / Wilcoxon Rank Sum test (van Geloven, 2018). However, since it is a median of the possible differences, it is not standardized (i.e. it doesn't range between two fixed values, and therefore depends on the scale of the data).

A separate page is available for the one-sample version of the Hodges-Lehmann Estimate.

Obtaining the Measure

with Python

Jupyter Notebook: ES - Hodges-Lehmann (ind sam) (P).ipynb

with stikpetP

To Be Made

without stikpetP

To Be Made

Formulas

The formula for the Hodges-Lehmann estimator with two samples is (Hodges & Lehmann, 1963, p. 602):

\(HL = \text{median}\left(y_j - x_i | 1 \leq i \leq n_x, 1 \leq j \leq n_y\right)\)

Symbols used:

\(x_i\), the i-th score in category x
\(y_j\), the j-th score in category y
\(n_x\) and \(n_y\), the number of scores in category x and in category y

There may be a faster way to compute this: Algorithm 616 (Monahan, 1984). However, I have not been able to translate its Fortran code to R or Python.
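The formula can also be applied directly by enumerating every pair. The following is a minimal brute-force sketch under that assumption; it is not Algorithm 616, and the function name `hodges_lehmann_two_sample` is a hypothetical choice. It builds all \(n_x \times n_y\) differences, so time and memory grow with the product of the two sample sizes.

```python
from statistics import median


def hodges_lehmann_two_sample(x, y):
    """Brute-force Hodges-Lehmann estimate for two independent samples.

    Builds all n_x * n_y differences y_j - x_i and returns their median.
    """
    if len(x) == 0 or len(y) == 0:
        raise ValueError("both samples need at least one score")
    differences = [yj - xi for xi in x for yj in y]
    return median(differences)


# Small illustrative call (made-up values).
estimate = hodges_lehmann_two_sample([10, 12, 15], [11, 16, 18, 20])
print(estimate)
```

For small samples this direct approach is usually sufficient; a faster algorithm only matters once the number of pairs becomes very large.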

Worked out Example

Let's say we have the scores 1, 3, 7, 8 in one category x, and the scores 2, 5, 8, 8, 9 in a second category y.

We then need to determine the difference for each possible pair:

\(y-x\)	y
x	2	5	8	8	9
1	2 - 1 = 1	5 - 1 = 4	8 - 1 = 7	8 - 1 = 7	9 - 1 = 8
3	2 - 3 = -1	5 - 3 = 2	8 - 3 = 5	8 - 3 = 5	9 - 3 = 6
7	2 - 7 = -5	5 - 7 = -2	8 - 7 = 1	8 - 7 = 1	9 - 7 = 2
8	2 - 8 = -6	5 - 8 = -3	8 - 8 = 0	8 - 8 = 0	9 - 8 = 1

The list of all differences between every possible pair is then:

\(1,4,7,7,8,-1,2,5,5,6,-5,-2,1,1,2,-6,-3,0,0,1\)

We can sort this list to get:

\(-6,-5,-3,-2,-1,0,0,1,1,1,1,2,2,4,5,5,6,7,7,8\)

Next we need to determine the median of all these differences. There are 20 differences in total ( \(4\times 5\) ), so the median will be the \(\frac{20+1}{2} = 10.5\)-th score.

The 10th score is 1, and so is the 11th, so the 10.5-th is also 1 (had they differed, we would have averaged the two).

The Hodges-Lehmann estimate is therefore 1.
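The steps of this example can be retraced with a few lines of Python. This is only a sketch using the standard library; the variable names are illustrative, and the differences follow the y minus x direction of the table.

```python
from statistics import median

x = [1, 3, 7, 8]
y = [2, 5, 8, 8, 9]

# One row per x score, one column per y score, each cell holding y - x.
table = [[yj - xi for yj in y] for xi in x]
for xi, row in zip(x, table):
    print(xi, row)

# Flatten the table, sort it, and take the median of the 20 differences.
differences = sorted(d for row in table for d in row)
print(differences)
print(median(differences))
```

Printing the table and the sorted list makes it easy to compare each intermediate step against the manual calculation.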

Interpretation

The Hodges-Lehmann estimate is the median of all possible differences. It can therefore range between minus and plus infinity. If you take any pair of values (one from each of the two categories), then in about 50% of the cases the difference between the two will be the Hodges-Lehmann value or more.
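This reading can be checked by counting pairs. The sketch below, using the scores from the worked example and illustrative variable names, computes the share of pairwise differences (y minus x) that are at least, and at most, the estimate.

```python
from statistics import median

x = [1, 3, 7, 8]
y = [2, 5, 8, 8, 9]

differences = [yj - xi for xi in x for yj in y]
hl = median(differences)

# Share of pairs whose difference is at least / at most the estimate.
share_at_least = sum(d >= hl for d in differences) / len(differences)
share_at_most = sum(d <= hl for d in differences) / len(differences)

print(share_at_least, share_at_most)
```

Differences that are exactly equal to the estimate are counted on both sides, so each share can come out somewhat above 50% when there are ties.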

Unfortunately, I have not been able to find any rules of thumb for this measure.
