Welch(-Satterthwaite) t-Test for Independent Samples
Explanation
The Welch(-Satterthwaite) t-test can be used to compare two means, for example to check if national students score, on average, differently from international students on an exam. Unlike the more popular Student t-test, this test does not require the two variances to be equal.
The test seems to have been developed independently by Welch (1938; 1947) and Satterthwaite (1946). Welch (1947, p. 32, eq. 29) also proposed an alternative formula for the degrees of freedom, but Aspin and Welch (1949, p. 295) indicate that this alternative version has little to no advantage over the other version.
Performing the Test
with Excel
Excel file: TS - Welch-t (ind samples) (E).xlsm
with stikpetE
To Be Made
without stikpetE
To Be Made
with Python
Jupyter Notebook: TS -Student and Welch t (ind samples) (P).ipynb
with stikpetP
To Be Made
without stikpetP
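A minimal sketch of how this could be done without stikpetP, using scipy; the two score lists below are made-up example data, not from the notebook:

```python
# Welch t-test in Python without stikpetP, using scipy
# (the two score lists are made-up example data)
from scipy import stats

national = [65, 72, 58, 49, 61, 70, 55, 63]
international = [54, 60, 47, 52, 58, 45]

# equal_var=False requests the Welch version instead of the Student version
result = stats.ttest_ind(national, international, equal_var=False)
print(result.statistic, result.pvalue)
```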
with R
Jupyter Notebook: TS -Student and Welch t (ind samples) (R).ipynb
with stikpetR
To Be Made
without stikpetR
with SPSS
Formulas
The formula:
\(t = \frac{\bar{x}_1 - \bar{x}_2}{SE}\)
\(df = \frac{SE^4}{\frac{\left(s_1^2\right)^2}{n_1^2\times\left(n_1 - 1\right)} + \frac{\left(s_2^2\right)^2}{n_2^2\times\left(n_2 - 1\right)}}\)
\(sig. = 2\times\left(1 - T\left(\left|t\right|, df\right)\right)\)
With:
\(SE = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}\)
\(s_i^2 = \frac{\sum_{j=1}^{n_i} \left(x_{i,j} - \bar{x}_i\right)^2}{n_i - 1}\)
\(\bar{x}_i = \frac{\sum_{j=1}^{n_i} x_{i,j}}{n_i}\)
Symbols used:
- \(x_{i,j}\), the j-th score in category i
- \(n_i\), the number of scores in category i
- \(T\left(\dots\right)\), the cumulative distribution function of the t-distribution
The degrees of freedom can be found in Welch (1938, p. 353, eq. 9; 1947, p. 32, eq. 28) and Satterthwaite (1946, p. 114, eq. 17). Welch (1947, p. 32, eq. 29) also suggests another formula for the degrees of freedom, which can be written as:
\(df_w = \frac{\left( \frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}\right)^2}{\frac{\left(s_1^2\right)^2}{n_1^2 \times \left(n_1 + 1\right)} + \frac{\left(s_2^2\right)^2}{n_2^2 \times \left(n_2 + 1\right)}} - 2\)
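As a small illustration of these formulas, a Python sketch that computes t, df, the alternative df_w, and the two-sided significance directly; the samples x1 and x2 are made-up example data:

```python
# Direct computation of the Welch t-test from the formulas above
# (x1 and x2 are made-up example samples)
from statistics import mean, variance
from scipy.stats import t as t_dist

x1 = [65, 72, 58, 49, 61, 70, 55, 63]
x2 = [54, 60, 47, 52, 58, 45]

n1, n2 = len(x1), len(x2)
xbar1, xbar2 = mean(x1), mean(x2)
s2_1, s2_2 = variance(x1), variance(x2)   # sample variances (n - 1 in the denominator)

se = (s2_1 / n1 + s2_2 / n2) ** 0.5       # standard error SE
t_value = (xbar1 - xbar2) / se

# Welch-Satterthwaite degrees of freedom (df)
df = se**4 / (s2_1**2 / (n1**2 * (n1 - 1)) + s2_2**2 / (n2**2 * (n2 - 1)))

# Welch's alternative degrees of freedom (df_w)
df_w = se**4 / (s2_1**2 / (n1**2 * (n1 + 1)) + s2_2**2 / (n2**2 * (n2 + 1))) - 2

# two-sided significance from the t-distribution
sig = 2 * (1 - t_dist.cdf(abs(t_value), df))
print(t_value, df, df_w, sig)
```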
Interpreting the Result
The assumption about the population for this test (the null hypothesis) is that the means of the two populations from which the samples were drawn are equal.
The test provides a p-value, which is the probability of obtaining a test statistic as extreme as the one from the sample, or even more extreme, if the assumption about the population were true. If this p-value (significance) is below a pre-defined threshold (the significance level \(\alpha\)), the assumption about the population is rejected. We then speak of a (statistically) significant result. The threshold is usually set at 0.05; any p-value below it is then considered low.
If the assumption is rejected, we conclude that the means in the two populations are different.
Note that if we do not reject the assumption, it does not mean we accept it; we simply state that there is insufficient evidence to reject it.
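Continuing the Python sketch from the Formulas section, the decision could look like this (using the usual 0.05 threshold):

```python
# Decision rule: compare the p-value (sig) to the significance level alpha
alpha = 0.05
if sig < alpha:
    print("Significant: reject the assumption of equal population means.")
else:
    print("Not significant: insufficient evidence to reject the assumption.")
```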
Writing the Results
Writing up the results of the test uses the format (APA, 2019 p. 182):
t(<degrees of freedom>) = <t-value>, p = <p-value>
So for example:
The mean grade of the national students was 59.64 (\(n_{n}\) = 30), while for the international students it was 53.73 (\(n_{i}\) = 11). Using a Welch t-test, there was no significant difference, t(14.16) = 0.694, p = .499.
The p-value is shown with three decimal places and no 0 before the decimal point. If the p-value is below .0005, it can be reported as p < .001.
APA (2019, p. 88) also recommends reporting an effect size measure.
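As a small illustration of this reporting format, a hypothetical Python helper (not part of any package) that builds such a line:

```python
# Hypothetical helper to format the result in the APA style described above:
# three decimals for p, no leading zero, and "p < .001" when p is below .0005
def apa_line(t_value, df, p_value):
    if p_value < 0.0005:
        p_text = "p < .001"
    else:
        p_text = "p = " + f"{p_value:.3f}".lstrip("0")
    return f"t({df:.2f}) = {t_value:.3f}, {p_text}"

print(apa_line(0.694, 14.16, 0.499))   # t(14.16) = 0.694, p = .499
```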
Next...
After this test you might want an effect size measure. Various options are available for this: Common Language, Cohen d_s, Cohen U, Hedges g, Glass delta, biserial correlation, and point-biserial correlation.
Alternatives
For the two independent samples, the following tests could be considered:
| Test | Equal variance assumption | Normality assumption |
|---|---|---|
| Student | yes | yes |
| Welch | no | yes |
| Trimmed | yes | no |
| Yuen-Welch | no | no |