Welch(-Satterthwaite) t-Test for Independent Samples
Explanation
The Welch(-Satterthwaite) t-test can be used to compare two means, for example to check if national students score, on average, differently from international students on an exam. Unlike the more popular Student t-test, this test does not require the two variances to be equal.
The test seems to have been developed independently by Welch (1938; 1947) and Satterthwaite (1946). Welch (1947, p. 32, eq. 29) also proposed an alternative formula for the degrees of freedom, but Aspin and Welch (1949, p. 295) indicate that this alternative version has little to no advantage over the other version.
Performing the Test
with Excel
Excel file: TS - Welch-t (ind samples) (E).xlsm
with stikpetE
To Be Made
without stikpetE
To Be Made
with Python
Jupyter Notebook: TS -Student and Welch t (ind samples) (P).ipynb
with stikpetP
To Be Made
without stikpetP
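A minimal sketch of how this could be done without stikpetP, using scipy; the two score lists below are made-up example data, not from the notebook:

```python
# Welch t-test in Python without stikpetP, using scipy
# (the two score lists are made-up example data)
from scipy import stats

national = [65, 72, 58, 49, 61, 70, 55, 63]
international = [54, 60, 47, 52, 58, 45]

# equal_var=False requests the Welch version instead of the Student version
result = stats.ttest_ind(national, international, equal_var=False)
print(result.statistic, result.pvalue)
```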
with R
Jupyter Notebook: TS -Student and Welch t (ind samples) (R).ipynb
with stikpetR
To Be Made
without stikpetR
with SPSS
Formulas
The formula:
\(t = \frac{\bar{x}_1 - \bar{x}_2}{SE}\)
\(df = \frac{SE^4}{\frac{\left(s_1^2\right)^2}{n_1^2\times\left(n_1 - 1\right)} + \frac{\left(s_2^2\right)^2}{n_2^2\times\left(n_2 - 1\right)}}\)
\(sig. = 2\times\left(1 - T\left(\left|t\right|, df\right)\right)\)
With:
\(SE = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}\)
\(s_i^2 = \frac{\sum_{j=1}^{n_i} \left(x_{i,j} - \bar{x}_i\right)^2}{n_i - 1}\)
\(\bar{x}_i = \frac{\sum_{j=1}^{n_i} x_{i,j}}{n_i}\)
Symbols used:
- \(x_{i,j}\), the j-th score in category i
- \(n_i\), the number of scores in category i
- \(T\left(\dots\right)\), the cumulative distribution function of the t-distribution
The degrees of freedom can be found in Welch (1938, p. 353, eq. 9; 1947, p. 32, eq. 28) and Satterthwaite (1946, p. 114, eq. 17). Welch (1947, p. 32, eq. 29) also suggests another formula for the degrees of freedom, which can be written as:
\(df_w = \frac{\left( \frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}\right)^2}{\frac{\left(s_1^2\right)^2}{n_1^2 \times \left(n_1 + 1\right)} + \frac{\left(s_2^2\right)^2}{n_2^2 \times \left(n_2 + 1\right)}} - 2\)
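As a small illustration of these formulas, a Python sketch that computes t, df, the alternative df_w, and the two-sided significance directly; the samples x1 and x2 are made-up example data:

```python
# Direct computation of the Welch t-test from the formulas above
# (x1 and x2 are made-up example samples)
from statistics import mean, variance
from scipy.stats import t as t_dist

x1 = [65, 72, 58, 49, 61, 70, 55, 63]
x2 = [54, 60, 47, 52, 58, 45]

n1, n2 = len(x1), len(x2)
xbar1, xbar2 = mean(x1), mean(x2)
s2_1, s2_2 = variance(x1), variance(x2)   # sample variances (n - 1 in the denominator)

se = (s2_1 / n1 + s2_2 / n2) ** 0.5       # standard error SE
t_value = (xbar1 - xbar2) / se

# Welch-Satterthwaite degrees of freedom (df)
df = se**4 / (s2_1**2 / (n1**2 * (n1 - 1)) + s2_2**2 / (n2**2 * (n2 - 1)))

# Welch's alternative degrees of freedom (df_w)
df_w = se**4 / (s2_1**2 / (n1**2 * (n1 + 1)) + s2_2**2 / (n2**2 * (n2 + 1))) - 2

# two-sided significance from the t-distribution
sig = 2 * (1 - t_dist.cdf(abs(t_value), df))
print(t_value, df, df_w, sig)
```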
Interpreting the Result
The assumption about the population for this test (the null hypothesis) is that the means of the two populations from which the samples were drawn are equal.
The test provides a p-value, which is the probability of obtaining a test statistic as extreme as the one from the sample, or even more extreme, if the assumption about the population were true. If this p-value (significance) is below a pre-defined threshold (the significance level \(\alpha\)), the assumption about the population is rejected. We then speak of a (statistically) significant result. The threshold is usually set at 0.05; any p-value below it is then considered low.
If the assumption is rejected, we conclude that the means in the two populations are different.
Note that if we do not reject the assumption, it does not mean we accept it; we simply state that there is insufficient evidence to reject it.
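Continuing the Python sketch from the Formulas section, the decision could look like this (using the usual 0.05 threshold):

```python
# Decision rule: compare the p-value (sig) to the significance level alpha
alpha = 0.05
if sig < alpha:
    print("Significant: reject the assumption of equal population means.")
else:
    print("Not significant: insufficient evidence to reject the assumption.")
```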
Writing the Results
Writing up the results of the test uses the format (APA, 2019 p. 182):
t(<degrees of freedom>) = <t-value>, p = <p-value>
So for example:
The mean grade of the national students was 59.64 (\(n_{n}\) = 30), while for the international students it was 53.73 (\(n_{i}\) = 11). Using a Welch t-test, there was no significant difference, t(14.16) = 0.694, p = .499.
The p-value is shown with three decimal places and no 0 before the decimal point. If the p-value is below .0005, it can be reported as p < .001.
APA (2019, p. 88) also recommends reporting an effect size measure.
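As a small illustration of this reporting format, a hypothetical Python helper (not part of any package) that builds such a line:

```python
# Hypothetical helper to format the result in the APA style described above:
# three decimals for p, no leading zero, and "p < .001" when p is below .0005
def apa_line(t_value, df, p_value):
    if p_value < 0.0005:
        p_text = "p < .001"
    else:
        p_text = "p = " + f"{p_value:.3f}".lstrip("0")
    return f"t({df:.2f}) = {t_value:.3f}, {p_text}"

print(apa_line(0.694, 14.16, 0.499))   # t(14.16) = 0.694, p = .499
```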
Next...
After this test you might want an effect size measure. Various options are available for this: Common Language, Cohen d_s, Cohen U, Hedges g, Glass delta, biserial correlation, and point-biserial correlation.
Alternatives
For the two independent samples, the following tests could be considered:
| Test | Equal variance assumption | Normality assumption |
|---|---|---|
| Student | yes | yes |
| Welch | no | yes |
| Trimmed | yes | no |
| Yuen-Welch | no | no |