Module stikpetP.tests.test_cochran_owa
import pandas as pd
from scipy.stats import chi2
def ts_cochran_owa(nomField, scaleField, categories=None):
    '''
Cochran One-Way ANOVA
----------------------
Tests if the means (averages) of each category could be the same in the population.
Note that according to Hartung et al. (2002, p. 225), the Cochran test is the standard test in meta-analysis but should not be used, since it is always too liberal (it rejects a true null hypothesis more often than the nominal level).
If the p-value is below a pre-defined threshold (usually 0.05), the null hypothesis is rejected, and at least two categories are then taken to have a different mean on the scaleField score in the population.
Several alternatives exist. The stikpet library offers Fisher, Welch, James, Box, Scott-Smith, Brown-Forsythe, Alexander-Govern, Mehrotra modified Brown-Forsythe, Hartung-Argaç-Makambi, Özdemir-Kurt and Wilcox. See the notes of ts_fisher_owa() for a discussion of the differences.
Parameters
----------
nomField : pandas series
data with categories
scaleField : pandas series
data with the scores
categories : list or dictionary, optional
the categories to use from nomField
Returns
-------
Dataframe with:
* *n*, the sample size
* *statistic*, the test statistic (chi-square value)
* *df*, degrees of freedom
* *p-value*, the p-value (significance)
Notes
-----
The formula used is (Cavus & Yazıcı, 2020, p. 5; Hartung et al., 2002, p. 202; Mezui-Mbeng, 2015, p. 787):
$$\\chi_{Cochran}^2 = \\sum_{j=1}^k w_j\\times\\left(\\bar{x}_j - \\bar{y}_w\\right)^2$$
$$df = k - 1$$
$$sig. = 1 - \\chi^2\\left(\\chi_{Cochran}^2, df\\right)$$
With:
$$\\bar{y}_w = \\sum_{j=1}^k h_j\\times\\bar{x}_j$$
$$h_j = \\frac{w_j}{w}$$
$$w = \\sum_{j=1}^k w_j$$
$$w_j = \\frac{n_j}{s_j^2}$$
$$s_j^2 = \\frac{\\sum_{i=1}^{n_j}\\left(x_{i,j} - \\bar{x}_j\\right)^2}{n_j - 1}$$
$$\\bar{x}_j = \\frac{\\sum_{i=1}^{n_j} x_{i,j}}{n_j}$$
*Symbols used:*
* \\(x_{i,j}\\), the i-th score in category j
* \\(k\\), the number of categories
* \\(n_j\\), the sample size of category j
* \\(\\bar{x}_j\\), the sample mean of category j
* \\(s_j^2\\), the sample variance of the scores in category j
* \\(w_j\\), the weight for category j
* \\(h_j\\), the adjusted weight for category j
* \\(df\\), the degrees of freedom.
The formula could not be clearly located in the original article by Cochran (1937).
References
----------
Cavus, M., & Yazıcı, B. (2020). Testing the equality of normal distributed and independent groups’ means under unequal variances by doex package. *The R Journal, 12*(2), 134. doi:10.32614/RJ-2021-008
Cochran, W. G. (1937). Problems arising in the analysis of a series of similar experiments. *Supplement to the Journal of the Royal Statistical Society, 4*(1), 102–118. doi:10.2307/2984123
Hartung, J., Argaç, D., & Makambi, K. H. (2002). Small sample properties of tests on homogeneity in one-way anova and meta-analysis. *Statistical Papers, 43*(2), 197–235. doi:10.1007/s00362-002-0097-8
Mezui-Mbeng, P. (2015). A note on Cochran test for homogeneity in two ways ANOVA and meta-analysis. *Open Journal of Statistics, 5*(7), 787–796. doi:10.4236/ojs.2015.57078
Author
------
Made by P. Stikker
Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076
    '''
    #accept plain lists as well as pandas series
    if isinstance(nomField, list):
        nomField = pd.Series(nomField)
    if isinstance(scaleField, list):
        scaleField = pd.Series(scaleField)

    data = pd.concat([nomField, scaleField], axis=1)
    data.columns = ["category", "score"]

    #remove unused categories
    if categories is not None:
        data = data[data.category.isin(categories)]

    #Remove rows with missing values and reset index
    #(reset_index returns a new frame, so the result has to be assigned)
    data = data.dropna()
    data = data.reset_index(drop=True)

    #overall sample size
    n = len(data["category"])

    #sample sizes, variances and means per category
    #(select the score column so each result is a Series indexed by category)
    nj = data.groupby('category')['score'].count()
    sj2 = data.groupby('category')['score'].var()
    mj = data.groupby('category')['score'].mean()

    #number of categories
    k = len(mj)

    #weights, adjusted weights and weighted mean
    wj = nj / sj2
    w = float(wj.sum())
    hj = wj/w
    yw = float((hj*mj).sum())

    #test statistic, degrees of freedom and p-value
    chi2Val = float((wj*(mj - yw)**2).sum())
    df = k - 1
    pVal = chi2.sf(chi2Val, df)

    #results
    res = pd.DataFrame([[n, chi2Val, df, pVal]])
    res.columns = ["n", "statistic", "df", "p-value"]

    return res
Functions
def ts_cochran_owa(nomField, scaleField, categories=None)
Cochran One-Way ANOVA
Tests if the means (averages) of each category could be the same in the population.
Note that according to Hartung et al. (2002, p. 225), the Cochran test is the standard test in meta-analysis but should not be used, since it is always too liberal (it rejects a true null hypothesis more often than the nominal level).
If the p-value is below a pre-defined threshold (usually 0.05), the null hypothesis is rejected, and at least two categories are then taken to have a different mean on the scaleField score in the population.
Several alternatives exist. The stikpet library offers Fisher, Welch, James, Box, Scott-Smith, Brown-Forsythe, Alexander-Govern, Mehrotra modified Brown-Forsythe, Hartung-Argaç-Makambi, Özdemir-Kurt and Wilcox. See the notes of ts_fisher_owa() for a discussion of the differences.
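The liberal behaviour attributed to the test by Hartung et al. (2002) can be explored with a small simulation. The sketch below is not part of the stikpet library: the helper `cochran_statistic`, the seed, the number of repetitions and the group sizes are all assumptions chosen for illustration. It draws three small groups from the same normal population, so the null hypothesis is true, and counts how often the test rejects at the 0.05 level. Because there are three groups, df = 2, and for that case only the chi-square upper tail equals exp(-x/2), which keeps the sketch free of external dependencies.

```python
import math
import random
import statistics


def cochran_statistic(groups):
    """Cochran chi-square statistic for a list of score lists (hypothetical helper)."""
    means = [statistics.mean(g) for g in groups]
    # w_j = n_j / s_j^2, with s_j^2 the sample variance (divisor n_j - 1)
    weights = [len(g) / statistics.variance(g) for g in groups]
    w = sum(weights)
    # weighted mean of the category means
    y_w = sum(wj * mj for wj, mj in zip(weights, means)) / w
    return sum(wj * (mj - y_w) ** 2 for wj, mj in zip(weights, means))


random.seed(1)               # arbitrary seed, for reproducibility only
reps, k, n_j = 2000, 3, 5    # assumed simulation settings
rejections = 0
for _ in range(reps):
    # every group comes from the same population, so H0 holds
    groups = [[random.gauss(0, 1) for _ in range(n_j)] for _ in range(k)]
    stat = cochran_statistic(groups)
    p = math.exp(-stat / 2)  # chi-square upper tail, valid for df = k - 1 = 2 only
    if p < 0.05:
        rejections += 1

rate = rejections / reps
print("empirical rejection rate under H0:", rate)
```

A rejection rate clearly above 0.05 in such a run is consistent with the warning above, although one simulation with one configuration is an illustration, not a proof.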
Parameters
- nomField (pandas series): data with categories
- scaleField (pandas series): data with the scores
- categories (list or dictionary, optional): the categories to use from nomField
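The effect of the categories argument and of missing values can be sketched with pandas alone. The example mirrors the filtering steps in the source code (`isin` followed by `dropna`); the data and the variable names `nom` and `scale` are made up for illustration.

```python
import pandas as pd

# Hypothetical example data; None marks a missing score.
nom = pd.Series(["a", "a", "b", "b", "b", "c", "c"])
scale = pd.Series([1.0, 2.0, 3.0, None, 4.0, 5.0, 6.0])

data = pd.concat([nom, scale], axis=1)
data.columns = ["category", "score"]

# keep only the requested categories, as categories=["a", "b"] would
data = data[data["category"].isin(["a", "b"])]

# drop rows with a missing category or score, then renumber the rows
data = data.dropna().reset_index(drop=True)

n = len(data)
print(data)
print("n =", n)
```

Category c is excluded by the filter and the row with the missing score is dropped, so only four rows enter the test. Each remaining category needs at least two scores, since the sample variance is undefined for a single observation.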
Returns
Dataframe with:
- n, the sample size
- statistic, the test statistic (chi-square value)
- df, degrees of freedom
- p-value, the p-value (significance)
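The p-value in the returned table is the upper-tail probability of a chi-square distribution. The Notes write it as one minus the cumulative distribution, while the source code calls `chi2.sf`; the sketch below, with an illustrative statistic of 24 and df = 2 (values assumed here, not taken from real data), shows that the two forms agree.

```python
from scipy.stats import chi2

statistic = 24.0  # illustrative value only
df = 2

p_sf = chi2.sf(statistic, df)        # survival function (upper tail)
p_cdf = 1 - chi2.cdf(statistic, df)  # the same quantity, written as in the formula

print(p_sf, p_cdf)
```

The survival function is generally the safer choice for very large statistics, because `1 - cdf` loses precision once the cumulative probability is close to 1.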
Notes
The formula used is (Cavus & Yazıcı, 2020, p. 5; Hartung et al., 2002, p. 202; Mezui-Mbeng, 2015, p. 787):
$$\chi_{Cochran}^2 = \sum_{j=1}^k w_j\times\left(\bar{x}_j - \bar{y}_w\right)^2$$
$$df = k - 1$$
$$sig. = 1 - \chi^2\left(\chi_{Cochran}^2, df\right)$$
With:
$$\bar{y}_w = \sum_{j=1}^k h_j\times\bar{x}_j$$
$$h_j = \frac{w_j}{w}$$
$$w = \sum_{j=1}^k w_j$$
$$w_j = \frac{n_j}{s_j^2}$$
$$s_j^2 = \frac{\sum_{i=1}^{n_j}\left(x_{i,j} - \bar{x}_j\right)^2}{n_j - 1}$$
$$\bar{x}_j = \frac{\sum_{i=1}^{n_j} x_{i,j}}{n_j}$$
Symbols used:
- \(x_{i,j}\), the i-th score in category j
- \(k\), the number of categories
- \(n_j\), the sample size of category j
- \(\bar{x}_j\), the sample mean of category j
- \(s_j^2\), the sample variance of the scores in category j
- \(w_j\), the weight for category j
- \(h_j\), the adjusted weight for category j
- \(df\), the degrees of freedom.
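The formulas above can be followed step by step on a small made-up data set. The sketch uses only the Python standard library and does not call the stikpet function; the data and all variable names are assumptions for illustration. With these scores the weights are 3, 0.75 and 3, the weighted mean is 4, and the statistic is 3·(2 − 4)² + 0.75·(4 − 4)² + 3·(6 − 4)² = 24.

```python
import math
import statistics

# Hypothetical scores per category (illustrative data only).
groups = {
    "A": [1, 2, 3],
    "B": [2, 4, 6],
    "C": [5, 6, 7],
}

n_j = {c: len(x) for c, x in groups.items()}
mean_j = {c: statistics.mean(x) for c, x in groups.items()}
var_j = {c: statistics.variance(x) for c, x in groups.items()}  # divisor n_j - 1

w_j = {c: n_j[c] / var_j[c] for c in groups}    # weight per category
w = sum(w_j.values())                           # sum of the weights
h_j = {c: w_j[c] / w for c in groups}           # adjusted weights, summing to 1
y_w = sum(h_j[c] * mean_j[c] for c in groups)   # weighted mean

chi2_cochran = sum(w_j[c] * (mean_j[c] - y_w) ** 2 for c in groups)
df = len(groups) - 1

# For df = 2 only, the chi-square upper tail has the closed form exp(-x / 2);
# for other df a routine such as scipy.stats.chi2.sf is needed.
p_value = math.exp(-chi2_cochran / 2)

print(chi2_cochran, df, p_value)
```

Category B has the largest variance and therefore the smallest weight, which shows how the test downweights noisy categories when forming the weighted mean.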
The formula could not be clearly located in the original article by Cochran (1937).
References
Cavus, M., & Yazıcı, B. (2020). Testing the equality of normal distributed and independent groups’ means under unequal variances by doex package. The R Journal, 12(2), 134. doi:10.32614/RJ-2021-008
Cochran, W. G. (1937). Problems arising in the analysis of a series of similar experiments. Supplement to the Journal of the Royal Statistical Society, 4(1), 102–118. doi:10.2307/2984123
Hartung, J., Argaç, D., & Makambi, K. H. (2002). Small sample properties of tests on homogeneity in one-way anova and meta-analysis. Statistical Papers, 43(2), 197–235. doi:10.1007/s00362-002-0097-8
Mezui-Mbeng, P. (2015). A note on Cochran test for homogeneity in two ways ANOVA and meta-analysis. Open Journal of Statistics, 5(7), 787–796. doi:10.4236/ojs.2015.57078
Author
Made by P. Stikker
Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076