Module stikpetP.tests.test_cochran_owa

import pandas as pd
from scipy.stats import chi2

def ts_cochran_owa(nomField, scaleField, categories=None):
    '''
    Cochran One-Way ANOVA
    ----------------------
    Tests if the means (averages) of each category could be the same in the population.
    
    Note that according to Hartung et al. (2002, p. 225) the Cochran test is the standard test in meta-analysis, but should not be used, since it is always too liberal.
    
    If the p-value is below a pre-defined threshold (usually 0.05), the null hypothesis is rejected, and at least two categories are then concluded to have a different mean on the scaleField score in the population.
    
    There are quite a few alternatives to this test; the stikpet library has Fisher, Welch, James, Box, Scott-Smith, Brown-Forsythe, Alexander-Govern, Mehrotra modified Brown-Forsythe, Hartung-Argaç-Makambi, Özdemir-Kurt and Wilcox as options. See the notes from ts_fisher_owa() for some discussion on the differences.
    
    Parameters
    ----------
    nomField : list or pandas series
        data with categories
    scaleField : list or pandas series
        data with the scores
    categories : list or dictionary, optional
        the categories to use from nomField
    
    Returns
    -------
    Dataframe with:
    
    * *n*, the sample size
    * *statistic*, the test statistic (chi-square value)
    * *df*, degrees of freedom
    * *p-value*, the p-value (significance)
    
    Notes
    -----
    The formula used is (Cavus & Yazıcı, 2020, p. 5; Hartung et al., 2002, p. 202; Mezui-Mbeng, 2015, p. 787):
    $$\\chi_{Cochran}^2 = \\sum_{j=1}^k w_j\\times\\left(\\bar{x}_j - \\bar{y}_w\\right)^2$$
    $$df = k - 1$$
    $$sig. = 1 - \\chi^2\\left(\\chi_{Cochran}^2, df\\right)$$
    
    With:
    $$\\bar{y}_w = \\sum_{j=1}^k h_j\\times\\bar{x}_j$$
    $$h_j = \\frac{w_j}{w}$$
    $$w = \\sum_{j=1}^k w_j$$
    $$w_j = \\frac{n_j}{s_j^2}$$
    $$s_j^2 = \\frac{\\sum_{i=1}^{n_j}\\left(x_{i,j} - \\bar{x}_j\\right)^2}{n_j - 1}$$
    $$\\bar{x}_j = \\frac{\\sum_{i=1}^{n_j} x_{i,j}}{n_j}$$
    
    *Symbols used:*
    
    * \\(x_{i,j}\\), the i-th score in category j
    * \\(k\\), the number of categories
    * \\(n_j\\), the sample size of category j
    * \\(\\bar{x}_j\\), the sample mean of category j
    * \\(s_j^2\\), the sample variance of the scores in category j
    * \\(w_j\\), the weight for category j
    * \\(h_j\\), the adjusted weight for category j
    * \\(df\\), the degrees of freedom.    
    
    The formula could not be located in the original article by Cochran (1937); it is taken from the secondary sources cited above.
    
    References
    ----------
    Cavus, M., & Yazıcı, B. (2020). Testing the equality of normal distributed and independent groups’ means under unequal variances by doex package. *The R Journal, 12*(2), 134. doi:10.32614/RJ-2021-008
    
    Cochran, W. G. (1937). Problems arising in the analysis of a series of similar experiments. *Supplement to the Journal of the Royal Statistical Society, 4*(1), 102–118. doi:10.2307/2984123
    
    Hartung, J., Argaç, D., & Makambi, K. H. (2002). Small sample properties of tests on homogeneity in one-way anova and meta-analysis. *Statistical Papers, 43*(2), 197–235. doi:10.1007/s00362-002-0097-8
    
    Mezui-Mbeng, P. (2015). A note on Cochran test for homogeneity in two ways ANOVA and meta-analysis. *Open Journal of Statistics, 5*(7), 787–796. doi:10.4236/ojs.2015.57078

    Author
    ------
    Made by P. Stikker
    
    Companion website: https://PeterStatistics.com  
    YouTube channel: https://www.youtube.com/stikpet  
    Donations: https://www.patreon.com/bePatron?u=19398076
    
    '''
    #accept plain lists as well as pandas series
    if isinstance(nomField, list):
        nomField = pd.Series(nomField)
        
    if isinstance(scaleField, list):
        scaleField = pd.Series(scaleField)
        
    data = pd.concat([nomField, scaleField], axis=1)
    data.columns = ["category", "score"]
    
    #remove unused categories
    if categories is not None:
        data = data[data.category.isin(categories)]
    
    #Remove rows with missing values and reset index
    data = data.dropna()
    #reset_index returns a new frame, so the result has to be assigned
    data = data.reset_index(drop=True)
    
    #overall n
    n = len(data["category"])
    
    #sample sizes, variances and means per category (as series, indexed by category)
    scores = data.groupby('category')['score']
    nj = scores.count()
    sj2 = scores.var()
    mj = scores.mean()
    
    #number of categories
    k = len(mj)
    
    #weights, adjusted weights and weighted mean
    wj = nj / sj2
    w = float(wj.sum())
    hj = wj/w
    yw = float((hj*mj).sum())
    
    #test statistic, degrees of freedom and p-value (upper tail of the chi-square distribution)
    chi2Val = float((wj*(mj - yw)**2).sum())
    df = k - 1
    pVal = chi2.sf(chi2Val, df)
    
    #results
    res = pd.DataFrame([[n, chi2Val, df, pVal]])
    res.columns = ["n", "statistic", "df", "p-value"]
    
    return res

Functions

def ts_cochran_owa(nomField, scaleField, categories=None)

Cochran One-Way ANOVA

Tests if the means (averages) of each category could be the same in the population.

Note that according to Hartung et al. (2002, p. 225) the Cochran test is the standard test in meta-analysis, but should not be used, since it is always too liberal.

If the p-value is below a pre-defined threshold (usually 0.05), the null hypothesis is rejected, and at least two categories are then concluded to have a different mean on the scaleField score in the population.

There are quite a few alternatives to this test; the stikpet library has Fisher, Welch, James, Box, Scott-Smith, Brown-Forsythe, Alexander-Govern, Mehrotra modified Brown-Forsythe, Hartung-Argaç-Makambi, Özdemir-Kurt and Wilcox as options. See the notes from ts_fisher_owa() for some discussion on the differences.

Parameters

nomField : list or pandas series
    data with categories
scaleField : list or pandas series
    data with the scores
categories : list or dictionary, optional
    the categories to use from nomField
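
As a rough illustration of how these parameters are prepared before the test is computed, the sketch below repeats the pairing, filtering and missing-value steps on made-up data. The variable names `nom`, `scores` and `keep` and the values are assumptions for this example only; they are not part of the library.

```python
import pandas as pd

# made-up input: one category label and one score per case
nom = ["A", "A", "B", "B", "C", None]
scores = [1.0, 2.0, 3.0, None, 5.0, 6.0]
keep = ["A", "B"]  # plays the role of the optional `categories` argument

# pair the two fields column-wise
data = pd.concat([pd.Series(nom), pd.Series(scores)], axis=1)
data.columns = ["category", "score"]

# keep only the requested categories
data = data[data["category"].isin(keep)]

# drop rows with a missing category or score, then renumber the rows
data = data.dropna().reset_index(drop=True)

print(data)
```

Only complete rows from the requested categories remain, so here three cases (two from A, one from B) would enter the test.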

Returns

Dataframe with:
 
  • n, the sample size
  • statistic, the test statistic (chi-square value)
  • df, degrees of freedom
  • p-value, the p-value (significance)

Notes

The formula used is (Cavus & Yazıcı, 2020, p. 5; Hartung et al., 2002, p. 202; Mezui-Mbeng, 2015, p. 787):

$$\chi_{Cochran}^2 = \sum_{j=1}^k w_j\times\left(\bar{x}_j - \bar{y}_w\right)^2$$
$$df = k - 1$$
$$sig. = 1 - \chi^2\left(\chi_{Cochran}^2, df\right)$$

With:

$$\bar{y}_w = \sum_{j=1}^k h_j\times\bar{x}_j$$
$$h_j = \frac{w_j}{w}$$
$$w = \sum_{j=1}^k w_j$$
$$w_j = \frac{n_j}{s_j^2}$$
$$s_j^2 = \frac{\sum_{i=1}^{n_j}\left(x_{i,j} - \bar{x}_j\right)^2}{n_j - 1}$$
$$\bar{x}_j = \frac{\sum_{i=1}^{n_j} x_{i,j}}{n_j}$$

Symbols used:

  • \(x_{i,j}\), the i-th score in category j
  • \(k\), the number of categories
  • \(n_j\), the sample size of category j
  • \(\bar{x}_j\), the sample mean of category j
  • \(s_j^2\), the sample variance of the scores in category j
  • \(w_j\), the weight for category j
  • \(h_j\), the adjusted weight for category j
  • \(df\), the degrees of freedom.

The formula could not be located in the original article by Cochran (1937); it is taken from the secondary sources cited above.
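
The formulas in these notes can be traced by hand on a small data set. The sketch below uses only the Python standard library and made-up scores for three categories; the names (`groups`, `chi2_cochran`, and so on) are chosen for this illustration and are not the library's internals. Because there are three categories, df = 2, and for that special case the upper tail of the chi-square distribution has the closed form exp(-x/2); the function itself uses `scipy.stats.chi2.sf`, which covers any df.

```python
from math import exp
from statistics import mean, variance

# made-up scores for three categories
groups = {
    "A": [1, 2, 3],
    "B": [2, 4, 6],
    "C": [5, 6, 7],
}

# per-category sample size, mean and sample variance (denominator n_j - 1)
n_j = {g: len(x) for g, x in groups.items()}
mean_j = {g: mean(x) for g, x in groups.items()}
var_j = {g: variance(x) for g, x in groups.items()}

# weights w_j = n_j / s_j^2, their sum w, and adjusted weights h_j = w_j / w
w_j = {g: n_j[g] / var_j[g] for g in groups}
w = sum(w_j.values())
h_j = {g: w_j[g] / w for g in groups}

# weighted mean of the category means
y_w = sum(h_j[g] * mean_j[g] for g in groups)

# test statistic and degrees of freedom
chi2_cochran = sum(w_j[g] * (mean_j[g] - y_w) ** 2 for g in groups)
df = len(groups) - 1

# p-value: valid in this closed form only because df = 2
p_value = exp(-chi2_cochran / 2)

print(chi2_cochran, df, p_value)
```

Here the weights are 3, 0.75 and 3, the weighted mean is 4, and the statistic works out to 3·(2 − 4)² + 0.75·0² + 3·(6 − 4)² = 24 (up to floating-point rounding) with df = 2. Category B has the largest variance and therefore the smallest weight.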

References

Cavus, M., & Yazıcı, B. (2020). Testing the equality of normal distributed and independent groups’ means under unequal variances by doex package. The R Journal, 12(2), 134. doi:10.32614/RJ-2021-008

Cochran, W. G. (1937). Problems arising in the analysis of a series of similar experiments. Supplement to the Journal of the Royal Statistical Society, 4(1), 102–118. doi:10.2307/2984123

Hartung, J., Argaç, D., & Makambi, K. H. (2002). Small sample properties of tests on homogeneity in one-way anova and meta-analysis. Statistical Papers, 43(2), 197–235. doi:10.1007/s00362-002-0097-8

Mezui-Mbeng, P. (2015). A note on Cochran test for homogeneity in two ways ANOVA and meta-analysis. Open Journal of Statistics, 5(7), 787–796. doi:10.4236/ojs.2015.57078

Author

Made by P. Stikker

Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076
