Module `stikpetP.effect_sizes.eff_size_epsilon_sq`

import pandas as pd
from ..other.table_cross import tab_cross
from ..effect_sizes.eff_size_eta_sq import es_eta_sq

def es_epsilon_sq(catField, ordField, categories=None, levels=None, useRanks=False):
    '''
    Epsilon Squared
    ---------------
    An effect size measure that indicates the strength of the effect of the categories on the ordinal/scale field. A value of 0 indicates no influence, and 1 a perfect relationship.
    
    This is an attempt to make eta-squared unbiased (applying a population correction ratio) (Kelley, 1935, p. 557). Although a popular belief is that omega-squared is preferred over epsilon-squared (Keselman, 1975), a later study actually showed that epsilon-squared might be preferred (Okada, 2013).
    
    Tomczak and Tomczak (2014) recommend this as one option to use with a Kruskal-Wallis test; however, I think they labelled epsilon-squared as eta-squared and the other way around.
    
    Parameters
    ----------
    catField : pandas series
        data with categories
    ordField : pandas series
        data with the scores
    categories : list or dictionary, optional
        the categories to use from catField
    levels : list or dictionary, optional
        the levels or order used in ordField.
    useRanks : Boolean, optional
        use ranks or use the scores as given in ordField. Default is False.
        
    Returns
    -------
    epsSq : float
        the epsilon squared value
        
    Notes
    -----
    The formula used (Kelley, 1935, p. 557):
    $$\\epsilon^2 = \\frac{n\\times\\eta^2 - k+\\left(1-\\eta^2\\right)}{n-k}$$
    
    There are several variations of this formula, all giving the same result:
    
    Cureton (1966, p. 605):
    $$\\epsilon^2 = 1 - \\frac{n-1}{n-k}\\times\\left(1 - \\eta^2\\right)$$
    
    Albers and Lakens (2018, p. 188):
    $$\\epsilon^2 = \\frac{SS_b - df_b\\times MS_w}{SS_t}$$
    
    Carroll and Nordholm (1975, p. 547):
    $$\\epsilon^2 = \\frac{F-1}{F+\\frac{n-k}{k-1}}$$
    
    Albers and Lakens (2018, p. 194):
    $$\\epsilon^2 = \\frac{F-1}{F+\\frac{df_w}{df_b}}$$
    
    If ranks are used, the epsilon-squared can also be determined using (Tomczak & Tomczak, 2014, p. 24):
    $$\\epsilon^2 = \\frac{H - k + 1}{n - k}$$
    
    *Symbols used:* 
    
    * \\(n\\), the total sample size
    * \\(k\\), the number of categories
    * \\(\\eta^2\\), eta-squared value (see es_eta_sq()).
    * \\(SS_b\\), the between sum of squares (sum of squared deviations of the category means from the overall mean, weighted by category size)
    * \\(SS_t\\), the total sum of squares (sum of squared deviations of the scores from the overall mean)
    * \\(F\\), the F-statistic
    * \\(H\\), H-statistic from Kruskal-Wallis H-test
    * \\(df_i\\), the degrees of freedom of i (see ts_fisher_owa())
    * \\(MS_i\\), the i mean square (see ts_fisher_owa())
    * \\(b\\), is between = factor = treatment = model
    * \\(w\\), is within = error (the variability within the groups)
    
    References
    ----------
    Albers, C., & Lakens, D. (2018). When power analyses based on pilot data are biased: Inaccurate effect size estimators and follow-up bias. *Journal of Experimental Social Psychology, 74*, 187–195. doi:10.1016/j.jesp.2017.09.004
    
    Carroll, R. M., & Nordholm, L. A. (1975). Sampling characteristics of Kelley’s ε and Hays’ ω. *Educational and Psychological Measurement, 35*(3), 541–554. doi:10.1177/001316447503500304
    
    Cureton, E. E. (1966). On correlation coefficients. *Psychometrika, 31*(4), 605–607. doi:10.1007/BF02289528
    
    Kelley, T. L. (1935). An unbiased correlation ratio measure. *Proceedings of the National Academy of Sciences, 21*(9), 554–559. doi:10.1073/pnas.21.9.554
    
    Keselman, H. J. (1975). A Monte Carlo investigation of three estimates of treatment magnitude: Epsilon squared, eta squared, and omega squared. *Canadian Psychological Review / Psychologie Canadienne, 16*(1), 44–48. doi:10.1037/h0081789
    
    Okada, K. (2013). Is omega squared less biased? A comparison of three major effect size indices in one-way anova. *Behaviormetrika, 40*(2), 129–147. doi:10.2333/bhmk.40.129
    
    Pearson, K. (1911). On a correction to be made to the correlation ratio η. *Biometrika, 8*(1/2), 254. doi:10.2307/2331454
    
    Tomczak, M., & Tomczak, E. (2014). The need to report effect size estimates revisited. An overview of some recommended measures of effect size. *Trends in Sport Sciences, 1*(21), 19–25.

    Author
    ------
    Made by P. Stikker
    
    Companion website: https://PeterStatistics.com  
    YouTube channel: https://www.youtube.com/stikpet  
    Donations: https://www.patreon.com/bePatron?u=19398076
    
    '''
    #create the cross table    
    ct = tab_cross(ordField, catField, order1=levels, order2=categories, totals="include")
    
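    # basic counts (the last row and column of ct hold the totals)
    k = ct.shape[1]-1       # number of categories
    nlvl = ct.shape[0]-1    # number of levels in ordField
    n = ct.iloc[nlvl, k]    # total sample size (grand total cell)
    
    # eta-squared, based on ranks if useRanks is True
    e2 = es_eta_sq(catField, ordField, categories, levels, useRanks)
    
    # Kelley (1935) correction of eta-squared
    epsSq = (n * e2 - k + (1 - e2)) / (n - k)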

    return epsSq

Functions

def es_epsilon_sq(catField, ordField, categories=None, levels=None, useRanks=False)

Epsilon Squared

An effect size measure that indicates the strength of the effect of the categories on the ordinal/scale field. A value of 0 indicates no influence, and 1 a perfect relationship.

This is an attempt to make eta-squared unbiased (applying a population correction ratio) (Kelley, 1935, p. 557). Although a popular belief is that omega-squared is preferred over epsilon-squared (Keselman, 1975), a later study actually showed that epsilon-squared might be preferred (Okada, 2013).

Tomczak and Tomczak (2014) recommend this as one option to use with a Kruskal-Wallis test; however, I think they labelled epsilon-squared as eta-squared and the other way around.

Parameters

catField : pandas series: data with categories
ordField : pandas series: data with the scores
categories : list or dictionary, optional: the categories to use from catField
levels : list or dictionary, optional: the levels or order used in ordField.
useRanks : Boolean, optional: use ranks or use the scores as given in ordField. Default is False.

Returns

epsSq : float: the epsilon squared value

Notes

The formula used (Kelley, 1935, p. 557): $\epsilon^2 = \frac{n\times\eta^2 - k+\left(1-\eta^2\right)}{n-k}$

There are several variations of this formula, all giving the same result:

Cureton (1966, p. 605): $\epsilon^2 = 1 - \frac{n-1}{n-k}\times\left(1 - \eta^2\right)$
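
As a short check that the Cureton form is the same quantity as the Kelley form, the following derivation rewrites one into the other. It uses only the symbols $n$, $k$ and $\eta^2$ from the formulas above; the intermediate step is added here for illustration and is not quoted from either source.

```latex
\begin{aligned}
\epsilon^2 &= \frac{n\times\eta^2 - k + \left(1-\eta^2\right)}{n-k} \\
           &= \frac{(n-k) - (n-1) + (n-1)\times\eta^2}{n-k} \\
           &= 1 - \frac{n-1}{n-k}\times\left(1-\eta^2\right)
\end{aligned}
```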

Albers and Lakens (2018, p. 188): $\epsilon^2 = \frac{SS_b - df_b\times MS_w}{SS_t}$

Carroll and Nordholm (1975, p. 547): $\epsilon^2 = \frac{F-1}{F+\frac{n-k}{k-1}}$

Albers and Lakens (2018, p. 194): $\epsilon^2 = \frac{F-1}{F+\frac{df_w}{df_b}}$

If ranks are used, the epsilon-squared can also be determined using (Tomczak & Tomczak, 2014, p. 24): $\epsilon^2 = \frac{H - k + 1}{n - k}$
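
The rank-based formula can be checked numerically. The sketch below is a minimal, standard-library-only illustration and does not use the package itself: the helper names `average_ranks`, `kruskal_h` and `eta_squared` and the toy data are assumptions made up for this example. It computes the tie-corrected Kruskal-Wallis $H$ directly, and compares $(H - k + 1)/(n - k)$ with the Kelley formula applied to eta-squared of the ranks.

```python
from collections import Counter, defaultdict

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j to the end of the current run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for m in range(i, j + 1):
            ranks[order[m]] = avg
        i = j + 1
    return ranks

def kruskal_h(groups, scores):
    """Kruskal-Wallis H with the usual correction for ties."""
    ranks = average_ranks(scores)
    n = len(scores)
    rank_sums, sizes = defaultdict(float), defaultdict(int)
    for g, r in zip(groups, ranks):
        rank_sums[g] += r
        sizes[g] += 1
    h = 12 / (n * (n + 1)) * sum(rank_sums[g] ** 2 / sizes[g] for g in sizes) - 3 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(scores).values())
    return h / (1 - ties / (n ** 3 - n))

def eta_squared(groups, values):
    """Eta-squared as SS_b / SS_t."""
    n = len(values)
    grand = sum(values) / n
    by_group = defaultdict(list)
    for g, v in zip(groups, values):
        by_group[g].append(v)
    ss_b = sum(len(v) * (sum(v) / len(v) - grand) ** 2 for v in by_group.values())
    ss_t = sum((v - grand) ** 2 for v in values)
    return ss_b / ss_t

# hypothetical toy data: three categories, ordinal scores with ties
groups = ["a", "a", "a", "b", "b", "b", "c", "c", "c"]
scores = [1, 2, 2, 3, 4, 4, 5, 5, 6]

n, k = len(scores), len(set(groups))
H = kruskal_h(groups, scores)
eta_sq_ranks = eta_squared(groups, average_ranks(scores))

eps_from_h = (H - k + 1) / (n - k)
eps_from_eta = (n * eta_sq_ranks - k + (1 - eta_sq_ranks)) / (n - k)
print(eps_from_h, eps_from_eta)
```

The two values agree because the tie-corrected $H$ equals $(n - 1)\times\eta^2$ when eta-squared is computed on the ranks.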

Symbols used:

$n$ , the total sample size
$k$ , the number of categories
$\eta^2$ , eta-squared value (see es_eta_sq()).
$SS_b$ , the between sum of squares (sum of squared deviations of the category means from the overall mean, weighted by category size)
$SS_t$ , the total sum of squares (sum of squared deviations of the scores from the overall mean)
$F$ , the F-statistic
$H$ , H-statistic from Kruskal-Wallis H-test
$df_i$ , the degrees of freedom of i (see ts_fisher_owa())
$MS_i$ , the i mean square (see ts_fisher_owa())
$b$ , is between = factor = treatment = model
$w$ , is within = error (the variability within the groups)
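
With the symbols defined, the equivalence of the variants can be illustrated on a small example. This is a minimal sketch using only the standard library, not the package's own code: the helper `anova_parts` and the toy data are hypothetical. It builds the one-way ANOVA quantities by hand and evaluates the Kelley, Cureton, sum-of-squares and F-based forms (the last with $df_w/df_b$ in the denominator).

```python
from collections import defaultdict

def anova_parts(groups, scores):
    """Return n, k and the between, within and total sums of squares."""
    by_group = defaultdict(list)
    for g, s in zip(groups, scores):
        by_group[g].append(s)
    n, k = len(scores), len(by_group)
    grand = sum(scores) / n
    # between: squared deviation of each category mean from the overall mean,
    # weighted by the category size
    ss_b = sum(len(v) * (sum(v) / len(v) - grand) ** 2 for v in by_group.values())
    # total: squared deviation of each score from the overall mean
    ss_t = sum((s - grand) ** 2 for s in scores)
    return n, k, ss_b, ss_t - ss_b, ss_t

# hypothetical toy data: three categories of unequal size
groups = ["a", "a", "a", "b", "b", "b", "b", "c", "c", "c"]
scores = [2, 3, 5, 4, 6, 7, 7, 8, 9, 12]

n, k, ss_b, ss_w, ss_t = anova_parts(groups, scores)
df_b, df_w = k - 1, n - k
ms_b, ms_w = ss_b / df_b, ss_w / df_w
F = ms_b / ms_w
eta_sq = ss_b / ss_t

variants = {
    "Kelley (1935)": (n * eta_sq - k + (1 - eta_sq)) / (n - k),
    "Cureton (1966)": 1 - (n - 1) / (n - k) * (1 - eta_sq),
    "sum-of-squares form": (ss_b - df_b * ms_w) / ss_t,
    "F-based form": (F - 1) / (F + df_w / df_b),
}
for name, value in variants.items():
    print(f"{name}: {value:.6f}")
```

All four lines print the same value, up to floating-point rounding.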

References

Albers, C., & Lakens, D. (2018). When power analyses based on pilot data are biased: Inaccurate effect size estimators and follow-up bias. Journal of Experimental Social Psychology, 74, 187–195. doi:10.1016/j.jesp.2017.09.004

Carroll, R. M., & Nordholm, L. A. (1975). Sampling characteristics of Kelley’s ε and Hays’ ω. Educational and Psychological Measurement, 35(3), 541–554. doi:10.1177/001316447503500304

Cureton, E. E. (1966). On correlation coefficients. Psychometrika, 31(4), 605–607. doi:10.1007/BF02289528

Kelley, T. L. (1935). An unbiased correlation ratio measure. Proceedings of the National Academy of Sciences, 21(9), 554–559. doi:10.1073/pnas.21.9.554

Keselman, H. J. (1975). A Monte Carlo investigation of three estimates of treatment magnitude: Epsilon squared, eta squared, and omega squared. Canadian Psychological Review / Psychologie Canadienne, 16(1), 44–48. doi:10.1037/h0081789

Okada, K. (2013). Is omega squared less biased? A comparison of three major effect size indices in one-way anova. Behaviormetrika, 40(2), 129–147. doi:10.2333/bhmk.40.129

Pearson, K. (1911). On a correction to be made to the correlation ratio η. Biometrika, 8(1/2), 254. doi:10.2307/2331454

Tomczak, M., & Tomczak, E. (2014). The need to report effect size estimates revisited. An overview of some recommended measures of effect size. Trends in Sport Sciences, 1(21), 19–25.

Author

Made by P. Stikker

Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076
