Module `stikpetP.effect_sizes.eff_size_rmsse`


import pandas as pd

def es_rmsse(nomField, scaleField, categories=None):
    '''
    Root Mean Square Standardized Effect Size (RMSSE)
    -------------------------------------------------
    An effect size measure for a one-way ANOVA.
    
    Similar to Hedges' g, but for a one-way ANOVA. According to Wikipedia, "this essentially presents the omnibus difference of the entire model adjusted by the root mean square" (2023).
    
    Parameters
    ----------
    nomField : pandas series or list
        data with categories
    scaleField : pandas series or list
        data with the scores
    categories : list or dictionary, optional
        the categories to use from nomField
        
    Returns
    -------
    rmsse : float
        the rmsse value
    
    Notes
    -----
    The formula used (Steiger & Fouladi, 1997, p. 245):
    $$RMSSE = \\sqrt{\\frac{\\delta}{\\left(k-1\\right)\\times n}}$$
    
    With:
    $$\\delta = n\\times\\sum_{i=1}^k \\left(\\frac{\\alpha_i}{\\sigma}\\right)^2$$
    $$\\alpha_i = \\mu_i - \\mu \\approx \\bar{x}_i - \\bar{x}$$
    $$\\sigma \\approx \\sqrt{MS_w}$$
    $$MS_w = \\frac{SS_w}{df_w}$$
    $$df_w = n - k$$
    $$SS_w = \\sum_{j=1}^k \\sum_{i=1}^{n_j} \\left(x_{i,j} - \\bar{x}_j\\right)^2$$
    $$\\bar{x}_j = \\frac{\\sum_{i=1}^{n_j} x_{i,j}}{n_j}$$
    $$\\bar{x} = \\frac{\\sum_{j=1}^k n_j \\times \\bar{x}_j}{n} = \\frac{\\sum_{j=1}^k \\sum_{i=1}^{n_j} x_{i,j}}{n}$$
    $$n = \\sum_{j=1}^k n_j$$
    
    Zhang and Algina (2011) created a robust version of this measure for the one-way fixed-effects ANOVA.
    
    References
    ----------
    Steiger, J. H., & Fouladi, R. T. (1997). *Noncentrality interval estimation and the evaluation of statistical models*. In L. L. Harlow, S. A. Mulaik, & J. H. Steiger, What if there were no significance tests? (pp. 221–257). Lawrence Erlbaum Associates.
    
    Wikipedia. (2023). Effect size. In Wikipedia. https://en.wikipedia.org/w/index.php?title=Effect_size&oldid=1175948622

    Zhang, G., & Algina, J. (2011). A robust root mean square standardized effect size in one-way fixed-effects ANOVA. *Journal of Modern Applied Statistical Methods, 10*(1), 77–96. doi:10.22237/jmasm/1304222880
    
    Author
    ------
    Made by P. Stikker
    
    Companion website: https://PeterStatistics.com  
    YouTube channel: https://www.youtube.com/stikpet  
    Donations: https://www.patreon.com/bePatron?u=19398076
    
    '''
    
    #convert plain lists to pandas series
    if isinstance(nomField, list):
        nomField = pd.Series(nomField)
        
    if isinstance(scaleField, list):
        scaleField = pd.Series(scaleField)
        
    data = pd.concat([nomField, scaleField], axis=1)
    data.columns = ["category", "score"]
    
    #remove unused categories
    if categories is not None:
        data = data[data.category.isin(categories)]
    
    #Remove rows with missing values and reset index
    data = data.dropna()
    data = data.reset_index(drop=True)
    
    #overall n, mean and ss
    n = len(data["category"])
    m = data.score.mean()
    sst = data.score.var()*(n-1)
    
    #sample sizes and means per category (as series, indexed by category)
    groups = data.groupby('category')['score']
    nj = groups.count()
    mj = groups.mean()
    
    #number of categories
    k = len(mj)
    
    #between- and within-group sums of squares
    ssb = float((nj*(mj-m)**2).sum())
    ssw = sst - ssb
    
    #within-group degrees of freedom and mean square
    dfw = n - k
    msw = ssw/dfw
    
    #estimated effects (aj), pooled standard deviation (s) and delta (d)
    aj = mj - m
    s = msw**0.5
    d = n*((aj/s)**2).sum()
    rmsse = float((d/((k-1)*n))**0.5)
    
    return rmsse

Functions

def es_rmsse(nomField, scaleField, categories=None)

Root Mean Square Standardized Effect Size (RMSSE)

An effect size measure for a one-way ANOVA.

Similar to Hedges' g, but for a one-way ANOVA. According to Wikipedia, "this essentially presents the omnibus difference of the entire model adjusted by the root mean square" (2023).

Parameters

nomField : pandas series or list: data with categories
scaleField : pandas series or list: data with the scores
categories : list or dictionary, optional: the categories to use from nomField

Returns

rmsse : float: the rmsse value

Notes

The formula used (Steiger & Fouladi, 1997, p. 245):
$RMSSE = \sqrt{\frac{\delta}{\left(k-1\right)\times n}}$

With:
$\delta = n\times\sum_{i=1}^k \left(\frac{\alpha_i}{\sigma}\right)^2$
$\alpha_i = \mu_i - \mu \approx \bar{x}_i - \bar{x}$
$\sigma \approx \sqrt{MS_w}$
$MS_w = \frac{SS_w}{df_w}$
$df_w = n - k$
$SS_w = \sum_{j=1}^k \sum_{i=1}^{n_j} \left(x_{i,j} - \bar{x}_j\right)^2$
$\bar{x}_j = \frac{\sum_{i=1}^{n_j} x_{i,j}}{n_j}$
$\bar{x} = \frac{\sum_{j=1}^k n_j \times \bar{x}_j}{n} = \frac{\sum_{j=1}^k \sum_{i=1}^{n_j} x_{i,j}}{n}$
$n = \sum_{j=1}^k n_j$
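
Substituting the definition of $\delta$ into the RMSSE formula shows that the factor $n$ cancels. The following is only a restatement of the formulas above using the sample estimates, not an additional result from the cited sources:

```latex
RMSSE
  = \sqrt{\frac{n\times\sum_{i=1}^k \left(\frac{\alpha_i}{\sigma}\right)^2}{\left(k-1\right)\times n}}
  = \sqrt{\frac{\sum_{i=1}^k \alpha_i^2}{\left(k-1\right)\times\sigma^2}}
  \approx \sqrt{\frac{\sum_{i=1}^k \left(\bar{x}_i - \bar{x}\right)^2}{\left(k-1\right)\times MS_w}}
```

Read this way, the measure is the root of the mean squared standardized distance of the category means from the overall mean, with the mean taken over $k-1$ rather than $k$.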

Zhang and Algina (2011) created a robust version of this measure for the one-way fixed-effects ANOVA.
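
The formulas in the Notes can be checked by hand on a small data set. The sketch below is a minimal, self-contained illustration in plain Python, assuming the scores are supplied as a dictionary of lists rather than as pandas series. The name `rmsse_sketch` and the example data are hypothetical and are not part of the package.

```python
from math import sqrt

def rmsse_sketch(groups):
    """Hypothetical helper: RMSSE from a dict mapping category -> list of scores."""
    k = len(groups)                                    # number of categories
    n = sum(len(scores) for scores in groups.values()) # total sample size
    # overall mean of all scores
    grand_mean = sum(sum(scores) for scores in groups.values()) / n
    # mean per category
    means = {cat: sum(scores) / len(scores) for cat, scores in groups.items()}
    # within-group sum of squares and mean square (df_w = n - k)
    ss_w = sum((x - means[cat]) ** 2
               for cat, scores in groups.items() for x in scores)
    ms_w = ss_w / (n - k)
    # sum of squared standardized effects; the factor n in delta cancels
    sum_sq_effects = sum((m - grand_mean) ** 2 for m in means.values()) / ms_w
    return sqrt(sum_sq_effects / (k - 1))

# Made-up scores: category means 2, 3 and 7, overall mean 4, MS_w = 1
example = {"A": [1, 2, 3], "B": [2, 3, 4], "C": [6, 7, 8]}
result = rmsse_sketch(example)
print(round(result, 4))  # 2.6458, i.e. sqrt(14 / 2) = sqrt(7)
```

In this example the squared deviations of the category means from the overall mean are 4, 1 and 9, so the sum is 14. Dividing by $(k-1)\times MS_w = 2\times 1$ gives 7, and the square root is about 2.6458.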

References

Steiger, J. H., & Fouladi, R. T. (1997). Noncentrality interval estimation and the evaluation of statistical models. In L. L. Harlow, S. A. Mulaik, & J. H. Steiger, What if there were no significance tests? (pp. 221–257). Lawrence Erlbaum Associates.

Wikipedia. (2023). Effect size. In Wikipedia. https://en.wikipedia.org/w/index.php?title=Effect_size&oldid=1175948622

Zhang, G., & Algina, J. (2011). A robust root mean square standardized effect size in one-way fixed-effects ANOVA. Journal of Modern Applied Statistical Methods, 10(1), 77–96. doi:10.22237/jmasm/1304222880

Author

Made by P. Stikker

Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076
