Module `stikpetP.tests.test_trimmed_mean_is`


from math import floor
from statistics import mean, variance
from scipy.stats import t 
import pandas as pd

def ts_trimmed_mean_is(catField, scaleField, categories=None, dmu=0, trimProp = 0.1, se = "yuen"):
    '''
    Independent Samples Trimmed/Yuen Mean Test
    ------------------------------------------
    A test to compare two means. The null hypothesis would be that the means of each category are equal in the population.
    
    As the name implies, a trimmed mean test trims the data and then performs the test. It therefore tests whether the trimmed means are equal, rather than the regular means. As a benefit, unlike the Student and Welch tests, it doesn't require the assumption of normality. The regular trimmed mean test (Yuen-Dixon) is a variation of the Student test and does require the assumption of equal variances, while the Yuen-Welch variation doesn't.

    The test is also described at [PeterStatistics.com](https://peterstatistics.com/Terms/Tests/TrimmedMeansIS.html)
    
    Parameters
    ----------
    catField : dataframe or list 
        the categorical data
    scaleField : dataframe or list
        the scores
    categories : list, optional 
        to indicate which two categories of catField to use, otherwise first two found will be used.
    dmu : float, optional 
        difference according to null hypothesis (default is 0)
    trimProp : float, optional
        proportion to trim in total for each category. For example, if set to 0.1 then 0.05 is trimmed from each side of each category. Default is 0.1.
    se : {"yuen", "yuen-dixon"}, optional
        to indicate which standard error to use. Default is "yuen".
        
    Returns
    -------
    A dataframe with:
    
    * *n cat. 1*, the sample size of the first category
    * *n cat. 2*, the sample size of the second category
    * *trim mean cat. 1*, the trimmed sample mean of the first category
    * *trim mean cat. 2*, the trimmed sample mean of the second category
    * *diff.*, difference between the two trimmed sample means
    * *hyp. diff.*, hypothesized difference between the two population means
    * *statistic*, the test statistic (t-value)
    * *df*, the degrees of freedom
    * *p-value*, the significance (p-value)
    * *test*, name of test used
    
    Notes
    -----
    **YUEN**
    
    The default `se="yuen"` will perform a Yuen-Welch test.
    
    The formula used is (Yuen, 1974, p. 167):
    $$t = \\frac{\\bar{x}_{t,1} - \\bar{x}_{t,2}}{SE}$$
    $$sig = 2\\times\\left(1 - T\\left(\\left|t\\right|, df\\right)\\right)$$
    
    With:
    $$SE = \\sqrt{\\frac{s_{w,1}^2}{m_1} + \\frac{s_{w,2}^2}{m_2}}$$
    $$s_{w,i}^2 = \\frac{SSD_{w,i}}{m_i - 1}$$
    $$df = \\frac{1}{\\frac{c^2}{m_1 - 1} + \\frac{\\left(1 - c\\right)^2}{m_2 -1}}$$
    $$c = \\frac{\\frac{s_{w,1}^2}{m_1}}{\\frac{s_{w,1}^2}{m_1} + \\frac{s_{w,2}^2}{m_2}}$$
    $$\\bar{x}_{t,i} = \\frac{\\sum_{j=g_i+1}^{n_i - g_i}y_{i,j}}{m_i}$$
    $$g_i = \\lfloor n_i\\times p_t\\rfloor$$
    $$m_i = n_i - 2\\times g_i$$
    $$SSD_{w,i} = g_i\\times\\left(y_{i,g_i+1} - \\bar{x}_{w,i}\\right)^2 + g_i\\times\\left(y_{i,n_i-g_i} - \\bar{x}_{w,i}\\right)^2 + \\sum_{j=g_i+1}^{n_i - g_i} \\left(y_{i,j} - \\bar{x}_{w,i}\\right)^2$$
    $$\\bar{x}_{w,i} = \\frac{\\bar{x}_{t,i}\\times m_i + g_i\\times\\left(y_{i, g_i+1} + y_{i, n_i-g_i}\\right)}{n_i}$$
    
    *Symbols used:*
    
    * \\(\\bar{x}_{t,i}\\) the trimmed mean of the scores in category i
    * \\(\\bar{x}_{w,i}\\) the Winsorized mean of the scores in category i
    * \\(SSD_{w,i}\\) the sum of squared deviations from the Winsorized mean of category i
    * \\(n_i\\) the number of scores in category i
    * \\(g_i\\) the number of scores trimmed from each side in category i
    * \\(m_i\\) the number of scores in the trimmed data set from category i
    * \\(y_{i,j}\\) the j-th score in category i, after the scores are sorted from low to high
    * \\(p_t\\) the proportion of trimming on each side, i.e. half of `trimProp`
    
    **YUEN-DIXON**
    
    If `se="yuen-dixon"`, a trimmed means test will be performed.
    
    The formula used is (Yuen & Dixon, 1973, p. 394):
    $$t = \\frac{\\bar{x}_{t,1} - \\bar{x}_{t,2}}{SE}$$
    $$sig = 2\\times\\left(1 - T\\left(\\left|t\\right|, df\\right)\\right)$$
    
    With:
    $$SE = \\sqrt{\\frac{SSD_{w,1} + SSD_{w,2}}{m_1 + m_2 - 2}\\times\\left(\\frac{1}{m_1} + \\frac{1}{m_2}\\right)}$$
    $$df = m_1 + m_2 - 2$$
    $$\\bar{x}_{t,i} = \\frac{\\sum_{j=g_i+1}^{n_i - g_i}y_{i,j}}{m_i}$$
    $$g_i = \\lfloor n_i\\times p_t\\rfloor$$
    $$m_i = n_i - 2\\times g_i$$
    $$SSD_{w,i} = g_i\\times\\left(y_{i,g_i+1} - \\bar{x}_{w,i}\\right)^2 + g_i\\times\\left(y_{i,n_i-g_i} - \\bar{x}_{w,i}\\right)^2 + \\sum_{j=g_i+1}^{n_i - g_i} \\left(y_{i,j} - \\bar{x}_{w,i}\\right)^2$$
    $$\\bar{x}_{w,i} = \\frac{\\bar{x}_{t,i}\\times m_i + g_i\\times\\left(y_{i, g_i+1} + y_{i, n_i-g_i}\\right)}{n_i}$$
    
    *Symbols used:*
    
    * \\(\\bar{x}_{t,i}\\) the trimmed mean of the scores in category i
    * \\(\\bar{x}_{w,i}\\) the Winsorized mean of the scores in category i
    * \\(SSD_{w,i}\\) the sum of squared deviations from the Winsorized mean of category i
    * \\(n_i\\) the number of scores in category i
    * \\(g_i\\) the number of scores trimmed from each side in category i
    * \\(m_i\\) the number of scores in the trimmed data set from category i
    * \\(y_{i,j}\\) the j-th score in category i, after the scores are sorted from low to high
    * \\(p_t\\) the proportion of trimming on each side, i.e. half of `trimProp`

    Before, After and Alternatives
    ------------------------------
    Before this you might want some descriptive measures. Use [me_mode_bin](../measures/meas_mode_bin.html#me_mode_bin) for Mode for Binned Data, [me_mean](../measures/meas_mean.html#me_mean) for different types of mean, and/or [me_variation](../measures/meas_variation.html#me_variation) for different Measures of Quantitative Variation.
    
    For a visualisation, use [vi_boxplot_single](../visualisations/vis_boxplot_single.html#vi_boxplot_single) for a Box (and Whisker) Plot or [vi_histogram](../visualisations/vis_histogram.html#vi_histogram) for a Histogram.
    
    After the test you might want an effect size measure, options include: [Common Language](../effect_sizes/eff_size_common_language_is.html), [Cohen d_s](../effect_sizes/eff_size_hedges_g_is.html), [Cohen U](../effect_sizes/eff_size_cohen_u.html), [Hedges g](../effect_sizes/eff_size_hedges_g_is.html), [Glass delta](../effect_sizes/eff_size_glass_delta.html), [biserial correlation](../correlations/cor_biserial.html), [point-biserial correlation](../effect_sizes/cor_point_biserial.html)
    
    There are four similar tests, with different assumptions. 
    
    |test|equal variance|normality|
    |-------|-----------|---------|
    |[Student t](../tests/test_student_t_is.html)| yes | yes|
    |[Welch t](../tests/test_welch_t_is.html) | no | yes|
    |[Trimmed means](../tests/test_trimmed_mean_is.html) | yes | no | 
    |[Yuen-Welch](../tests/test_trimmed_mean_is.html)|no | no |

    Another test that in some cases could be used is the [Z test](../tests/test_z_is.html)
    
    References
    ----------
    Yuen, K. K. (1974). The two-sample trimmed t for unequal population variances. *Biometrika, 61*(1), 165–170. https://doi.org/10.1093/biomet/61.1.165
    
    Yuen, K. K., & Dixon, W. J. (1973). The approximate behaviour and performance of the two-sample trimmed t. *Biometrika, 60*(2), 369–374. https://doi.org/10.2307/2334550
    
    Author
    ------
    Made by P. Stikker
    
    Companion website: https://PeterStatistics.com  
    YouTube channel: https://www.youtube.com/stikpet  
    Donations: https://www.patreon.com/bePatron?u=19398076
    
    Examples
    --------
    Example 1: Dataframe
    >>> file1 = "https://peterstatistics.com/Packages/ExampleData/GSS2012a.csv"
    >>> df1 = pd.read_csv(file1, sep=',', low_memory=False, storage_options={'User-Agent': 'Mozilla/5.0'})
    >>> ex1 = df1['age']
    >>> ex1 = ex1.replace("89 OR OLDER", "90")
    >>> ts_trimmed_mean_is(df1['sex'], ex1)
       n FEMALE  n MALE  trim mean FEMALE  trim mean MALE     diff.  hyp. diff.  statistic           df  p-value                                   test
    0      1083     886         48.106667       47.274436  0.832231           0   0.971312  1707.922425  0.33153  Yuen-Welch independent samples t-test
    
    Example 2: List
    >>> scores = [20,50,80,15,40,85,30,45,70,60, None, 90,25,40,70,65, None, 70,98,40]
    >>> groups = ["nat.","int.","int.","nat.","int.", "int.","nat.","nat.","int.","int.","int.","int.","int.","int.","nat.", "int." ,None,"nat.","int.","int."]
    >>> ts_trimmed_mean_is(groups, scores)
       n int.  n nat.  trim mean int.  trim mean nat.  diff.  hyp. diff.  statistic        df   p-value                                   test
    0      12       6       61.916667       41.666667  20.25           0    1.69314  9.750994  0.122075  Yuen-Welch independent samples t-test
    
    '''
    #convert to pandas series if needed
    if isinstance(catField, list):
        catField = pd.Series(catField)
    
    if isinstance(scaleField, list):
        scaleField = pd.Series(scaleField)
    
    #combine as one dataframe
    df = pd.concat([catField, scaleField], axis=1)
    df = df.dropna()
    
    #the two categories
    if categories is not None:
        cat1 = categories[0]
        cat2 = categories[1]
    else:
        cat1 = df.iloc[:,0].value_counts().index[0]
        cat2 = df.iloc[:,0].value_counts().index[1]
    
    #separate the scores for each category
    x1 = list(df.iloc[:,1][df.iloc[:,0] == cat1])
    x2 = list(df.iloc[:,1][df.iloc[:,0] == cat2])
    
    #make sure they are floats
    x1 = [float(x) for x in x1]
    x2 = [float(x) for x in x2]
    
    #sample size of each category
    n1 = len(x1)
    n2 = len(x2)
    
    #number of scores to trim from each side
    g1 = floor(n1*trimProp/2)
    g2 = floor(n2*trimProp/2)
    n1t = n1 - 2*g1
    n2t = n2 - 2*g2
    
    #sort the results
    x1.sort()
    x2.sort()
    
    #trimmed means
    m1t = mean(x1[g1:(n1 - g1)])
    m2t = mean(x2[g2:(n2 - g2)])
    
    #Winsorize the data
    x1[0:g1] = [x1[g1]]*g1    
    x2[0:g2] = [x2[g2]]*g2
    
    x1[(n1 -g1):n1] = [x1[n1 - g1 -1]]*g1
    x2[(n2 -g2):n2] = [x2[n2 - g2 -1]]*g2
    
    #Variance of winsorized data
    var1 = variance(x1)
    var2 = variance(x2)
    
    if se == "yuen":
        #Yuen-Welch: separate Winsorized variances per category
        ssw1 = var1*(n1 - 1)
        ssw2 = var2*(n2 - 1)
        var1w = ssw1 / (n1t - 1)
        var2w = ssw2 / (n2t - 1)
        seValue = (var1w / n1t + var2w / n2t)**0.5
        
        c = (var1w / n1t) / (var1w / n1t + var2w / n2t)
        dof = 1 / (c**2 / (n1t - 1) + (1 - c)**2 / (n2t - 1))
        testUsed = "Yuen-Welch independent samples t-test"
    else:
        #Yuen-Dixon: pooled Winsorized sum of squared deviations
        s2 = ((n1 - 1)*var1 + (n2 - 1)*var2)/((n1t - 1) + (n2t - 1))
        seValue = (s2 * (1/n1t + 1/n2t))**0.5
        dof = n1t + n2t - 2
        testUsed = "Trimmed Mean independent samples t-test"
        
    tValue = (m1t - m2t - dmu) / seValue
    #two-sided p-value; t.sf(x, dof) equals 1 - t.cdf(x, dof)
    pValue = 2*t.sf(abs(tValue), dof)
    statistic = tValue
    
    #str() so that non-string category labels do not break the column names
    colnames = ["n "+str(cat1), "n "+str(cat2), "trim mean "+str(cat1), "trim mean "+str(cat2), "diff.", "hyp. diff.", "statistic", "df", "p-value", "test"]
    results = pd.DataFrame([[n1, n2, m1t, m2t, m1t - m2t, dmu, statistic, dof, pValue, testUsed]], columns=colnames)
    
    return results

Functions

def ts_trimmed_mean_is(catField, scaleField, categories=None, dmu=0, trimProp=0.1, se='yuen')

Independent Samples Trimmed/Yuen Mean Test

A test to compare two means. The null hypothesis would be that the means of each category are equal in the population.

As the name implies, a trimmed mean test trims the data and then performs the test. It therefore tests whether the trimmed means are equal, rather than the regular means. As a benefit, unlike the Student and Welch tests, it doesn't require the assumption of normality. The regular trimmed mean test (Yuen-Dixon) is a variation of the Student test and does require the assumption of equal variances, while the Yuen-Welch variation doesn't.
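
To illustrate why a trimmed mean can differ from the regular mean, here is a minimal sketch. The helper `trimmed_mean` and the data are made up for this illustration and are not part of the package; the helper follows the same rule as the source above, dropping `floor(n * trimProp / 2)` scores from each end.

```python
from math import floor
from statistics import mean

def trimmed_mean(scores, trim_prop=0.1):
    """Mean after dropping floor(n * trim_prop / 2) scores from each end."""
    ordered = sorted(scores)
    g = floor(len(ordered) * trim_prop / 2)
    return mean(ordered[g:len(ordered) - g])

# hypothetical scores with one extreme value
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]

regular = mean(data)                          # 14.5, pulled up by the 100
trimmed = trimmed_mean(data, trim_prop=0.2)   # 5.5, lowest and highest score dropped
```

With `trim_prop=0.2` and ten scores, one score is removed from each side, so the extreme value no longer affects the result.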

The test is also described at PeterStatistics.com

Parameters

catField : dataframe or list: the categorical data
scaleField : dataframe or list: the scores
categories : list, optional: to indicate which two categories of catField to use, otherwise first two found will be used.
dmu : float, optional: difference according to null hypothesis (default is 0)
trimProp : float, optional: proportion to trim in total for each category. For example, if set to 0.1 then 0.05 is trimmed from each side of each category. Default is 0.1.
se : {"yuen", "yuen-dixon"}, optional: to indicate which standard error to use. Default is "yuen".
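
As a rough sketch of how `trimProp` translates into counts, the hypothetical helper `trim_counts` below (not part of the package) mirrors the rule used in the source: the total proportion is halved and the result is rounded down per side.

```python
from math import floor

def trim_counts(n, trim_prop=0.1):
    """Return (scores trimmed per side, scores left after trimming)."""
    g = floor(n * trim_prop / 2)
    return g, n - 2 * g

large = trim_counts(1083)   # (54, 975)
small = trim_counts(12)     # (0, 12)
```

For small categories the default can round down to zero, in which case nothing is trimmed and the trimmed mean equals the regular mean.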

Returns

A dataframe with:

n cat. 1, the sample size of the first category
n cat. 2, the sample size of the second category
trim mean cat. 1, the trimmed sample mean of the first category
trim mean cat. 2, the trimmed sample mean of the second category
diff., difference between the two trimmed sample means
hyp. diff., hypothesized difference between the two population means
statistic, the test statistic (t-value)
df, the degrees of freedom
p-value, the significance (p-value)
test, name of test used

Notes

YUEN

The default se="yuen" will perform a Yuen-Welch test.

The formula used is (Yuen, 1974, p. 167):
$$t = \frac{\bar{x}_{t,1} - \bar{x}_{t,2}}{SE}$$
$$sig = 2\times\left(1 - T\left(\left|t\right|, df\right)\right)$$

With:
$$SE = \sqrt{\frac{s_{w,1}^2}{m_1} + \frac{s_{w,2}^2}{m_2}}$$
$$s_{w,i}^2 = \frac{SSD_{w,i}}{m_i - 1}$$
$$df = \frac{1}{\frac{c^2}{m_1 - 1} + \frac{\left(1 - c\right)^2}{m_2 -1}}$$
$$c = \frac{\frac{s_{w,1}^2}{m_1}}{\frac{s_{w,1}^2}{m_1} + \frac{s_{w,2}^2}{m_2}}$$
$$\bar{x}_{t,i} = \frac{\sum_{j=g_i+1}^{n_i - g_i}y_{i,j}}{m_i}$$
$$g_i = \lfloor n_i\times p_t\rfloor$$
$$m_i = n_i - 2\times g_i$$
$$SSD_{w,i} = g_i\times\left(y_{i,g_i+1} - \bar{x}_{w,i}\right)^2 + g_i\times\left(y_{i,n_i-g_i} - \bar{x}_{w,i}\right)^2 + \sum_{j=g_i+1}^{n_i - g_i} \left(y_{i,j} - \bar{x}_{w,i}\right)^2$$
$$\bar{x}_{w,i} = \frac{\bar{x}_{t,i}\times m_i + g_i\times\left(y_{i, g_i+1} + y_{i, n_i-g_i}\right)}{n_i}$$

Symbols used:

$\bar{x}_{t,i}$ the trimmed mean of the scores in category i
$\bar{x}_{w,i}$ the Winsorized mean of the scores in category i
$SSD_{w,i}$ the sum of squared deviations from the Winsorized mean of category i
$n_i$ the number of scores in category i
$g_i$ the number of scores trimmed from each side in category i
$m_i$ the number of scores in the trimmed data set from category i
$y_{i,j}$ the j-th score in category i, after the scores are sorted from low to high
$p_t$ the proportion of trimming on each side, i.e. half of trimProp
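
The Yuen-Welch formulas above can be sketched with the standard library only. This is an illustrative re-derivation, not the package's own code: the names `trim_and_winsorize` and `yuen_welch` are made up here, and \(p_t\) is assumed to be half of `trimProp`, as in the source. The p-value step is left out because it needs a t-distribution.

```python
from math import floor, sqrt
from statistics import mean

def trim_and_winsorize(scores, trim_prop=0.1):
    """Return (trimmed mean, Winsorized SSD, m) for one category."""
    y = sorted(float(v) for v in scores)
    n = len(y)
    g = floor(n * trim_prop / 2)          # assumes p_t = trim_prop / 2
    m = n - 2 * g
    trimmed_mean = mean(y[g:n - g])
    # Winsorize: the g lowest/highest scores take the value of the nearest kept score
    w = [y[g]] * g + y[g:n - g] + [y[n - g - 1]] * g
    w_mean = mean(w)
    ssd = sum((v - w_mean) ** 2 for v in w)
    return trimmed_mean, ssd, m

def yuen_welch(x1, x2, trim_prop=0.1):
    """Return (t, df) following the Yuen-Welch formulas."""
    mt1, ssd1, m1 = trim_and_winsorize(x1, trim_prop)
    mt2, ssd2, m2 = trim_and_winsorize(x2, trim_prop)
    q1 = ssd1 / (m1 - 1) / m1             # s_{w,1}^2 / m_1
    q2 = ssd2 / (m2 - 1) / m2             # s_{w,2}^2 / m_2
    se = sqrt(q1 + q2)
    c = q1 / (q1 + q2)
    df = 1 / (c ** 2 / (m1 - 1) + (1 - c) ** 2 / (m2 - 1))
    return (mt1 - mt2) / se, df

# scores of Example 2, with the missing values removed by hand
int_scores = [50, 80, 40, 85, 70, 60, 90, 25, 40, 65, 98, 40]
nat_scores = [20, 15, 30, 45, 70, 70]
t_value, df = yuen_welch(int_scores, nat_scores)
```

For these scores the sketch gives a t-value of about 1.693 with about 9.75 degrees of freedom, in line with the output shown for Example 2. Both categories are small enough that nothing is trimmed at the default setting, so this particular case reduces to a Welch test.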

YUEN-DIXON

If se="yuen-dixon", a trimmed means test will be performed.

The formula used is (Yuen & Dixon, 1973, p. 394):
$$t = \frac{\bar{x}_{t,1} - \bar{x}_{t,2}}{SE}$$
$$sig = 2\times\left(1 - T\left(\left|t\right|, df\right)\right)$$

With:
$$SE = \sqrt{\frac{SSD_{w,1} + SSD_{w,2}}{m_1 + m_2 - 2}\times\left(\frac{1}{m_1} + \frac{1}{m_2}\right)}$$
$$df = m_1 + m_2 - 2$$
$$\bar{x}_{t,i} = \frac{\sum_{j=g_i+1}^{n_i - g_i}y_{i,j}}{m_i}$$
$$g_i = \lfloor n_i\times p_t\rfloor$$
$$m_i = n_i - 2\times g_i$$
$$SSD_{w,i} = g_i\times\left(y_{i,g_i+1} - \bar{x}_{w,i}\right)^2 + g_i\times\left(y_{i,n_i-g_i} - \bar{x}_{w,i}\right)^2 + \sum_{j=g_i+1}^{n_i - g_i} \left(y_{i,j} - \bar{x}_{w,i}\right)^2$$
$$\bar{x}_{w,i} = \frac{\bar{x}_{t,i}\times m_i + g_i\times\left(y_{i, g_i+1} + y_{i, n_i-g_i}\right)}{n_i}$$

Symbols used:

$\bar{x}_{t,i}$ the trimmed mean of the scores in category i
$\bar{x}_{w,i}$ the Winsorized mean of the scores in category i
$SSD_{w,i}$ the sum of squared deviations from the Winsorized mean of category i
$n_i$ the number of scores in category i
$g_i$ the number of scores trimmed from each side in category i
$m_i$ the number of scores in the trimmed data set from category i
$y_{i,j}$ the j-th score in category i, after the scores are sorted from low to high
$p_t$ the proportion of trimming on each side, i.e. half of trimProp
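
The pooled standard error of the Yuen-Dixon variation can be sketched in the same way. Again this is only an illustration under the assumption that \(p_t\) is half of `trimProp`; `winsorized_ssd` and `yuen_dixon_se` are hypothetical names, not functions of the package.

```python
from math import floor, sqrt

def winsorized_ssd(scores, trim_prop=0.1):
    """Return (Winsorized SSD, m) for one category."""
    y = sorted(float(v) for v in scores)
    n = len(y)
    g = floor(n * trim_prop / 2)          # assumes p_t = trim_prop / 2
    # Winsorize: the g lowest/highest scores take the value of the nearest kept score
    w = [y[g]] * g + y[g:n - g] + [y[n - g - 1]] * g
    w_mean = sum(w) / n
    return sum((v - w_mean) ** 2 for v in w), n - 2 * g

def yuen_dixon_se(x1, x2, trim_prop=0.1):
    """Return (SE, df) using the pooled Winsorized sum of squared deviations."""
    ssd1, m1 = winsorized_ssd(x1, trim_prop)
    ssd2, m2 = winsorized_ssd(x2, trim_prop)
    df = m1 + m2 - 2
    se = sqrt((ssd1 + ssd2) / df * (1 / m1 + 1 / m2))
    return se, df

# scores of Example 2, with the missing values removed by hand
int_scores = [50, 80, 40, 85, 70, 60, 90, 25, 40, 65, 98, 40]
nat_scores = [20, 15, 30, 45, 70, 70]
se, df = yuen_dixon_se(int_scores, nat_scores)
```

Unlike the Yuen-Welch version, the two sums of squared deviations are pooled into a single variance estimate, which is why this variation relies on the equal-variance assumption and has the simpler degrees of freedom \(m_1 + m_2 - 2\).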

Before, After and Alternatives

Before this you might want some descriptive measures. Use me_mode_bin for Mode for Binned Data, me_mean for different types of mean, and/or me_variation for different Measures of Quantitative Variation.

For a visualisation, use vi_boxplot_single for a Box (and Whisker) Plot or vi_histogram for a Histogram.

After the test you might want an effect size measure, options include: Common Language, Cohen d_s, Cohen U, Hedges g, Glass delta, biserial correlation, point-biserial correlation

There are four similar tests, with different assumptions.

|test|equal variance|normality|
|-------|-----------|---------|
|Student t| yes | yes|
|Welch t| no | yes|
|Trimmed means| yes | no |
|Yuen-Welch| no | no |

Another test that in some cases could be used is the Z test

References

Yuen, K. K. (1974). The two-sample trimmed t for unequal population variances. Biometrika, 61(1), 165–170. https://doi.org/10.1093/biomet/61.1.165

Yuen, K. K., & Dixon, W. J. (1973). The approximate behaviour and performance of the two-sample trimmed t. Biometrika, 60(2), 369–374. https://doi.org/10.2307/2334550

Author

Made by P. Stikker

Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076

Examples

Example 1: Dataframe

>>> file1 = "https://peterstatistics.com/Packages/ExampleData/GSS2012a.csv"
>>> df1 = pd.read_csv(file1, sep=',', low_memory=False, storage_options={'User-Agent': 'Mozilla/5.0'})
>>> ex1 = df1['age']
>>> ex1 = ex1.replace("89 OR OLDER", "90")
>>> ts_trimmed_mean_is(df1['sex'], ex1)
   n FEMALE  n MALE  trim mean FEMALE  trim mean MALE     diff.  hyp. diff.  statistic           df  p-value                                   test
0      1083     886         48.106667       47.274436  0.832231           0   0.971312  1707.922425  0.33153  Yuen-Welch independent samples t-test

Example 2: List

>>> scores = [20,50,80,15,40,85,30,45,70,60, None, 90,25,40,70,65, None, 70,98,40]
>>> groups = ["nat.","int.","int.","nat.","int.", "int.","nat.","nat.","int.","int.","int.","int.","int.","int.","nat.", "int." ,None,"nat.","int.","int."]
>>> ts_trimmed_mean_is(groups, scores)
   n int.  n nat.  trim mean int.  trim mean nat.  diff.  hyp. diff.  statistic        df   p-value                                   test
0      12       6       61.916667       41.666667  20.25           0    1.69314  9.750994  0.122075  Yuen-Welch independent samples t-test
