Module stikpetP.tests.test_freeman_tukey_gof
import pandas as pd
from scipy.stats import chi2
def ts_freeman_tukey_gof(data, expCounts=None, cc=None, modified=0):
'''
Freeman-Tukey Test of Goodness-of-Fit
-------------------------------------
A test that can be used with a single nominal variable, to test if the probabilities of all the categories are equal (the null hypothesis). If the test has a p-value below a pre-defined threshold (usually 0.05), the assumption that they are all equal in the population is rejected.
There are quite a few tests that can do this. Perhaps the most commonly used is the Pearson chi-square test, but an exact multinomial, G, Neyman, Mod-Log Likelihood, Cressie-Read, or Freeman-Tukey-Read test can also be used.
The Freeman-Tukey test attempts to make the distribution more like a normal distribution by using a square-root transformation.
Lawal (1984) continued some work from Larntz (1978) and compared the modified Freeman-Tukey, the G-test and the Pearson chi-square test. He concluded that for small samples the Pearson test is preferred, while for large samples either the Pearson test or the G-test can be used, which makes this Freeman-Tukey test perhaps somewhat redundant.
This function is shown in this [YouTube video](https://youtu.be/x_dzouxszKI) and the test is also described at [PeterStatistics.com](https://peterstatistics.com/Terms/Tests/Freeman-Tukey.html)
Parameters
----------
data : list or pandas data series
the data
expCounts : pandas dataframe, optional
the categories and expected counts
cc : {None, "yates", "yates2", "pearson", "williams"}, optional
which continuity correction to use. Default is None
modified : int, optional
indicate the use of the modified version. Either 0 (default), 1 or 2 (see notes)
Returns
-------
pandas.DataFrame
A dataframe with the following columns:
* *n*, the sample size
* *k*, the number of categories
* *statistic*, the test statistic (chi-square value)
* *df*, degrees of freedom
* *p-value*, significance (p-value)
* *minExp*, the minimum expected count
* *percBelow5*, the percentage of categories with an expected count below 5
* *test used*, description of the test used
Notes
-----
The formula used is (Ayinde & Abidoye, 2010, p. 21):
$$\\chi_{FT}^{2}=4\\times\\sum_{i=1}^{k}\\left(\\sqrt{F_{i}} - \\sqrt{E_{i}}\\right)^2$$
$$df = k - 1$$
$$sig. = 1 - \\chi^2\\left(\\chi_{FT}^{2},df\\right)$$
With:
$$n = \\sum_{i=1}^k F_i$$
If no expected counts are provided:
$$E_i = \\frac{n}{k}$$
else:
$$E_i = n\\times\\frac{E_{p_i}}{n_p}$$
$$n_p = \\sum_{i=1}^k E_{p_i}$$
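As a minimal sketch of this computation (using hypothetical counts, outside this function; only `math` and scipy's `chi2` are assumed):

    import math
    from scipy.stats import chi2
    freqs = [763, 547, 110, 340]     # hypothetical observed counts F_i
    k = len(freqs)                   # number of categories
    n = sum(freqs)                   # sample size
    exp = [n / k] * k                # E_i = n / k, since no expected counts are provided
    # if expected counts eps were provided: exp = [n * ep / sum(eps) for ep in eps]
    ts = 4 * sum((math.sqrt(f) - math.sqrt(e))**2 for f, e in zip(freqs, exp))
    df = k - 1
    pVal = chi2.sf(ts, df)           # right-tail chi-square probability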
A modified version uses another possible smoothing (Bishop, 1969, p. 284; Larntz, 1978, p. 253):
$$\\chi_{MFT}^{2} = \\sum_{i=1}^{k}\\left(\\sqrt{F_{i}} + \\sqrt{F_{i} + 1} - \\sqrt{4\\times E_{i} + 1}\\right)^2$$
Or slightly different (Read & Cressie, 1988, p. 82):
$$\\chi_{MFT}^{2} = \\sum_{i=1}^{k}\\left(\\sqrt{F_{i}} + \\sqrt{F_{i} + 1} - \\sqrt{4\\times \\left(E_{i} + 1\\right)}\\right)^2$$
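Continuing the sketch above, the two modified statistics would then be:

    ts1 = sum((math.sqrt(f) + math.sqrt(f + 1) - math.sqrt(4*e + 1))**2 for f, e in zip(freqs, exp))    # modified=1
    ts2 = sum((math.sqrt(f) + math.sqrt(f + 1) - math.sqrt(4*(e + 1)))**2 for f, e in zip(freqs, exp))  # modified=2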
*Symbols used:*
* $k$ the number of categories
* $F_i$ the (absolute) frequency of category i
* $E_i$ the expected frequency of category i
* $E_{p_i}$ the provided expected frequency of category i
* $n$ the sample size, i.e. the sum of all frequencies
* $n_p$ the sum of all provided expected counts
* $\\chi^2\\left(\\dots\\right)$ the chi-square cumulative distribution function
The test is attributed to Freeman and Tukey (1950), although the test itself is hard to actually find there. Another source often mentioned is Bishop et al. (2007).
The Yates continuity correction (cc="yates") is calculated using (Yates, 1934, p. 222):
$$F_i^\\ast = \\begin{cases} F_i - 0.5 & \\text{ if } F_i > E_i \\\\ F_i + 0.5 & \\text{ if } F_i < E_i \\\\ F_i & \\text{ if } F_i = E_i \\end{cases}$$
In some cases a slightly different version of the Yates correction is used (cc="yates2") (Allen, 1990, p. 523):
$$F_i^\\ast = \\begin{cases} F_i - 0.5 & \\text{ if } F_i - 0.5 > E_i \\\\ F_i + 0.5 & \\text{ if } F_i + 0.5 < E_i \\\\ F_i & \\text{ else } \\end{cases}$$
Note that the Yates correction is usually only considered if there are only two categories. Some also argue that this correction is too conservative (see Haviland (1990) for details).
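In terms of the sketch in the notes above, both versions only replace the observed counts before the statistic is computed:

    # cc="yates": move each F_i half a unit towards E_i
    adjY = [f - 0.5 if f > e else (f + 0.5 if f < e else f) for f, e in zip(freqs, exp)]
    # cc="yates2": only adjust when F_i stays on the same side of E_i
    adjY2 = [f - 0.5 if f - 0.5 > e else (f + 0.5 if f + 0.5 < e else f) for f, e in zip(freqs, exp)]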
The Pearson correction (cc="pearson") is calculated using (E.S. Pearson, 1947, p. 157):
$$\\chi_{PP}^2 = \\chi_{FT}^{2}\\times\\frac{n - 1}{n}$$
The Williams correction (cc="williams") is calculated using (Williams, 1976, p. 36):
$$\\chi_{PW}^2 = \\frac{\\chi_{FT}^{2}}{q}$$
With:
$$q = 1 + \\frac{k^2 - 1}{6\\times n\\times df}$$
The formula is also used by McDonald (2014, p. 87).
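Both of these corrections simply rescale the test statistic, so continuing the sketch above:

    tsPearson = ts * (n - 1) / n            # cc="pearson"
    q = 1 + (k**2 - 1) / (6 * n * df)
    tsWilliams = ts / q                     # cc="williams"
    pValWilliams = chi2.sf(tsWilliams, df)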
Before, After and Alternatives
------------------------------
Before this an impression using a frequency table or a visualisation might be helpful:
* [tab_frequency](../other/table_frequency.html#tab_frequency)
* [vi_bar_simple](../visualisations/vis_bar_simple.html#vi_bar_simple) for Simple Bar Chart
* [vi_cleveland_dot_plot](../visualisations/vis_cleveland_dot_plot.html#vi_cleveland_dot_plot) for Cleveland Dot Plot
* [vi_dot_plot](../visualisations/vis_dot_plot.html#vi_dot_plot) for Dot Plot
* [vi_pareto_chart](../visualisations/vis_pareto_chart.html#vi_pareto_chart) for Pareto Chart
* [vi_pie](../visualisations/vis_pie.html#vi_pie) for Pie Chart
After this you might want an effect size measure:
* [es_cohen_w](../effect_sizes/eff_size_cohen_w.html#es_cohen_w) for Cohen w
* [es_cramer_v_gof](../effect_sizes/eff_size_cramer_v_gof.html#es_cramer_v_gof) for Cramer's V for Goodness-of-Fit
* [es_fei](../effect_sizes/eff_size_fei.html#es_fei) for Fei
* [es_jbm_e](../effect_sizes/eff_size_jbm_e.html#es_jbm_e) for Johnston-Berry-Mielke E
or perform a post-hoc test:
* [ph_pairwise_bin](../other/poho_pairwise_bin.html#ph_pairwise_bin) for Pairwise Binary Tests
* [ph_pairwise_gof](../other/poho_pairwise_gof.html#ph_pairwise_gof) for Pairwise Goodness-of-Fit Tests
* [ph_residual_gof_bin](../other/poho_residual_gof_bin.html#ph_residual_gof_bin) for Residuals Tests using Binary tests
* [ph_residual_gof_gof](../other/poho_residual_gof_gof.html#ph_residual_gof_gof) for Residuals Using Goodness-of-Fit Tests
Alternative tests:
* [ts_pearson_gof](../tests/test_pearson_gof.html#ts_pearson_gof) for Pearson Chi-Square Goodness-of-Fit Test
* [ts_freeman_tukey_read](../tests/test_freeman_tukey_read.html#ts_freeman_tukey_read) for Freeman-Tukey-Read Test of Goodness-of-Fit
* [ts_g_gof](../tests/test_g_gof.html#ts_g_gof) for G (Likelihood Ratio) Goodness-of-Fit Test
* [ts_mod_log_likelihood_gof](../tests/test_mod_log_likelihood_gof.html#ts_mod_log_likelihood_gof) for Mod-Log Likelihood Test of Goodness-of-Fit
* [ts_multinomial_gof](../tests/test_multinomial_gof.html#ts_multinomial_gof) for Multinomial Goodness-of-Fit Test
* [ts_neyman_gof](../tests/test_neyman_gof.html#ts_neyman_gof) for Neyman Test of Goodness-of-Fit
* [ts_powerdivergence_gof](../tests/test_powerdivergence_gof.html#ts_powerdivergence_gof) for Power Divergence GoF Test
References
----------
Allen, A. O. (1990). *Probability, statistics, and queueing theory with computer science applications* (2nd ed.). Academic Press.
Ayinde, K., & Abidoye, A. O. (2010). Simplified Freeman-Tukey test statistics for testing probabilities in contingency tables. *Science World Journal, 2*(2), 21–27. doi:10.4314/swj.v2i2.51730
Bishop, Y. M. M., Fienberg, S. E., & Holland, P. W. (2007). *Discrete multivariate analysis*. Springer.
Freeman, M. F., & Tukey, J. W. (1950). Transformations related to the angular and the square root. *The Annals of Mathematical Statistics, 21*(4), 607–611. doi:10.1214/aoms/1177729756
Haviland, M. G. (1990). Yates’s correction for continuity and the analysis of 2 × 2 contingency tables. *Statistics in Medicine, 9*(4), 363–367. doi:10.1002/sim.4780090403
Larntz, K. (1978). Small-sample comparisons of exact levels for chi-squared goodness-of-fit statistics. *Journal of the American Statistical Association, 73*(362), 253–263. doi:10.1080/01621459.1978.10481567
Lawal, H. B. (1984). Comparisons of the X², Y², Freeman-Tukey and Williams’s improved G² test statistics in small samples of one-way multinomials. *Biometrika, 71*(2), 415–418. doi:10.2307/2336263
McDonald, J. H. (2014). *Handbook of biological statistics* (3rd ed.). Sparky House Publishing.
Pearson, E. S. (1947). The choice of statistical tests illustrated on the interpretation of data classed in a 2 × 2 table. *Biometrika, 34*(1/2), 139–167. doi:10.2307/2332518
Read, T. R. C., & Cressie, N. A. C. (1988). *Goodness-of-fit statistics for discrete multivariate data*. Springer-Verlag.
Williams, D. A. (1976). Improved likelihood ratio tests for complete contingency tables. *Biometrika, 63*(1), 33–37. doi:10.2307/2335081
Yates, F. (1934). Contingency tables involving small numbers and the χ² test. *Supplement to the Journal of the Royal Statistical Society, 1*(2), 217–235. doi:10.2307/2983604
Author
------
Made by P. Stikker
Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076
Examples
---------
>>> pd.set_option('display.width',1000)
>>> pd.set_option('display.max_columns', 1000)
Example 1: pandas series
>>> df1 = pd.read_csv('https://peterstatistics.com/Packages/ExampleData/GSS2012a.csv', sep=',', low_memory=False, storage_options={'User-Agent': 'Mozilla/5.0'})
>>> ex1 = df1['mar1']
>>> ts_freeman_tukey_gof(ex1)
n k statistic df p-value minExp percBelow5 test used
0 1941 5 1166.495639 4 2.919359e-251 388.2 0.0 Freeman-Tukey test of goodness-of-fit
Example 2: pandas series with various settings
>>> ex2 = df1['mar1']
>>> eCounts = pd.DataFrame({'category' : ["MARRIED", "DIVORCED", "NEVER MARRIED", "SEPARATED"], 'count' : [5,5,5,5]})
>>> ts_freeman_tukey_gof(ex2, expCounts=eCounts, cc="yates")
n k statistic df p-value minExp percBelow5 test used
0 1760 4 1044.117361 3 4.836441e-226 440.0 0.0 Freeman-Tukey test of goodness-of-fit, and Yates correction
>>> ts_freeman_tukey_gof(ex2, expCounts=eCounts, cc="pearson")
n k statistic df p-value minExp percBelow5 test used
0 1760 4 1047.365448 3 9.547419e-227 440.0 0.0 Freeman-Tukey test of goodness-of-fit, and Pearson correction
>>> ts_freeman_tukey_gof(ex2, expCounts=eCounts, cc="williams")
n k statistic df p-value minExp percBelow5 test used
0 1760 4 1047.464921 3 9.084607e-227 440.0 0.0 Freeman-Tukey test of goodness-of-fit, and Williams correction
Example 3: a list
>>> ex3 = ["MARRIED", "DIVORCED", "MARRIED", "SEPARATED", "DIVORCED", "NEVER MARRIED", "DIVORCED", "DIVORCED", "NEVER MARRIED", "MARRIED", "MARRIED", "MARRIED", "SEPARATED", "DIVORCED", "NEVER MARRIED", "NEVER MARRIED", "DIVORCED", "DIVORCED", "MARRIED"]
>>> ts_freeman_tukey_gof(ex3)
n k statistic df p-value minExp percBelow5 test used
0 19 4 3.632589 3 0.303969 4.75 100.0 Freeman-Tukey test of goodness-of-fit
'''
if isinstance(data, list):
data = pd.Series(data)
#Set correction factor to 1 (no correction)
corFactor = 1
if modified:
testUsed = "modified Freeman-Tukey test of goodness-of-fit"
else:
testUsed = "Freeman-Tukey test of goodness-of-fit"
#The test itself
freqs = data.value_counts()
k = len(freqs)
#Determine expected counts if not provided
if expCounts is None:
expCounts = [sum(freqs)/len(freqs)]* k
expCounts = pd.Series(expCounts, index=list(freqs.index.values))
else:
#if expected counts are provided
ne = 0
k = len(expCounts)
#determine sample size of expected counts
for i in range(0,k):
ne = ne + expCounts.iloc[i,1]
#remove categories not provided from observed counts
for i in freqs.index:
if i not in list(expCounts.iloc[:,0]):
freqs = freqs.drop(i)
# and sort based on the index
freqs = freqs.sort_index()
#set the column names
expCounts.columns = ["category", "count"]
#sort the expected counts
expCounts.sort_values(by="category", inplace=True)
#adjust based on observed count total
expCounts['count'] = expCounts['count'].astype('float64')
n = sum(freqs)
for i in range(0,k):
expCounts.at[i, 'count'] = float(expCounts.at[i, 'count'] / ne * n)
expCounts = pd.Series(expCounts.iloc[:, 1])
n = sum(freqs)
df = k - 1
#set williams correction factor
if cc=="williams":
corFactor = 1/(1 + (k**2 - 1)/(6*n*df))
testUsed = testUsed + ", and Williams correction"
#adjust frequencies if Yates correction is requested
if cc=="yates":
k = len(freqs)
adjFreq = list(freqs)
for i in range(0, k):
if adjFreq[i] > expCounts.iloc[i]:
adjFreq[i] = adjFreq[i] - 0.5
elif adjFreq[i] < expCounts.iloc[i]:
adjFreq[i] = adjFreq[i] + 0.5
freqs = pd.Series(adjFreq, index=list(freqs.index.values))
testUsed = testUsed + ", and Yates correction"
if cc=="yates2":
k = len(freqs)
adjFreq = list(freqs)
for i in range(0, k):
if adjFreq[i] - 0.5 > expCounts.iloc[i]:
adjFreq[i] = adjFreq[i] - 0.5
elif adjFreq[i] + 0.5 < expCounts.iloc[i]:
adjFreq[i] = adjFreq[i] + 0.5
freqs = pd.Series(adjFreq, index=list(freqs.index.values))
testUsed = testUsed + ", and Yates correction"
#determine the test statistic
if modified==1:
    ts = sum([(freqs.iloc[i]**0.5 + (freqs.iloc[i]+1)**0.5 - (4*expCounts.iloc[i] + 1)**0.5)**2 for i in range(0,k)])
elif modified==2:
    ts = sum([(freqs.iloc[i]**0.5 + (freqs.iloc[i]+1)**0.5 - (4*(expCounts.iloc[i] + 1))**0.5)**2 for i in range(0,k)])
else:
    #default (modified=0): the unmodified Freeman-Tukey statistic
    ts = 4*sum([(freqs.iloc[i]**0.5 - expCounts.iloc[i]**0.5)**2 for i in range(0,k)])
#set E.S. Pearson correction
if cc=="pearson":
corFactor = (n - 1)/n
testUsed = testUsed + ", and Pearson correction"
#Adjust test statistic
ts = ts*corFactor
#Determine p-value
pVal = chi2.sf(ts, df)
#Check minimum expected counts
#Cells with expected count less than 5
nbelow = len([x for x in expCounts if x < 5])
#Number of cells
ncells = len(expCounts)
#As proportion
pBelow = nbelow/ncells
#the minimum expected count
minExp = min(expCounts)
#prepare results
testResults = pd.DataFrame([[n, k, ts, df, pVal, minExp, pBelow*100, testUsed]], columns=["n", "k","statistic", "df", "p-value", "minExp", "percBelow5", "test used"])
pd.set_option('display.max_colwidth', None)
return testResults
Functions
def ts_freeman_tukey_gof(data, expCounts=None, cc=None, modified=0)