Module stikpetP.effect_sizes.eff_size_becker_clogg_r
import math
import pandas as pd
from ..other.table_cross import tab_cross
from statistics import NormalDist
def es_becker_clogg_r(field1, field2, categories1=None, categories2=None, version=1):
    '''
    Becker and Clogg rho
    --------------------
    An approximation for the tetrachoric correlation coefficient.

    Parameters
    ----------
    field1 : pandas series
        data with categories for the rows
    field2 : pandas series
        data with categories for the columns
    categories1 : list or dictionary, optional
        the two categories to use from field1. If not set, the first two found will be used
    categories2 : list or dictionary, optional
        the two categories to use from field2. If not set, the first two found will be used
    version : {1, 2}, optional
        version of the rho to determine (see Notes)

    Returns
    -------
    rt : float
        the Becker and Clogg rho (\\(\\rho^*\\) for version 1, \\(\\rho^{**}\\) for version 2)

    Notes
    -----
    Version 1 will calculate:
    $$\\rho^* = \\frac{g-1}{g+1}$$

    Version 2 will calculate:
    $$\\rho^{**} = \\frac{OR^{13.3/\\Delta} - 1}{OR^{13.3/\\Delta} + 1}$$

    With:
    $$g=e^{12.4\\times\\phi - 24.6\\times\\phi^3}$$
    $$\\phi = \\frac{\\ln\\left(OR\\right)}{\\Delta}$$
    $$OR=\\frac{\\left(\\frac{a}{c}\\right)}{\\left(\\frac{b}{d}\\right)} = \\frac{a\\times d}{b\\times c}$$
    $$\\Delta = \\left(\\mu_{R1} - \\mu_{R2}\\right) \\times \\left(v_{C1} - v_{C2}\\right)$$
    $$\\mu_{R1} = \\frac{-e^{-\\frac{t_r^2}{2}}}{p_{R1}}, \\mu_{R2} = \\frac{e^{-\\frac{t_r^2}{2}}}{p_{R2}}$$
    $$v_{C1} = \\frac{-e^{-\\frac{t_c^2}{2}}}{p_{C1}}, v_{C2} = \\frac{e^{-\\frac{t_c^2}{2}}}{p_{C2}}$$
    $$t_r = \\Phi^{-1}\\left(p_{R1}\\right), t_c = \\Phi^{-1}\\left(p_{C1}\\right)$$
    $$p_{x} = \\frac{x}{n}$$

    *Symbols used:*

    * \\(a\\) the count in the top-left cell of the cross table
    * \\(b\\) the count in the top-right cell of the cross table
    * \\(c\\) the count in the bottom-left cell of the cross table
    * \\(d\\) the count in the bottom-right cell of the cross table
    * \\(R_i\\) the sum of counts in the i-th row
    * \\(C_i\\) the sum of counts in the i-th column
    * \\(n\\) the sum of all counts in the cross table
    * \\(\\Phi^{-1}\\left(x\\right)\\) the inverse standard normal cumulative distribution function

    These formulas can be found in Becker and Clogg (1988, pp. 410-412)

    References
    ----------
    Becker, M. P., & Clogg, C. C. (1988). A note on approximating correlations from Odds Ratios. *Sociological Methods & Research, 16*(3), 407–424. https://doi.org/10.1177/0049124188016003003

    Author
    ------
    Made by P. Stikker

    Companion website: https://PeterStatistics.com
    YouTube channel: https://www.youtube.com/stikpet
    Donations: https://www.patreon.com/bePatron?u=19398076

    Examples
    --------
    >>> file1 = "https://peterstatistics.com/Packages/ExampleData/GSS2012a.csv"
    >>> df1 = pd.read_csv(file1, sep=',', low_memory=False, storage_options={'User-Agent': 'Mozilla/5.0'})
    >>> es_becker_clogg_r(df1['mar1'], df1['sex'], categories1=["WIDOWED", "DIVORCED"])
    0.2082967559691196
    >>> es_becker_clogg_r(df1['mar1'], df1['sex'], categories1=["WIDOWED", "DIVORCED"], version=2)
    np.float64(0.22342632378882407)

    '''
    # determine sample cross table
    tab = tab_cross(field1, field2, order1=categories1, order2=categories2, percent=None, totals="exclude")

    # cell values of sample cross table
    a = tab.iloc[0,0]
    b = tab.iloc[0,1]
    c = tab.iloc[1,0]
    d = tab.iloc[1,1]

    # the row totals, column totals, and grand total
    R1 = a + b
    R2 = c + d
    C1 = a + c
    C2 = b + d
    n = R1 + R2

    # the marginal proportions
    pR1 = R1/n
    pR2 = R2/n
    pC1 = C1/n
    pC2 = C2/n

    # thresholds on the standard normal scale
    tr = NormalDist().inv_cdf(pR1)
    tc = NormalDist().inv_cdf(pC1)

    # row and column scores, and their product delta
    mR1 = -math.exp(-tr**2/2) / pR1
    mR2 = math.exp(-tr**2/2) / pR2
    vC1 = -math.exp(-tc**2/2) / pC1
    vC2 = math.exp(-tc**2/2) / pC2
    delta = (mR1 - mR2)*(vC1 - vC2)

    # the odds ratio
    OR = a*d/(b*c)

    if version == 2:
        rt = (OR**(13.3/delta) - 1) / (OR**(13.3/delta) + 1)
    elif version == 1:
        phiBC = math.log(OR) / delta
        g = math.exp(12.4*phiBC - 24.6*phiBC**3)
        rt = (g - 1)/(g + 1)
    else:
        # previously an unsupported version failed later with an unbound 'rt'
        raise ValueError("version must be 1 or 2")

    return rt
Functions
def es_becker_clogg_r(field1, field2, categories1=None, categories2=None, version=1)
Becker and Clogg rho
An approximation for the tetrachoric correlation coefficient.
Parameters
field1 : pandas series
- data with categories for the rows
field2 : pandas series
- data with categories for the columns
categories1 : list or dictionary, optional
- the two categories to use from field1. If not set, the first two found will be used
categories2 : list or dictionary, optional
- the two categories to use from field2. If not set, the first two found will be used
version : {1, 2}, optional
- version of the rho to determine (see Notes)
Returns
rt : float
- the Becker and Clogg rho (\(\rho^*\) for version 1, \(\rho^{**}\) for version 2)
Notes
Version 1 will calculate:
$$\rho^* = \frac{g-1}{g+1}$$
Version 2 will calculate:
$$\rho^{**} = \frac{OR^{13.3/\Delta} - 1}{OR^{13.3/\Delta} + 1}$$
With:
$$g=e^{12.4\times\phi - 24.6\times\phi^3}$$
$$\phi = \frac{\ln\left(OR\right)}{\Delta}$$
$$OR=\frac{\left(\frac{a}{c}\right)}{\left(\frac{b}{d}\right)} = \frac{a\times d}{b\times c}$$
$$\Delta = \left(\mu_{R1} - \mu_{R2}\right) \times \left(v_{C1} - v_{C2}\right)$$
$$\mu_{R1} = \frac{-e^{-\frac{t_r^2}{2}}}{p_{R1}}, \mu_{R2} = \frac{e^{-\frac{t_r^2}{2}}}{p_{R2}}$$
$$v_{C1} = \frac{-e^{-\frac{t_c^2}{2}}}{p_{C1}}, v_{C2} = \frac{e^{-\frac{t_c^2}{2}}}{p_{C2}}$$
$$t_r = \Phi^{-1}\left(p_{R1}\right), t_c = \Phi^{-1}\left(p_{C1}\right)$$
$$p_{x} = \frac{x}{n}$$
Symbols used:
- \(a\) the count in the top-left cell of the cross table
- \(b\) the count in the top-right cell of the cross table
- \(c\) the count in the bottom-left cell of the cross table
- \(d\) the count in the bottom-right cell of the cross table
- \(R_i\) the sum of counts in the i-th row
- \(C_i\) the sum of counts in the i-th column
- \(n\) the sum of all counts in the cross table
- \(\Phi^{-1}\left(x\right)\) the inverse standard normal cumulative distribution function
These formulas can be found in Becker and Clogg (1988, pp. 410-412)
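The chain of formulas above can be followed step by step when the four cell counts are already known. The sketch below is only an illustration of those steps: `becker_clogg_from_counts` is a hypothetical helper, not part of stikpetP, and it skips the cross-table construction that `es_becker_clogg_r` does through `tab_cross`. The counts in the usage lines are made up.

```python
import math
from statistics import NormalDist


def becker_clogg_from_counts(a, b, c, d, version=1):
    """Hypothetical helper: Becker and Clogg rho from the four cell counts."""
    n = a + b + c + d

    # marginal proportions p_R1, p_R2, p_C1, p_C2
    p_r1, p_r2 = (a + b) / n, (c + d) / n
    p_c1, p_c2 = (a + c) / n, (b + d) / n

    # thresholds t_r and t_c from the inverse standard normal CDF
    t_r = NormalDist().inv_cdf(p_r1)
    t_c = NormalDist().inv_cdf(p_c1)

    # (mu_R1 - mu_R2) and (v_C1 - v_C2), multiplied to give Delta
    e_r = math.exp(-t_r**2 / 2)
    e_c = math.exp(-t_c**2 / 2)
    delta = (-e_r / p_r1 - e_r / p_r2) * (-e_c / p_c1 - e_c / p_c2)

    odds_ratio = (a * d) / (b * c)

    if version == 1:
        phi = math.log(odds_ratio) / delta
        g = math.exp(12.4 * phi - 24.6 * phi**3)
        return (g - 1) / (g + 1)
    if version == 2:
        k = odds_ratio ** (13.3 / delta)
        return (k - 1) / (k + 1)
    raise ValueError("version must be 1 or 2")


# made-up counts: a balanced table with a positive association
rho_v1 = becker_clogg_from_counts(40, 10, 10, 40, version=1)
rho_v2 = becker_clogg_from_counts(40, 10, 10, 40, version=2)
print(rho_v1, rho_v2)
```

With equal margins both thresholds are 0, so \(\Delta = (-2 - 2) \times (-2 - 2) = 16\) and version 2 reduces to \(OR^{13.3/16}\). A table with \(OR = 1\) gives 0 in both versions, and swapping the two columns flips the sign of the result.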
References
Becker, M. P., & Clogg, C. C. (1988). A note on approximating correlations from Odds Ratios. Sociological Methods & Research, 16(3), 407–424. https://doi.org/10.1177/0049124188016003003
Author
Made by P. Stikker
Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076
Examples
>>> file1 = "https://peterstatistics.com/Packages/ExampleData/GSS2012a.csv"
>>> df1 = pd.read_csv(file1, sep=',', low_memory=False, storage_options={'User-Agent': 'Mozilla/5.0'})
>>> es_becker_clogg_r(df1['mar1'], df1['sex'], categories1=["WIDOWED", "DIVORCED"])
0.2082967559691196
>>> es_becker_clogg_r(df1['mar1'], df1['sex'], categories1=["WIDOWED", "DIVORCED"], version=2)
np.float64(0.22342632378882407)
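The examples above need the remote CSV file. As a rough offline illustration of what the two category arguments select, the sketch below counts the cells \(a\), \(b\), \(c\) and \(d\) for two chosen categories per field and forms the odds ratio. `two_by_two` is a hypothetical helper and the data are made up; this is not how `tab_cross` is implemented, only an assumption about the 2x2 table it yields.

```python
from collections import Counter


def two_by_two(rows, cols, row_cats, col_cats):
    """Hypothetical helper: cell counts a, b, c, d for two categories per field."""
    counts = Counter(
        (row_val, col_val)
        for row_val, col_val in zip(rows, cols)
        if row_val in row_cats and col_val in col_cats
    )
    a = counts[(row_cats[0], col_cats[0])]  # top-left
    b = counts[(row_cats[0], col_cats[1])]  # top-right
    c = counts[(row_cats[1], col_cats[0])]  # bottom-left
    d = counts[(row_cats[1], col_cats[1])]  # bottom-right
    return a, b, c, d


# made-up data, not taken from GSS2012a.csv
marital = ["WIDOWED", "DIVORCED", "WIDOWED", "MARRIED", "DIVORCED", "WIDOWED", "DIVORCED"]
sex = ["FEMALE", "MALE", "FEMALE", "MALE", "FEMALE", "MALE", "MALE"]

a, b, c, d = two_by_two(marital, sex, ["WIDOWED", "DIVORCED"], ["FEMALE", "MALE"])
odds_ratio = (a * d) / (b * c)
print(a, b, c, d, odds_ratio)
```

Rows with a category outside the chosen two (here "MARRIED") are left out of the table, so they do not affect the odds ratio.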