Module stikpetP.visualisations.vis_stem_and_leaf

import pandas as pd
from math import floor, log10

def vi_stem_and_leaf(data, key=None):
    '''
    Stem-and-Leaf Display
    ------------------------------
    
    A stem-and-leaf display is defined as: "a method of displaying data in which each observation is split into two parts labelled the ‘stem’ and the ‘leaf’" (Everitt, 2004, p. 362). It is a diagram that can be used to visualize scale variables, and was created by Tukey (1972, p. 296).

    In some variations of this, the cumulative frequencies are also shown, but currently this function does not provide for that.

    This function is shown in this [YouTube video](https://youtu.be/J9JnmJTYJyE) and the visualisation is described at [PeterStatistics.com](https://peterstatistics.com/Terms/Visualisations/stemAndLeafDisplay.html).
    
    Parameters
    ----------
    data : pandas series or list
        the numerical data
    key : integer, optional
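        the factor to use for the stems. Each value is split into the stem `int(value/key)` and the leaf `value - key*stem`. If not provided, the largest power of ten that does not exceed the absolute value of the maximum is used.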
        
    Returns
    -------
    Prints the display to the console and returns a dictionary with the stems as keys and lists of the leaves as values. Stems without any leaves are printed, but are not included in the dictionary.

    Before, After and Alternatives
    ------------------------------
    Before this you might want to create a binned frequency table with [tab_frequency_bins](../other/table_frequency_bins.html#tab_frequency_bins).

    After this you might want some descriptive measures. Use [me_mode_bin](../measures/meas_mode_bin.html#me_mode_bin) for Mode for Binned Data, [me_mean](../measures/meas_mean.html#me_mean) for different types of mean, and/or [me_variation](../measures/meas_variation.html#me_variation) for different Measures of Quantitative Variation.
    
    Or perform a test. Various options include [ts_student_t_os](../tests/test_student_t_os.html#ts_student_t_os) for One-Sample Student t-Test, [ts_trimmed_mean_os](../tests/test_trimmed_mean_os.html#ts_trimmed_mean_os) for One-Sample Trimmed (Yuen or Yuen-Welch) Mean Test, or [ts_z_os](../tests/test_z_os.html#ts_z_os) for One-Sample Z Test.

    Alternative Visualisations are [vi_boxplot_single](../visualisations/vis_boxplot_single.html#vi_boxplot_single) for a Box (and Whisker) Plot and [vi_histogram](../visualisations/vis_histogram.html#vi_histogram) for a Histogram.
    
    References
    ----------
    Everitt, B. (2004). *The Cambridge dictionary of statistics* (2nd ed.). Cambridge University Press.
    
    Tukey, J. W. (1972). Some graphic and semigraphic displays. In T. A. Bancroft & S. A. Brown (Eds.), *Statistical Papers in Honor of George W. Snedecor* (pp. 293–316). Iowa State University Press.
    
    Author
    ------
    Made by P. Stikker
    
    Companion website: https://PeterStatistics.com  
    YouTube channel: https://www.youtube.com/stikpet  
    Donations: https://www.patreon.com/bePatron?u=19398076

    Examples
    --------
    Example 1: pandas series
    >>> student_df = pd.read_csv('https://peterstatistics.com/Packages/ExampleData/StudentStatistics.csv', sep=';', low_memory=False, storage_options={'User-Agent': 'Mozilla/5.0'})
    >>> ex1 = student_df['Gen_Age']
    >>> vi_stem_and_leaf(ex1);
    stem|leaf
    ---------
    0 | 18 18 18 18 18 19 19 19 19 19 19 19 20 20 20 20 20 20 20 20 21 21 21 21 21 22 22 22 23 23 24 24 24 25 26 26 27 28 28 29 29 30 37
    1 | 19
    key: 0 | 18 = 18.0
    {0: [18, 18, 18, 18, 18, 19, 19, 19, 19, 19, 19, 19, 20, 20, 20, 20, 20, 20, 20, 20, 21, 21, 21, 21, 21, 22, 22, 22, 23, 23, 24, 24, 24, 25, 26, 26, 27, 28, 28, 29, 29, 30, 37], 1: [19]}
    
    Example 2: Numeric list
    >>> ex2 = [1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5]
    >>> vi_stem_and_leaf(ex2);
    stem|leaf
    ---------
    1 | 0 0 0
    2 | 0 0 0
    3 | 0 0
    4 | 0 0 0
    5 | 0 0 0 0 0 0 0
    key: 1 | 0 = 1
    {1: [0, 0, 0], 2: [0, 0, 0], 3: [0, 0], 4: [0, 0, 0], 5: [0, 0, 0, 0, 0, 0, 0]}
    
    '''
    # convert a list to a pandas series
    if isinstance(data, list):
        data = pd.Series(data)

    # remove missing values
    data = data.dropna()

    if key is None:
        key = 10**floor(log10(abs(max(data))))
    
    # sample size
    n = len(data)

    #sort the data
    data_sorted = sorted(data)

    #the stem for each score
    stems = [int(i/key) for i in data_sorted]

    #the leaf for each score (formatted and unformatted)
    format_str = '{:0' + str(len(str(key)) - 1) + 'd}'
    unformat_leafs = [int(data_sorted[i] - key*stems[i]) for i in range(n)]
    leafs = [format_str.format(leaf) for leaf in unformat_leafs]
    
    display = []
    results = {}
    stem_current_value = stems[0]
    current_leaf = leafs[0]
    current_unformat_leaf = [unformat_leafs[0]]
    
    for i in range(1, n):
        if stems[i] == stems[i-1]:
            current_leaf = str(current_leaf) + " " + str(leafs[i])
            current_unformat_leaf.append(unformat_leafs[i])
        
        else:
            display.append([stems[i-1], current_leaf])
            results[stems[i-1]] = current_unformat_leaf
            
            current_leaf = leafs[i]
            current_unformat_leaf = [unformat_leafs[i]]
            stem_current_value = stem_current_value + 1
    
            while stem_current_value < stems[i]:
                display.append([stem_current_value, ""])
                stem_current_value = stem_current_value + 1
    
    display.append([stems[n-1], current_leaf])
    results[stems[n-1]] = current_unformat_leaf
    
    #showing the results
    print("stem|leaf")
    print("---------")
    for i in display:
        print(str(i[0]) + " | " + i[1])
    print("key: " + str(stems[0]) + " | " + leafs[0] + " = " + str(data_sorted[0]))

    return results

Functions

def vi_stem_and_leaf(data, key=None)

Stem-and-Leaf Display

A stem-and-leaf display is defined as: "a method of displaying data in which each observation is split into two parts labelled the ‘stem’ and the ‘leaf’" (Everitt, 2004, p. 362). It is a diagram that can be used to visualize scale variables, and was created by Tukey (1972, p. 296).

In some variations of this, the cumulative frequencies are also shown, but currently this function does not provide for that.
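
Although the function does not report cumulative frequencies, they can be derived from the dictionary it returns. The following is only a minimal sketch and not part of stikpetP: the helper name `cumulative_counts` is hypothetical, and the input is the dictionary shown in Example 2.

```python
from itertools import accumulate

def cumulative_counts(stem_leaf):
    """Return {stem: number of leaves up to and including that stem}."""
    stems = sorted(stem_leaf)
    counts = [len(stem_leaf[s]) for s in stems]
    return dict(zip(stems, accumulate(counts)))

# dictionary as returned in Example 2
ex2_result = {1: [0, 0, 0], 2: [0, 0, 0], 3: [0, 0], 4: [0, 0, 0], 5: [0, 0, 0, 0, 0, 0, 0]}
print(cumulative_counts(ex2_result))
# {1: 3, 2: 6, 3: 8, 4: 11, 5: 18}
```

The last cumulative count equals the number of non-missing observations, which gives a quick check on the result.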

This function is shown in this YouTube video and the visualisation is described at PeterStatistics.com.

Parameters

data : pandas series or list
the numerical data
key : integer, optional
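the factor to use for the stems. Each value is split into the stem int(value/key) and the leaf value - key*stem. If not provided, the largest power of ten that does not exceed the absolute value of the maximum is used.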
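
How `key` splits a value can be sketched as below. This mirrors the arithmetic in the source code, but the helper names `default_key` and `split_stem_leaf` and the `ages` list are illustrative assumptions, not part of the package.

```python
from math import floor, log10

def default_key(values):
    """Largest power of ten not exceeding the absolute value of the maximum."""
    return 10 ** floor(log10(abs(max(values))))

def split_stem_leaf(value, key):
    """Split a value into (stem, leaf), truncating both to integers."""
    stem = int(value / key)
    leaf = int(value - key * stem)
    return stem, leaf

ages = [18, 25, 37, 119]  # illustrative values
print(default_key(ages))
# 100
print([split_stem_leaf(a, 100) for a in ages])
# [(0, 18), (0, 25), (0, 37), (1, 19)]
print([split_stem_leaf(a, 10) for a in ages])
# [(1, 8), (2, 5), (3, 7), (11, 9)]
```

With the default key a single large value pushes most observations onto stem 0, as in Example 1. Passing a smaller `key`, such as 10, spreads them over more stems.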

Returns

Prints the display to the console and returns a dictionary with the stems as keys and lists of the leaves as values. Stems without any leaves are printed, but are not included in the dictionary.
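
The returned dictionary can be turned back into values with `stem * key + leaf`. The sketch below assumes the caller still knows the `key` that was used, since the function does not return it. The helper name `rebuild_values` and the small dictionary are hypothetical. Because the leaves are truncated to integers, any decimals in the original data are not recovered.

```python
def rebuild_values(stem_leaf, key):
    """Recombine stems and leaves into a sorted list of values."""
    return sorted(stem * key + leaf
                  for stem, leaves in stem_leaf.items()
                  for leaf in leaves)

# hypothetical result of a call with key=100
result = {0: [18, 19, 37], 1: [19]}
print(rebuild_values(result, key=100))
# [18, 19, 37, 119]
```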

Before, After and Alternatives

Before this you might want to create a binned frequency table with tab_frequency_bins.

After this you might want some descriptive measures. Use me_mode_bin for Mode for Binned Data, me_mean for different types of mean, and/or me_variation for different Measures of Quantitative Variation.

Or perform a test. Various options include ts_student_t_os for One-Sample Student t-Test, ts_trimmed_mean_os for One-Sample Trimmed (Yuen or Yuen-Welch) Mean Test, or ts_z_os for One-Sample Z Test.

Alternative Visualisations are vi_boxplot_single for a Box (and Whisker) Plot and vi_histogram for a Histogram.

References

Everitt, B. (2004). The Cambridge dictionary of statistics (2nd ed.). Cambridge University Press.

Tukey, J. W. (1972). Some graphic and semigraphic displays. In T. A. Bancroft & S. A. Brown (Eds.), Statistical Papers in Honor of George W. Snedecor (pp. 293–316). Iowa State University Press.

Author

Made by P. Stikker

Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076

Examples

Example 1: pandas series

>>> student_df = pd.read_csv('https://peterstatistics.com/Packages/ExampleData/StudentStatistics.csv', sep=';', low_memory=False, storage_options={'User-Agent': 'Mozilla/5.0'})
>>> ex1 = student_df['Gen_Age']
>>> vi_stem_and_leaf(ex1);
stem|leaf
---------
0 | 18 18 18 18 18 19 19 19 19 19 19 19 20 20 20 20 20 20 20 20 21 21 21 21 21 22 22 22 23 23 24 24 24 25 26 26 27 28 28 29 29 30 37
1 | 19
key: 0 | 18 = 18.0
{0: [18, 18, 18, 18, 18, 19, 19, 19, 19, 19, 19, 19, 20, 20, 20, 20, 20, 20, 20, 20, 21, 21, 21, 21, 21, 22, 22, 22, 23, 23, 24, 24, 24, 25, 26, 26, 27, 28, 28, 29, 29, 30, 37], 1: [19]}

Example 2: Numeric list

>>> ex2 = [1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5]
>>> vi_stem_and_leaf(ex2);
stem|leaf
---------
1 | 0 0 0
2 | 0 0 0
3 | 0 0
4 | 0 0 0
5 | 0 0 0 0 0 0 0
key: 1 | 0 = 1
{1: [0, 0, 0], 2: [0, 0, 0], 3: [0, 0], 4: [0, 0, 0], 5: [0, 0, 0, 0, 0, 0, 0]}