Module stikpetP.visualisations.vis_stem_and_leaf
import pandas as pd
from math import floor, log10
def vi_stem_and_leaf(data, key=None):
'''
Stem-and-Leaf Display
------------------------------
A stem-and-leaf display is defined as: "a method of displaying data in which each observation is split into two parts labelled the ‘stem’ and the ‘leaf’" (Everitt, 2004, p. 362). It is a diagram that can be used to visualise scale variables, and was introduced by Tukey (1972, p. 296).
Some variations also show the cumulative frequencies, but this function currently does not provide them.
This function is shown in this [YouTube video](https://youtu.be/J9JnmJTYJyE) and the visualisation is described at [PeterStatistics.com](https://peterstatistics.com/Terms/Visualisations/stemAndLeafDisplay.html).
Parameters
----------
data : pandas series or list
the numerical data
key : integer, optional
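the factor used to split each value into a stem (`int(value/key)`) and a leaf (the remainder). Defaults to the largest power of 10 that does not exceed the absolute value of the maximum.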
Returns
-------
Prints the display to the console and returns a dictionary with the stems as keys and the lists of leaves as values. Stems without any leaves are printed but are not included in the dictionary.
Before, After and Alternatives
------------------------------
Before this you might want to create a binned frequency table with [tab_frequency_bins](../other/table_frequency_bins.html#tab_frequency_bins).
After this you might want some descriptive measures. Use [me_mode_bin](../measures/meas_mode_bin.html#me_mode_bin) for Mode for Binned Data, [me_mean](../measures/meas_mean.html#me_mean) for different types of mean, and/or [me_variation](../measures/meas_variation.html#me_variation) for different Measures of Quantitative Variation.
Or perform a test. Various options include [ts_student_t_os](../tests/test_student_t_os.html#ts_student_t_os) for One-Sample Student t-Test, [ts_trimmed_mean_os](../tests/test_trimmed_mean_os.html#ts_trimmed_mean_os) for One-Sample Trimmed (Yuen or Yuen-Welch) Mean Test, or [ts_z_os](../tests/test_z_os.html#ts_z_os) for One-Sample Z Test.
Alternative Visualisations are [vi_boxplot_single](../visualisations/vis_boxplot_single.html#vi_boxplot_single) for a Box (and Whisker) Plot and [vi_histogram](../visualisations/vis_histogram.html#vi_histogram) for a Histogram.
References
----------
Everitt, B. (2004). *The Cambridge dictionary of statistics* (2nd ed.). Cambridge University Press.
Tukey, J. W. (1972). Some graphic and semigraphic displays. In T. A. Bancroft & S. A. Brown (Eds.), *Statistical Papers in Honor of George W. Snedecor* (pp. 293–316). Iowa State University Press.
Author
------
Made by P. Stikker
Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076
Examples
--------
Example 1: pandas series
>>> student_df = pd.read_csv('https://peterstatistics.com/Packages/ExampleData/StudentStatistics.csv', sep=';', low_memory=False, storage_options={'User-Agent': 'Mozilla/5.0'})
>>> ex1 = student_df['Gen_Age']
>>> vi_stem_and_leaf(ex1);
stem|leaf
---------
0 | 18 18 18 18 18 19 19 19 19 19 19 19 20 20 20 20 20 20 20 20 21 21 21 21 21 22 22 22 23 23 24 24 24 25 26 26 27 28 28 29 29 30 37
1 | 19
key: 0 | 18 = 18.0
{0: [18, 18, 18, 18, 18, 19, 19, 19, 19, 19, 19, 19, 20, 20, 20, 20, 20, 20, 20, 20, 21, 21, 21, 21, 21, 22, 22, 22, 23, 23, 24, 24, 24, 25, 26, 26, 27, 28, 28, 29, 29, 30, 37], 1: [19]}
Example 2: Numeric list
>>> ex2 = [1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5]
>>> vi_stem_and_leaf(ex2);
stem|leaf
---------
1 | 0 0 0
2 | 0 0 0
3 | 0 0
4 | 0 0 0
5 | 0 0 0 0 0 0 0
key: 1 | 0 = 1
{1: [0, 0, 0], 2: [0, 0, 0], 3: [0, 0], 4: [0, 0, 0], 5: [0, 0, 0, 0, 0, 0, 0]}
'''
    # accept a plain list by converting it to a pandas Series
    if isinstance(data, list):
        data = pd.Series(data)

    # remove missing values
    data = data.dropna()

    # default key: the largest power of 10 not exceeding |max(data)|
    if key is None:
        key = 10**floor(log10(abs(max(data))))

    # sample size
    n = len(data)

    # sort the scores
    data_sorted = sorted(data)

    # the stem for each score
    stems = [int(i/key) for i in data_sorted]

    # the leaf for each score (formatted and unformatted)
    format_str = '{:0' + str(len(str(key)) - 1) + 'd}'
    unformat_leafs = [int(data_sorted[i] - key*stems[i]) for i in range(n)]
    leafs = [format_str.format(leaf) for leaf in unformat_leafs]

    display = []
    results = {}
    stem_current_value = stems[0]
    current_leaf = leafs[0]
    current_unformat_leaf = [unformat_leafs[0]]
    for i in range(1, n):
        if stems[i] == stems[i-1]:
            # same stem: extend the current row
            current_leaf = current_leaf + " " + leafs[i]
            current_unformat_leaf.append(unformat_leafs[i])
        else:
            # new stem: store the finished row and start a new one
            display.append([stems[i-1], current_leaf])
            results[stems[i-1]] = current_unformat_leaf
            current_leaf = leafs[i]
            current_unformat_leaf = [unformat_leafs[i]]
            # add empty rows for any stems that were skipped
            stem_current_value = stem_current_value + 1
            while stem_current_value < stems[i]:
                display.append([stem_current_value, ""])
                stem_current_value = stem_current_value + 1
    # store the last row
    display.append([stems[n-1], current_leaf])
    results[stems[n-1]] = current_unformat_leaf

    # showing the results
    print("stem|leaf")
    print("---------")
    for i in display:
        print(str(i[0]) + " | " + i[1])
    print("key: " + str(stems[0]) + " | " + leafs[0] + " = " + str(data_sorted[0]))

    return results
Functions
def vi_stem_and_leaf(data, key=None)
Stem-and-Leaf Display
A stem-and-leaf display is defined as: "a method of displaying data in which each observation is split into two parts labelled the ‘stem’ and the ‘leaf’" (Everitt, 2004, p. 362). It is a diagram that can be used to visualise scale variables, and was introduced by Tukey (1972, p. 296).
Some variations also show the cumulative frequencies, but this function currently does not provide them.
This function is shown in this YouTube video and the visualisation is described at PeterStatistics.com.
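As a rough illustration of the cumulative-frequency variation mentioned above, the counts can be derived from a dictionary shaped like the one this function returns. The helper name `cumulative_frequencies` is hypothetical and not part of the package; the input is the dictionary shown in Example 2 of this page.

```python
from itertools import accumulate


def cumulative_frequencies(stem_leaf):
    """Return {stem: cumulative count} for a {stem: [leaves]} dictionary.

    Hypothetical helper for illustration only; not part of stikpetP.
    """
    stems = sorted(stem_leaf)
    # number of leaves on each stem, in stem order
    counts = [len(stem_leaf[s]) for s in stems]
    # running total of those counts
    return dict(zip(stems, accumulate(counts)))


# dictionary as shown in Example 2
result = {1: [0, 0, 0], 2: [0, 0, 0], 3: [0, 0], 4: [0, 0, 0], 5: [0, 0, 0, 0, 0, 0, 0]}
cum = cumulative_frequencies(result)
print(cum)  # {1: 3, 2: 6, 3: 8, 4: 11, 5: 18}
```

Stems that have no leaves do not appear as keys in the dictionary, so they would also be absent from these cumulative counts.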
Parameters
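data : pandas series or list
the numerical data
key : integer, optional
the factor used to split each value into a stem (int(value/key)) and a leaf (the remainder). Defaults to the largest power of 10 that does not exceed the absolute value of the maximum.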
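To make the role of `key` more concrete, the following minimal sketch mirrors the two expressions found in the source code: the default key and the stem/leaf split of a single value. The function names `default_key` and `split_value` and the small `ages` list are assumptions made for this illustration; they are not part of the package.

```python
from math import floor, log10


def default_key(values):
    # mirrors the default in the source:
    # the largest power of 10 not exceeding |max(values)|
    return 10 ** floor(log10(abs(max(values))))


def split_value(value, key):
    # stem: how many whole keys fit in the value (truncated)
    stem = int(value / key)
    # leaf: what remains after removing the stem part
    leaf = int(value - key * stem)
    return stem, leaf


ages = [18, 25, 37, 119]             # illustrative values only
key = default_key(ages)
print(key)                           # 100
print(split_value(37, key))          # (0, 37)
print(split_value(119, key))         # (1, 19)
print(split_value(37, 10))           # (3, 7)
```

With a maximum of 119 the default key is 100, which is why Example 1 places almost every age on stem 0.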
Returns
Prints the display to the console and returns a dictionary with the stems as keys and the lists of leaves as values. Stems without any leaves are printed but are not included in the dictionary.
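The returned dictionary does not store the key, so turning it back into the original values requires knowing which key was used. A small sketch, assuming a key of 100 and using a hypothetical helper `rebuild_values` with an illustrative dictionary shaped like the output of Example 1:

```python
def rebuild_values(stem_leaf, key):
    """Rebuild sorted values from a {stem: [leaves]} dictionary.

    Hypothetical helper for illustration only; not part of stikpetP.
    Each value is recovered as stem * key + leaf.
    """
    return [stem * key + leaf
            for stem in sorted(stem_leaf)
            for leaf in stem_leaf[stem]]


# illustrative subset shaped like the Example 1 output (key assumed to be 100)
result = {0: [18, 19, 37], 1: [19]}
values = rebuild_values(result, 100)
print(values)  # [18, 19, 37, 119]
```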
Before, After and Alternatives
Before this you might want to create a binned frequency table with tab_frequency_bins.
After this you might want some descriptive measures. Use me_mode_bin for Mode for Binned Data, me_mean for different types of mean, and/or me_variation for different Measures of Quantitative Variation.
Or perform a test. Various options include ts_student_t_os for One-Sample Student t-Test, ts_trimmed_mean_os for One-Sample Trimmed (Yuen or Yuen-Welch) Mean Test, or ts_z_os for One-Sample Z Test.
Alternative Visualisations are vi_boxplot_single for a Box (and Whisker) Plot and vi_histogram for a Histogram.
References
Everitt, B. (2004). The Cambridge dictionary of statistics (2nd ed.). Cambridge University Press.
Tukey, J. W. (1972). Some graphic and semigraphic displays. In T. A. Bancroft & S. A. Brown (Eds.), Statistical Papers in Honor of George W. Snedecor (pp. 293–316). Iowa State University Press.
Author
Made by P. Stikker
Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076
Examples
Example 1: pandas series
>>> student_df = pd.read_csv('https://peterstatistics.com/Packages/ExampleData/StudentStatistics.csv', sep=';', low_memory=False, storage_options={'User-Agent': 'Mozilla/5.0'})
>>> ex1 = student_df['Gen_Age']
>>> vi_stem_and_leaf(ex1);
stem|leaf
---------
0 | 18 18 18 18 18 19 19 19 19 19 19 19 20 20 20 20 20 20 20 20 21 21 21 21 21 22 22 22 23 23 24 24 24 25 26 26 27 28 28 29 29 30 37
1 | 19
key: 0 | 18 = 18.0
{0: [18, 18, 18, 18, 18, 19, 19, 19, 19, 19, 19, 19, 20, 20, 20, 20, 20, 20, 20, 20, 21, 21, 21, 21, 21, 22, 22, 22, 23, 23, 24, 24, 24, 25, 26, 26, 27, 28, 28, 29, 29, 30, 37], 1: [19]}
Example 2: Numeric list
>>> ex2 = [1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5]
>>> vi_stem_and_leaf(ex2);
stem|leaf
---------
1 | 0 0 0
2 | 0 0 0
3 | 0 0
4 | 0 0 0
5 | 0 0 0 0 0 0 0
key: 1 | 0 = 1
{1: [0, 0, 0], 2: [0, 0, 0], 3: [0, 0], 4: [0, 0, 0], 5: [0, 0, 0, 0, 0, 0, 0]}
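In Example 1 a single large value pushes the default key to 100, so nearly all ages end up on one stem. The sketch below does not call the package; it uses a hypothetical stand-alone helper, `group_by_stem`, that applies the same stem and leaf arithmetic as the source code to a few illustrative ages, to show how a smaller key spreads the values over more stems.

```python
from collections import defaultdict


def group_by_stem(values, key):
    """Group sorted values into {stem: [leaves]} for a given key.

    Hypothetical helper for illustration only; not part of stikpetP.
    """
    groups = defaultdict(list)
    for v in sorted(values):
        stem = int(v / key)                        # truncated quotient
        groups[stem].append(int(v - key * stem))   # remainder as leaf
    return dict(groups)


ages = [18, 19, 20, 21, 37, 119]      # illustrative values only
by_100 = group_by_stem(ages, 100)
by_10 = group_by_stem(ages, 10)
print(by_100)  # {0: [18, 19, 20, 21, 37], 1: [19]}
print(by_10)   # {1: [8, 9], 2: [0, 1], 3: [7], 11: [9]}
```

Going by the source code, the printed display also lists every stem between the smallest and the largest, so a small key combined with an outlier would add a run of empty rows (stems 4 to 10 for these values).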