2.3. Scale variable (Frequency Density)

All the frequency types discussed for nominal and ordinal variables can also be applied to a scale variable. One complication, however, is that a scale variable often has so many different values that the table becomes very long. Since the point of a table is to give a clear overview, and a long table often isn’t very clear, this is a problem. The usual solution is to group the values into bins (or classes), as shown in Table 17.

Table 17
*Example 1 of frequency table with bins*
Age	Frequency
0 < 10	15
10 < 15	23
15 < 25	22
25 < 50	40
50 < 100	5

The table shows that there were 15 respondents in the age bin 0 < 10. The symbol '<' is read as 'up to but not including', so someone aged 10 falls into 10 < 15, not into 0 < 10. Sometimes ≤ is used, which stands for 'less than or equal to'. A more technical notation uses [ or ] to indicate that a bound is included, and ( or ) to indicate that it is excluded. The interval 0 < 10 is then written as [0,10), and the interval 0 ≤ 9 as [0,9]. Another frequently used symbol is the hyphen (-). However, some authors use it in the same way as < (Chaudhary, Kumar, & Alka, 2009; Sharma, 2007), and others in the same way as ≤ (Beri, 2010; Haighton, Haworth, & Wake, 2003).

The lower end of a bin is called the lower bound and the upper end the upper bound (e.g. the bin 15 < 25 has a lower bound of 15 and an upper bound of 25).

When creating bins, two important rules should be followed:

Bins should not overlap. So do not use 0 < 10 and 5 < 15, since a person who is 8 years old would fit into both. This sometimes goes wrong when people use ≤ instead of <.
Each score should fit into a bin. This means that the lower bound of the first bin should be at or below the lowest score, the upper bound of the last bin should be above the highest score, and there should be no gaps between consecutive bins.

These two rules can be combined into one: each score should fit into exactly one bin. There are also various formulas to help decide how many bins to use, or how wide each bin should be. This matters because the results can look different depending on how the bins are set up. As a rule of thumb for descriptive statistics, I’d suggest using between 4 and 10 bins. More than 10 bins might make the table unclear (which is exactly what we are trying to avoid), and with fewer than 4 we might lose too much information. Creating bins always loses some information, since we can no longer see, for example, the exact ages of the 15 people in the 0 < 10 bin. More information on the number of bins can be found in the optional chapter 2.5.2.
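The rule that each score must fit into exactly one bin can be enforced mechanically. The sketch below makes a few assumptions: the bins are described by a sorted list of edges, and each bin is half-open, [lower, upper), as with the '<' notation of Table 17. The names `EDGES`, `assign_bin` and `frequency_table`, and the ages used, are hypothetical and chosen only for illustration. Because adjacent bins share a bound, they can neither overlap nor leave gaps. A score outside the first or last bin is rejected.

```python
from bisect import bisect_right

# Bin edges of Table 17, sorted ascending. Each bin runs from its lower bound
# (included) up to, but not including, its upper bound: [lower, upper).
EDGES = [0, 10, 15, 25, 50, 100]


def bins_from_edges(edges):
    """Pair consecutive edges into (lower, upper) bins.

    Adjacent bins share a bound, so they cannot overlap
    and cannot leave a gap between them.
    """
    return list(zip(edges[:-1], edges[1:]))


def assign_bin(score, edges):
    """Return the index of the one bin [lower, upper) that contains score."""
    if not edges[0] <= score < edges[-1]:
        # The score lies below the first bin or at/above the end of the last bin.
        raise ValueError(f"score {score} does not fit into any bin")
    # bisect_right gives the position of the first edge greater than score.
    return bisect_right(edges, score) - 1


def frequency_table(scores, edges):
    """Count how many scores fall into each bin."""
    counts = [0] * (len(edges) - 1)
    for score in scores:
        counts[assign_bin(score, edges)] += 1
    return list(zip(bins_from_edges(edges), counts))


# Hypothetical ages, for illustration only.
ages = [3, 10, 14, 15, 30, 72]
for (lower, upper), count in frequency_table(ages, EDGES):
    print(f"{lower} < {upper}\t{count}")
```

Note that an age of exactly 10 lands in 10 < 15 and not in 0 < 10, matching the reading of '<' above.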

To return to the example in Table 17: imagine the table summarises the ages of people who like a certain product (e.g. 15 people in the age category 0 < 10 like the product). At first sight it might appear that the company should focus on middle-aged people (25 < 50), since that category contains the most people who liked the product. However, another researcher produced the data shown in Table 18.

Table 18
*Example 2 of frequency table with bins*
Age	Frequency
0 < 10	15
10 < 25	45
25 < 30	29
30 < 50	11
50 < 100	5

At first sight this table suggests focusing on younger people (age 10 < 25). If you look carefully, however, both tables describe exactly the same data. The bin 25 < 50 in Table 17, with a frequency of 40, combines the bins 25 < 30 and 30 < 50 from Table 18, whose frequencies of 29 and 11 indeed add up to 40. The other way around, the bin 10 < 25 from Table 18 has the same frequency (45) as the bins 10 < 15 and 15 < 25 from Table 17 combined (23 + 22).
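One way to see that both tables are the same is to start from the finest bins needed to describe both of them and merge adjacent bins. The frequencies come from Tables 17 and 18, using 5 for the 50 < 100 bin as in Table 18. The names `FINE_BINS` and `merge_bins` are hypothetical. This is a minimal sketch: it only merges whole bins and refuses bounds that would require splitting one.

```python
# The finest bins needed to describe both tables, as
# (lower bound, upper bound, frequency), taken from Tables 17 and 18.
FINE_BINS = [(0, 10, 15), (10, 15, 23), (15, 25, 22),
             (25, 30, 29), (30, 50, 11), (50, 100, 5)]


def merge_bins(bins, new_edges):
    """Combine adjacent bins so that only the bounds in new_edges remain.

    Each new bound must already be a bound of the original bins;
    otherwise an original bin would have to be split, not merged.
    """
    old_edges = {lo for lo, _, _ in bins} | {hi for _, hi, _ in bins}
    if not set(new_edges) <= old_edges:
        raise ValueError("new_edges would split an existing bin")
    merged = []
    for lower, upper in zip(new_edges[:-1], new_edges[1:]):
        # The frequency of a merged bin is the sum of the bins inside it.
        freq = sum(f for lo, hi, f in bins if lower <= lo and hi <= upper)
        merged.append((lower, upper, freq))
    return merged


table_17 = merge_bins(FINE_BINS, [0, 10, 15, 25, 50, 100])
table_18 = merge_bins(FINE_BINS, [0, 10, 25, 30, 50, 100])
print(table_17)
print(table_18)
```

Both merges start from the same six bins, which is why the two tables, despite suggesting different conclusions, carry the same information apart from how the bins were drawn.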

It is difficult to see which bin is the most crowded because the bins differ in size. To resolve this, we’ll use a technique that is also used in geography. You might be familiar with the term population density, which describes how crowded a region is. It is calculated by dividing the population size by the area of the region. We can do the same with our frequency table: instead of a population we have frequencies, and instead of an area we have bin sizes. The so-called frequency density is therefore the frequency divided by the bin size.

The bin size can be calculated by subtracting the lower bound from the upper bound. The calculation is shown in Table 19.

Table 19
*Example calculation of frequency density*
Age	Frequency	bin size	Frequency Density
0 < 10	15	10 – 0 = 10	15 / 10 = 1.5
10 < 15	23	15 – 10 = 5	23 / 5 = 4.6
15 < 25	22	25 – 15 = 10	22 / 10 = 2.2
25 < 50	40	50 – 25 = 25	40 / 25 = 1.6
50 < 100	5	100 – 50 = 50	5 / 50 = 0.1
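The calculation in Table 19 can be written as a short loop. This is a minimal sketch, assuming each bin is stored as a (lower bound, upper bound, frequency) tuple. It uses 5 as the frequency of 50 < 100, the value used in 5 / 50 and in the total of 105. The names `TABLE_17` and `frequency_densities` are hypothetical.

```python
# Table 17 as (lower bound, upper bound, frequency).
TABLE_17 = [(0, 10, 15), (10, 15, 23), (15, 25, 22), (25, 50, 40), (50, 100, 5)]


def frequency_densities(table):
    """Return (lower, upper, frequency, bin size, frequency density) per bin."""
    rows = []
    for lower, upper, freq in table:
        size = upper - lower  # bin size = upper bound - lower bound
        rows.append((lower, upper, freq, size, freq / size))
    return rows


for lower, upper, freq, size, density in frequency_densities(TABLE_17):
    print(f"{lower} < {upper}: {freq} / {size} = {density:.1f}")

# The most crowded bin is the one with the highest frequency density.
busiest = max(frequency_densities(TABLE_17), key=lambda row: row[4])
print(f"most crowded bin: {busiest[0]} < {busiest[1]}")
```

The bin with the highest frequency (25 < 50) is not the most crowded one. Once the bin sizes are taken into account, 10 < 15 stands out.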

We can now see that the most crowded bin is 10 < 15. The value 4.6 means that if we split the age category 10 < 15 into bins of size 1 (i.e. 10 < 11, 11 < 12, 12 < 13, 13 < 14 and 14 < 15), we would expect each of these to have a frequency of 4.6, assuming the ages are spread evenly within the bin. Most often the frequency density is defined by how it is calculated, for example: “the number of occurrences of an event divided by the bin size…” (Zedeck, 2014, pp. 144–145). I prefer a conceptual definition and would therefore define it as shown below.

definition 26: Frequency Density
Frequency Density: the frequency in each new bin that can be expected if a bin is split into bins of size one.

To reverse the process, multiply the frequency density by the bin size to obtain the absolute frequency again. For example, the absolute frequency of the age bin 15 < 25 is the frequency density of 2.2 multiplied by the bin size of 10: 2.2 × 10 = 22.
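Both the conceptual definition and its reverse can be sketched in a few lines. The sketch assumes integer bin bounds and assumes the scores are spread evenly within the bin, which is the assumption behind "the frequency that can be expected". The function names are hypothetical.

```python
def split_into_unit_bins(lower, upper, freq):
    """Split the bin [lower, upper) into bins of size 1.

    Assumes integer bounds and scores spread evenly over the bin, so every
    unit bin receives the same expected frequency: the frequency density.
    """
    density = freq / (upper - lower)
    return [(x, x + 1, density) for x in range(lower, upper)]


def frequency_from_density(density, bin_size):
    """Reverse step: frequency density multiplied by bin size."""
    return density * bin_size


# Bin 10 < 15 from Table 19: five unit bins, 10 < 11 up to 14 < 15.
for lower, upper, expected in split_into_unit_bins(10, 15, 23):
    print(f"{lower} < {upper}: {expected}")

# Bin 15 < 25 from Table 19: density 2.2 and bin size 10.
print(round(frequency_from_density(2.2, 10), 6))
```

Adding up the expected frequencies of the unit bins gives back the original frequency of the bin (23), which is the same reverse step written as a sum.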

The last type of frequency to be discussed is the relative frequency density. Its use in descriptive statistics is very limited, but we will see it again when we get to inferential statistics.

definition 27: Relative Frequency Density
Relative Frequency Density: the relative frequency in each new bin that can be expected if a bin is split into bins of size one.

The relative frequency density can be calculated by dividing the frequency density by the total frequency (Haighton, Haworth, & Wake, 2003, p. 74), as shown in Table 20.

Table 20
*Example calculation of relative frequency density, method 1*
Age	Frequency	bin size	Frequency Density	Relative Freq. Density
0 < 10	15	10	1.5	1.5 / 105 ≈ 0.014
10 < 15	23	5	4.6	4.6 / 105 ≈ 0.044
15 < 25	22	10	2.2	2.2 / 105 ≈ 0.021
25 < 50	40	25	1.6	1.6 / 105 ≈ 0.015
50 < 100	5	50	0.1	0.1 / 105 ≈ 0.001
Total	105
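Method 1 can be sketched as follows. The sketch assumes the same (lower bound, upper bound, frequency) tuples as before, with the 50 < 100 frequency of 5 that gives the total of 105. The names are hypothetical. Rounding to three decimals reproduces the last column of Table 20.

```python
# Table 17 as (lower bound, upper bound, frequency); total frequency is 105.
TABLE_17 = [(0, 10, 15), (10, 15, 23), (15, 25, 22), (25, 50, 40), (50, 100, 5)]


def relative_frequency_densities(table):
    """Method 1: frequency density divided by the total frequency."""
    total = sum(freq for _, _, freq in table)
    rows = []
    for lower, upper, freq in table:
        density = freq / (upper - lower)  # frequency density
        rows.append((lower, upper, density / total))
    return rows


for lower, upper, rfd in relative_frequency_densities(TABLE_17):
    print(f"{lower} < {upper}: {rfd:.3f}")
```

Multiplying each relative frequency density by its bin size and adding the results gives 1. This is a quick check that nothing was lost along the way.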

However, according to Kozak, Kozak, Staudhammer, and Watts (2008, p. 80), it should be calculated by dividing the relative frequency by the bin size, as shown in Table 21.

Table 21
*Example calculation of relative frequency density, method 2*
Age	Frequency	bin size	Relative Frequency	Relative Freq. Density
0 < 10	15	10	0.14	0.14 / 10 = 0.014
10 < 15	23	5	0.22	0.22 / 5 = 0.044
15 < 25	22	10	0.21	0.21 / 10 = 0.021
25 < 50	40	25	0.38	0.38 / 25 ≈ 0.015
50 < 100	5	50	0.05	0.05 / 50 = 0.001
Total	105

Note that both methods give the same results. To reverse the process, multiply the relative frequency density by the bin size to obtain the relative frequency.
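The two methods agree for an algebraic reason. The symbols below are introduced here only for illustration: $f_i$ is the frequency of bin $i$, $w_i$ is its bin size, and $n$ is the total frequency.

```latex
\underbrace{\frac{f_i / w_i}{n}}_{\text{method 1}}
  \;=\; \frac{f_i}{w_i \, n}
  \;=\; \underbrace{\frac{f_i / n}{w_i}}_{\text{method 2}}
```

Any small differences in practice come only from rounding intermediate values. In Table 21, for example, the relative frequencies are first rounded to two decimals.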

A few final remarks on frequency densities:

If the bin size is the same for each category, there is little point in calculating the frequency density: every frequency is then divided by the same number, so the densities show the same pattern as the frequencies.
Cumulative frequency densities are rarely used, and have even been argued to be pointless (Petry & Friesen, 2012).
An alternative method using a standard bin size is sometimes used (see the optional chapter 2.6.4 for details).
If you have open-ended bins (e.g. ‘below 20’, ‘65+’), you cannot determine the bin size, and therefore not the frequency density either.
Recoding a scale variable into bins turns it into an ordinal variable (here, Age category). Conversely, if you have an ordinal variable that is actually a scale variable already divided into bins, you can also apply frequency densities to it.
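The remark on open-ended bins can be made explicit in code. This sketch makes one assumption: an open-ended bin such as ‘65+’ is stored with `math.inf` (or `-math.inf` for ‘below 20’) as its missing bound. The function returns `None` when the bin size, and therefore the density, cannot be determined. The function name and the frequencies are hypothetical.

```python
import math


def frequency_density(lower, upper, freq):
    """Return freq / (upper - lower), or None for an open-ended bin.

    Assumption of this sketch: a missing bound is stored as
    -math.inf (e.g. 'below 20') or math.inf (e.g. '65+').
    """
    if math.isinf(lower) or math.isinf(upper):
        return None  # bin size unknown, so the density cannot be determined
    return freq / (upper - lower)


print(frequency_density(20, 65, 90))        # hypothetical closed bin: 2.0
print(frequency_density(65, math.inf, 12))  # hypothetical '65+' bin: None
```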

<<next segment: 2.4. Table for two variables>>