>>Return to Tell Me About Statistics!

Here, we explain how to calculate the correlation ratio, which indicates the correlation between categorical and quantitative data.

In the example described in Vol. 52, the age range differs by supplement (A, B, C): A is 29 to 36 years old, B is 38 to 48 years old, and C is 20 to 38 years old. Graphing the data makes these differences clearer (Figure 1).

[Figure 1] Graph of age data by supplement

When the age range differs between supplements, we consider supplement and age to be related. Figures 2 and 3 illustrate how to judge how strong that relationship is.

[Figure 2] Strong association

When the age variation within a group is small and the age ranges do not overlap, the relationship between supplements and age is considered strong.

[Figure 3] Weak association

When the age variation within a group is large and the age ranges overlap, the relationship between supplements and age is considered weak.

What are “within-group variation” and “between-group variation”?

Variation within a group is called “within-group variation.”

Let us calculate the within-group variation for the supplement-age-specific data in the table. The variation is calculated as the sum of the squared deviations.

[Table 1] Supplement-age-specific data and sum of squared deviation

The sum of the three sums of squared deviations is called “within-group variation” and is expressed as “Sw.”

Sw = S₁ + S₂ + S₃ = 30 + 58 + 266 = 354
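The sum-of-squared-deviations step can be sketched in Python as follows. This is a minimal illustration, not the article's own calculation: the four ages for group A are hypothetical values chosen only so that they match the summary figures given in the text (4 people, ages 29 to 36, mean 33, S₁ = 30), and the function name is an assumption. S₂ and S₃ are taken directly from the text.

```python
def sum_of_squared_deviations(values):
    """Return the sum of squared deviations of values from their own mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


# Hypothetical ages, NOT the actual Table 1 data: picked only so that they
# reproduce group A's reported summary (n = 4, range 29 to 36, S1 = 30).
ages_a_hypothetical = [29, 32, 35, 36]
s1 = sum_of_squared_deviations(ages_a_hypothetical)

# S2 and S3 are the values reported in the text.
s2, s3 = 58, 266

# Within-group variation is the sum of the per-group sums of squares.
s_w = s1 + s2 + s3
print(s1, s_w)  # 30.0 354.0
```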

When the age ranges do not overlap, the variation between the three groups is large. Conversely, when the age ranges overlap, the variation between the three groups is small (Figure 4).

[Figure 4] Size of the variation between groups

The variation between groups is calculated from the difference between the mean of each group (Ū₁, Ū₂, Ū₃) and the overall mean (Ū). This is called “between-group variation” and is expressed as “Sb” (Figure 5).

[Figure 5] Illustration of between-group variation

Let the numbers of respondents in the three groups be n₁, n₂, and n₃. The between-group variation is calculated by squaring the difference between each group mean and the overall mean, multiplying it by the number of people in that group, and summing the results over the three groups, as shown below:

[Formula]

Sb = n₁(Ū₁ - Ū)² + n₂(Ū₂ - Ū)² + n₃(Ū₃ - Ū)²

= 4 × (33 - 34)² + 5 × (42 - 34)² + 6 × (28 - 34)²

= 4 + 320 + 216

= 540
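As a check on the arithmetic, the same calculation can be written as a short Python sketch. The group sizes and group means are the ones used in the formula above; the variable names are assumptions made for illustration. The overall mean of 34 is the size-weighted mean of the group means, (4 × 33 + 5 × 42 + 6 × 28) ÷ 15.

```python
# Group sizes n1, n2, n3 and group means, as used in the formula in the text.
group_sizes = [4, 5, 6]
group_means = [33, 42, 28]

# The overall mean is the group means weighted by group size.
n_total = sum(group_sizes)
overall_mean = sum(n * m for n, m in zip(group_sizes, group_means)) / n_total

# Between-group variation: size-weighted squared distance of each
# group mean from the overall mean, summed over the groups.
s_b = sum(n * (m - overall_mean) ** 2 for n, m in zip(group_sizes, group_means))

print(overall_mean, s_b)  # 34.0 540.0
```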

How to calculate correlation ratio

If the age variation within groups is small and the age ranges do not overlap, the relationship is strong. In other words, “there is a relationship when the within-group variation is small and the between-group variation is large.” We therefore calculate the ratio of the between-group variation to the total variation, that is, the sum of the within-group and between-group variations. This is called the “correlation ratio” and is expressed as η² (eta squared).

[Formula]

η² = Sb ÷ (Sw + Sb)

By substituting the supplement-age-specific data into this formula, the correlation ratio can be calculated as follows:

Sw + Sb = 354 + 540 = 894

η² = 540 ÷ 894 = 0.604
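The substitution above can be reproduced with a small Python sketch. The function name and the zero-total check are assumptions added for illustration; the inputs 354 and 540 are the values from the text.

```python
def correlation_ratio(s_w, s_b):
    """Return eta squared = Sb / (Sw + Sb)."""
    total = s_w + s_b
    if total == 0:
        # Every value is identical, so the ratio is undefined.
        raise ValueError("total variation is zero; eta squared is undefined")
    return s_b / total


eta_sq = correlation_ratio(354, 540)
print(round(eta_sq, 3))  # 0.604
```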

The formula shows the two extremes. When the relationship is strongest, the within-group variation “Sw” is 0, which means that all data within each group are identical, and η² is 1. Conversely, when the relationship is weakest, the between-group variation “Sb” is 0, which means that the group means are all the same, and η² is 0.
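The two extremes can be illustrated with a sketch that computes η² directly from raw group data. The function name and both data sets are made up for this illustration and do not come from the article's example.

```python
def eta_squared(groups):
    """Compute eta squared = Sb / (Sw + Sb) from a list of groups of values."""
    all_values = [v for group in groups for v in group]
    overall_mean = sum(all_values) / len(all_values)

    s_w = 0.0  # within-group variation
    s_b = 0.0  # between-group variation
    for group in groups:
        group_mean = sum(group) / len(group)
        s_w += sum((v - group_mean) ** 2 for v in group)
        s_b += len(group) * (group_mean - overall_mean) ** 2

    if s_w + s_b == 0:
        # All values are identical, so the ratio is undefined.
        raise ValueError("total variation is zero; eta squared is undefined")
    return s_b / (s_w + s_b)


# Made-up data: every value inside a group is identical, so Sw = 0.
identical_within_groups = [[30, 30], [40, 40], [50, 50]]
# Made-up data: every group has the same mean (30), so Sb = 0.
identical_group_means = [[20, 40], [25, 35], [30, 30]]

print(eta_squared(identical_within_groups))  # 1.0
print(eta_squared(identical_group_means))    # 0.0
```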
