When the effect of a drug is tested, expressing the dispersion of the resulting data as a number makes it possible to evaluate the drug by comparing values. The number that indicates this degree of dispersion is called the “standard deviation (SD).”

The standard deviation answers the question “how much variation is there?”, that is, how far the individual values typically lie from the average value.

The standard deviation is most often used for quantitative data, but it can also be calculated for data that cannot be measured numerically (categorical data). This article explains the standard deviation of categorical data.

Formula for the standard deviation of categorical data

Data can be roughly divided into quantitative and categorical data.

・Quantitative data: Data that can be added or subtracted as numerical values (e.g., blood pressure, body temperature)
・Categorical data: Data that cannot be measured numerically (e.g., smoking status, gender)

For example, binary categorical data such as “smoker / non-smoker” responses in a medical questionnaire can be converted to “1, 0” form, and the standard deviation can then be obtained with the formula below.

Formula for “1, 0” data

When the proportion of “1” in the “1, 0” data is denoted P,

Variance = P(1 − P), Standard deviation = √(P(1 − P))
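As a brief sketch of where this formula comes from: the notation n (number of responses) and x_i (the i-th “1, 0” value) is introduced here for illustration and does not appear in the original text. The sketch assumes the variance is the mean of the squared deviations from the mean (dividing by n). Because a proportion P of the values equal 1 and a proportion 1 − P equal 0, the mean of the data is P, and:

```latex
\begin{aligned}
\text{Variance}
  &= \frac{1}{n}\sum_{i=1}^{n}(x_i - P)^2 \\
  &= P\,(1 - P)^2 + (1 - P)\,(0 - P)^2 \\
  &= P(1 - P)\,\bigl[(1 - P) + P\bigr] \\
  &= P(1 - P), \\[4pt]
\text{Standard deviation}
  &= \sqrt{P(1 - P)}.
\end{aligned}
```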

Table 1 shows the results of a questionnaire survey on smoking status. Let us find the variance and standard deviation when these data are converted to “1, 0” form, with “smoking” = 1 and “no smoking” = 0.

[Table 1] Data on smoking/non-smoking in the questionnaire

The proportion of people who smoke is 3 in 5,

3/5 = 0.6

Applying the above formula,

Variance = 0.6 × (1 − 0.6) = 0.24

Standard deviation = √0.24 ≈ 0.49

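Calculating directly from the “1, 0” data without the formula gives a mean of 0.6 and a variance of 0.24 (Table 2), where the variance is the mean of the squared deviations from the mean (the sum of squared deviations divided by the number of respondents, 5). The resulting standard deviation is 0.49, the same as the value obtained with the formula.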
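The agreement between the two calculations can be checked with a short Python sketch. The function names are hypothetical, and the order of the individual responses is an assumption: only the count (3 smokers out of 5) is taken from the text. The direct calculation divides by the number of responses, which is what reproduces the variance of 0.24.

```python
import math


def binary_sd_formula(p):
    """Variance and SD of "1, 0" data from the proportion p of 1s."""
    variance = p * (1 - p)
    return variance, math.sqrt(variance)


def sd_direct(values):
    """Mean, variance and SD computed directly, dividing by n."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / n
    return mean, variance, math.sqrt(variance)


# Assumed responses: 1 = smoking, 0 = no smoking (3 smokers out of 5).
responses = [1, 1, 1, 0, 0]

p = sum(responses) / len(responses)          # proportion of 1s
var_f, sd_f = binary_sd_formula(p)           # via the formula P(1 - P)
mean_d, var_d, sd_d = sd_direct(responses)   # via the definition

print(f"formula: variance={var_f:.2f}, sd={sd_f:.2f}")
print(f"direct:  mean={mean_d:.2f}, variance={var_d:.2f}, sd={sd_d:.2f}")
```

Both lines should report the same variance and standard deviation, since the formula is just the direct calculation simplified for data that take only the values 1 and 0.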

[Table 2] Calculation of standard deviation of prevalence of smoking

Notes on categorical data

Categorical data such as “smoker/non-smoker” or “male/female,” as in the example above, are measured on what is called a “nominal scale.” Values on a nominal scale can only be compared as same or different; there is no relationship of magnitude (larger or smaller) between them.

Categorical data can also be measured on another kind of scale, called the “ordinal scale.” For example, suppose patients are asked about their level of satisfaction with treatment and respond on a five-point scale: “satisfied, somewhat satisfied, neutral, somewhat dissatisfied, and dissatisfied.” Values on an ordinal scale can be compared not only as same or different but also by order, because the categories have a relationship of magnitude.
