In medical statistics, one of the statistical methods used for multivariate analysis is “logistic regression analysis.”
Logistic regression analysis predicts the probability that a binary outcome (the objective variable) will occur from several factors (the explanatory variables). The objective variable is categorical data with two groups and the explanatory variables are quantitative data; in this respect the method is similar to discriminant analysis.
Logistic regression analysis uses a regression formula to clarify the following two points.
- The probability (predicted value) that the outcome of the objective variable will occur for each sample.
- The degree of influence of each explanatory variable used in the regression formula on the objective variable.
[Formula for logistic regression analysis]
e: Base of the natural logarithm (2.71828…)
a: Regression coefficient
y: Objective variable
x: Explanatory variable
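For reference, the logistic regression formula is commonly written as shown below, using the symbols defined above. This is the standard textbook form rather than a reproduction of the original figure; the constant term a_0 and the subscripts 1 to n are assumptions, since the legend lists only e, a, y, and x.

```latex
y = \frac{1}{1 + e^{-(a_1 x_1 + a_2 x_2 + \cdots + a_n x_n + a_0)}}
\qquad\Longleftrightarrow\qquad
\ln\frac{y}{1 - y} = a_1 x_1 + a_2 x_2 + \cdots + a_n x_n + a_0
```

The second form shows that each regression coefficient is the change in the log-odds of the outcome for a one-unit increase in the corresponding explanatory variable.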
Let us determine the probability of Mr. B having cancer using the data in Table 1.
[Table 1]
The regression coefficient for the number of drinking days is 0.1032, and that for the number of cigarettes smoked is 0.1789.
Mr. B drinks 15 days per month and smokes 20 cigarettes per day. Substitute these values for x in the regression formula.
[Formula]
The calculation gives a 68% probability that Mr. B has cancer. A case is classified as cancer when the probability is 50% or higher, so Mr. B is classified as having cancer.
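The substitution can be sketched in Python as follows. The constant term of the regression formula is not stated in the text, so this sketch assumes the standard logistic form and backs the constant out of the reported 68%. The function names and the resulting value (roughly -4.37, and only approximate because 68% is rounded) are illustrative assumptions, not figures from the original analysis.

```python
import math

def logistic_probability(coefs, values, intercept):
    """Standard logistic form: 1 / (1 + e^-(intercept + sum(a_i * x_i)))."""
    z = intercept + sum(a * x for a, x in zip(coefs, values))
    return 1.0 / (1.0 + math.exp(-z))

def implied_intercept(coefs, values, probability):
    """Solve the logistic form for the constant term, given a known probability."""
    log_odds = math.log(probability / (1.0 - probability))
    return log_odds - sum(a * x for a, x in zip(coefs, values))

# Regression coefficients from the text: drinking days, cigarettes smoked.
coefs = [0.1032, 0.1789]
# Mr. B: 15 drinking days per month, 20 cigarettes per day.
mr_b = [15, 20]

# Assumption: the constant term is inferred from the reported 68%,
# because the text does not state it.
a0 = implied_intercept(coefs, mr_b, 0.68)
p = logistic_probability(coefs, mr_b, a0)

# Decision rule from the text: 50% or higher is classified as cancer.
has_cancer = p >= 0.5
print(f"constant term ~ {a0:.3f}, probability = {p:.2f}, cancer = {has_cancer}")
```

With the constant term actually estimated from Table 1, only the call to `logistic_probability` would be needed.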
Indicators of how well the regression model obtained by logistic regression analysis fits the data include the Akaike information criterion (AIC), the c statistic (AUC), the discrimination accuracy rate, the coefficient of determination, and the Hosmer-Lemeshow goodness-of-fit test.
Let us take a look at the discrimination accuracy rate for this case in Table 2.
[Table 2]
Because the classification is correct for 9 of 10 people (90%), the discrimination accuracy rate exceeds the 75% standard, and we judge that the regression formula can be used for prediction. In addition, a correlation ratio of 0.5 or more between the probability calculated with the regression formula and the actual value is also taken to indicate that the formula can be used for prediction.
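The discrimination accuracy rate can be computed as in the minimal sketch below. The probabilities and actual outcomes are hypothetical values chosen so that 9 of 10 classifications are correct; they are not the data from Table 2, and the function names are illustrative.

```python
def classify(probabilities, threshold=0.5):
    """Label a case 1 (cancer) when its probability is at or above the threshold."""
    return [1 if p >= threshold else 0 for p in probabilities]

def accuracy_rate(predicted, actual):
    """Share of cases whose predicted label matches the actual label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)

# Hypothetical predicted probabilities and actual outcomes for 10 people
# (1 = cancer, 0 = no cancer). These are NOT the values from Table 2.
probabilities = [0.91, 0.85, 0.68, 0.74, 0.42, 0.12, 0.30, 0.22, 0.08, 0.35]
actual = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

predicted = classify(probabilities)
rate = accuracy_rate(predicted, actual)

# Standard from the text: an accuracy rate above 75% is judged usable.
usable = rate > 0.75
print(f"accuracy rate = {rate:.0%}, usable for prediction = {usable}")
```

In this hypothetical data the fifth person has cancer but a predicted probability below 50%, which is the single misclassification.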