>>Return to Tell Me About Statistics!

Propensity score matching is an analysis method published by Rosenbaum and Rubin in 1983. In clinical research, the relationship between an objective variable (outcome) and the factors related to it is investigated and analyzed with one of two approaches: univariate analysis (analysis of one variable at a time) and multivariate analysis. An analysis of the relationship between two variables is called a bivariate analysis (for example, correlation analysis).

Univariate analysis examines a single factor (explanatory variable or covariate) in relation to an objective variable. Typical methods include basic statistics (mean and standard deviation), the t-test, the chi-square test, and the log-rank test (Kaplan-Meier curves).

Multivariate analysis involves two or more explanatory or target variables. For example, the target variable may be “survival/death of patients who underwent surgical treatment for a serious disease,” and the explanatory variables may be five factors: “age, sex, BMI, time from onset to surgery, and amount of blood transfusion during surgery.” Multivariate analysis thus uses several explanatory variables that are directly related to the target variable. In many cases, however, the explanatory variables also affect one another. Factors that distort the relationship under study in this way are called “confounding factors,” and they often hinder the analysis.

One method for reducing the influence of confounding factors is propensity score matching, which has become a frequently used analytical method in recent years.

A confounding factor satisfies the following conditions:

  1. It influences the objective variable.
  2. It is related to the factor (cause) under study.
  3. It is not an intermediate variable in the causal chain between the cause and effect in question.

Exploring confounding factors through a case study

We explain with a simple example.
One of the liver function tests in a health checkup is γ-GTP, which indicates the extent to which cells in the liver and bile duct have been damaged. In adult men the reference upper limit is 50, and a value exceeding 100 suggests the possibility of liver cirrhosis, liver cancer, fatty liver, or biliary tract disease.

Table 1 shows the results of a survey of 100 adult males looking at γ-GTP, alcohol consumption (number of days drinking per month), smoking status, and gambling preferences (7-point scale).

[Table 1] Survey data on γ-GTP, alcohol consumption, smoking, and gambling among 100 adult males

We verify the following using the data in Table 1.

  1. Whether alcohol consumption is related to γ-GTP.
  2. Whether smoking status is related to γ-GTP.
  3. Whether gambling preference is unrelated to γ-GTP.

Test of significant differences between the two γ-GTP groups

The participants were divided by γ-GTP level into group 1 (high group), with γ-GTP of 50 or higher, and group 2 (low group), with γ-GTP of 49 or lower.

  1. The average alcohol consumption was 16.2 drinking days per month in the high group and 11.0 days in the low group, a difference of 5.2 days. With a p-value of 0.0037 < 0.05, the high group drank on significantly more days than the low group. This confirmed that alcohol consumption was related to γ-GTP.
  2. The percentage of participants who answered “yes” to the question of whether they smoked was 45.0% in the high group and 18.3% in the low group, a difference of 26.7 percentage points. With a p-value of 0.0037 < 0.05, the high group had a significantly higher smoking rate than the low group. Thus, smoking status was related to γ-GTP.
  3. The average gambling preference was 4.03 points in the high group and 3.17 points in the low group, a difference of 0.86 points. With a p-value of 0.0252 < 0.05, the high group scored significantly higher than the low group. Although gambling preference was assumed to be unrelated to γ-GTP, the test indicated a relationship.
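The grouping and comparison above can be sketched in Python using only the standard library. This is a minimal sketch on a few hypothetical records (not the Table 1 data): it splits participants at γ-GTP ≥ 50, compares the mean number of drinking days, and computes Welch's t statistic. The article does not say which test produced its p-values, so Welch's t-test is an assumption here, and the p-value itself is not computed.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical records: (gamma_gtp, drinking_days_per_month). Not the Table 1 data.
records = [
    (85, 22), (62, 18), (120, 25), (55, 12), (70, 20),
    (30, 8), (42, 14), (25, 5), (38, 10), (47, 15), (33, 9),
]

THRESHOLD = 50  # gamma-GTP of 50 or higher -> high group (group 1)

high = [days for gtp, days in records if gtp >= THRESHOLD]
low = [days for gtp, days in records if gtp < THRESHOLD]


def welch_t(a, b):
    """Welch's t statistic for the difference in means of two samples."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


diff = mean(high) - mean(low)
print(f"high mean = {mean(high):.1f} days, low mean = {mean(low):.1f} days")
print(f"difference = {diff:.1f} days, Welch t = {welch_t(high, low):.2f}")
```

Welch's version is used because it does not assume equal variances in the two groups; a chi-square test would be the analogous choice for the smoking proportions.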

Correlation analysis (Table 2)

  1. The correlation coefficient between alcohol consumption and γ-GTP was 0.6658 (p-value < 0.05), suggesting that alcohol consumption is a factor influencing γ-GTP.
  2. The correlation coefficient between smoking and γ-GTP was 0.3076 (p-value < 0.05), suggesting that smoking is a factor influencing γ-GTP. (*Smoking status is not measured on an interval scale, so we coded “does not smoke” as 0 and “smoker” as 1 and then calculated the correlation coefficient. A Pearson correlation computed on a 0/1 variable in this way is known as the point-biserial correlation.)
  3. The correlation coefficient between gambling preference and γ-GTP was 0.3580 (p-value < 0.05), suggesting that gambling preference is also a factor influencing γ-GTP.

[Table 2] Correlation analysis of the data in Table 1

[Figure 1] Correlation of variables

The correlation between “γ-GTP and gambling preferences” is 0.36, while that between “γ-GTP and smoking status” is 0.31, with the former being higher. Can we say that gambling preferences have a greater impact on γ-GTP than smoking status?

The correlation between “smoking status and gambling preference” is 0.37, and that between “drinking amount and gambling preference” is 0.30, so gambling preference is correlated with both. People who smoked or drank more tended to prefer gambling more. This suggests that γ-GTP is high not because these people like gambling but because they smoke and drink heavily.

A correlation like the 0.36 between γ-GTP and gambling preference is called an apparent (spurious) correlation. To determine the true relationship, the effects of smoking and alcohol consumption must be removed, and propensity score matching analysis is one way to do so.

Propensity score matching analysis

The high group included many heavy drinkers and smokers, whereas the low group included many light drinkers and nonsmokers. In this situation, if we compare the average gambling preference of the high and low groups, the high group will show a higher average simply because gambling preference is correlated with alcohol consumption and smoking.

[Table 3] Comparison and averages of the high and low groups

A true comparison of the average gambling preference between the high and low groups would be possible only if both groups had similar alcohol consumption and smoking habits.

In other words, we needed to extract and compare only those samples in the high and low groups with similar alcohol consumption and smoking habits. Specifically, we found people with similar characteristics in both groups and paired them.
*In statistics, finding data with similar characteristics (here, the confounding factors) in different samples and pairing them is called “matching.”

For 14 people, we calculated the probability that each person would be in the high γ-GTP group based on their alcohol intake and smoking status (Table 4).

We then found people in the two groups with similar (here, identical) probabilities and paired them. This produced three pairs that were similar in alcohol consumption and smoking status.

This probability is called the “propensity score,” and the method of pairing people with similar propensity scores is called “propensity score matching.”
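The pairing step can be sketched as greedy 1:1 nearest-neighbour matching on the propensity score, without replacement. This is one common matching rule, not necessarily the one used in the article (which paired identical scores). The IDs, the scores, and the caliper of 0.02 are all hypothetical.

```python
# Hypothetical propensity scores (probability of being in the high group);
# illustrative values only, not taken from Table 4.
high_group = {"H1": 0.82, "H2": 0.55, "H3": 0.31, "H4": 0.90}
low_group = {"L1": 0.54, "L2": 0.12, "L3": 0.30, "L4": 0.20, "L5": 0.83}

CALIPER = 0.02  # maximum allowed score difference within a pair (assumption)


def greedy_match(treated, controls, caliper):
    """Greedy 1:1 nearest-neighbour matching without replacement."""
    available = dict(controls)
    pairs = []
    # Process high-group members from the highest score down (one common convention).
    for t_id, t_score in sorted(treated.items(), key=lambda kv: -kv[1]):
        if not available:
            break
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]  # each low-group member is used at most once
    return pairs


pairs = greedy_match(high_group, low_group, CALIPER)
print(pairs)
```

Here H4 stays unmatched because no low-group score lies within the caliper; unmatched people are simply left out of the matched comparison.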

[Table 4] Results of propensity score matching

The propensity score was calculated by performing logistic regression, with the dependent variable being high group = 1 and low group = 0 and the explanatory variables being the confounding factors of alcohol consumption and smoking status.
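The logistic regression step can be sketched without a statistics library. This is a minimal sketch under several assumptions: the 14 rows are hypothetical (not the Table 4 data), drinking days are rescaled by 30, and the model is fitted by plain batch gradient ascent with a fixed learning rate and number of epochs. Statistical software usually uses a Newton-type method instead, so this is an illustration rather than a production fit.

```python
import math

# Hypothetical rows (NOT the Table 4 data):
# (drinking_days_per_month, smoker 0/1, high_group 1 / low_group 0)
data = [
    (22, 1, 1), (18, 0, 1), (25, 1, 1), (12, 1, 1), (20, 0, 1),
    (15, 1, 0), (8, 0, 0), (14, 1, 0), (5, 0, 0), (10, 0, 0),
    (15, 0, 0), (9, 0, 0), (19, 0, 0), (6, 1, 0),
]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, lr=0.5, epochs=10000):
    """Fit P(high = 1) = sigmoid(b0 + b1 * drink / 30 + b2 * smoker)
    by batch gradient ascent on the mean log-likelihood."""
    b0 = b1 = b2 = 0.0
    n = len(rows)
    for _ in range(epochs):
        g0 = g1 = g2 = 0.0
        for drink, smoker, high in rows:
            x1 = drink / 30  # rescale days to 0-1 so one learning rate suits all terms
            err = high - sigmoid(b0 + b1 * x1 + b2 * smoker)
            g0 += err
            g1 += err * x1
            g2 += err * smoker
        b0 += lr * g0 / n
        b1 += lr * g1 / n
        b2 += lr * g2 / n
    return b0, b1, b2


b0, b1, b2 = fit_logistic(data)
# The propensity score is the fitted probability of being in the high group.
scores = [sigmoid(b0 + b1 * d / 30 + b2 * s) for d, s, _ in data]
for (d, s, h), p in zip(data, scores):
    print(f"drink={d:2d} smoker={s} group={'high' if h else 'low'} score={p:.3f}")
```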


The method of pairing samples by propensity score matching on the confounding factors and then examining the relationship between the objective variable and the causal variable in the paired data is called “propensity score matching analysis” (Figure 2).
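Once pairs are formed, gambling preference can be compared within each pair. A minimal sketch with hypothetical paired scores on the 7-point scale; the paired t statistic is one common choice for matched data and is an assumption here, since the article does not state which test it applied to the matched pairs.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical matched pairs: (gambling score of a high-group member,
# gambling score of the matched low-group member), 7-point scale.
pairs = [(4, 4), (5, 4), (3, 4), (2, 2), (6, 5), (3, 3)]

# Within-pair differences remove what the two members share (similar
# drinking and smoking), leaving the comparison of interest.
diffs = [h - l for h, l in pairs]
mean_diff = mean(diffs)

# Paired t statistic: mean difference divided by its standard error.
t_stat = mean_diff / (stdev(diffs) / sqrt(len(diffs)))
print(f"mean paired difference = {mean_diff:.2f}, paired t = {t_stat:.2f}")
```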

[Figure 2] Propensity score matching analysis
