Regression and Regression-like Models for Potential Modeling and the Role of Conditional Independence.

Helmut Schaeben. (2014)
in: Proc. 34th Gocad Meeting, Nancy

Abstract

Weights-of-evidence used to be the most popular method of potential modeling. Its fundamental modeling assumption is conditional independence of all predictor variables given the target variable. Given this assumption, weights-of-evidence is ordinary logistic regression with parameters equal to the differences of the weights of evidence. The hypothesis of conditional independence can be tested in terms of log-linear models. If the assumption of conditional independence is violated, application of weights-of-evidence corrupts not only the predicted conditional probabilities but also their rank transform. One approach to account for a lack of conditional independence is to include corresponding interaction terms in a logistic regression model. Under mild additional assumptions, proper interaction terms compensate exactly for violations of conditional independence, i.e. the extended logistic regression model agrees with the true conditional probability. Replacing the link function of logistic regression, the logit transform, by the isometric log-ratio transform (ilr) of compositional statistics leads to compositional regression models. If required, they may include interaction terms, and they yield results very similar to those of logistic regression. Multilayer artificial neural nets may be seen as nested regression-like models with some sigmoidal activation function. Most often, the logistic function, the inverse of the logit transform, is used as the activation function. If the net topology, i.e. its control, is sufficiently versatile to allow for interaction terms, artificial neural nets are able to account for violations of conditional independence and yield very similar results. Thus, any method capable of considering interaction terms can compensate for a lack of conditional independence to some extent. In particular, logistic regression including interaction terms is the canonical generalization of weights-of-evidence. Subsequent modifications of the weights of evidence, as often suggested, cannot counterbalance any violation of conditional independence.
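
The central claims of the abstract can be illustrated with a small numerical sketch. It is not taken from the paper: the prior, the two binary predictors B1 and B2, the probability tables, and all function names are hypothetical values and helpers chosen for illustration. Coefficients are computed directly from an assumed joint distribution rather than estimated from data. The sketch compares the weights-of-evidence posterior with the true conditional probability, once when conditional independence holds and once when it is violated, and it also evaluates a logistic model with one interaction term.

```python
import math
from itertools import product

CELLS = list(product((0, 1), repeat=2))  # all predictor patterns (b1, b2)

def logit(p):
    return math.log(p / (1.0 - p))

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def independent_table(p1, p2):
    """P(B1, B2 | T=t) when B1 and B2 are conditionally independent."""
    return {(b1, b2): (p1 if b1 else 1 - p1) * (p2 if b2 else 1 - p2)
            for b1, b2 in CELLS}

def marginal(table, i, value):
    """P(B_i = value | T=t) obtained by summing the joint table."""
    return sum(p for cell, p in table.items() if cell[i] == value)

def true_posterior(prior, cond, cell):
    """P(T=1 | B1, B2) by Bayes' theorem from the full joint tables."""
    num = prior * cond[1][cell]
    return num / (num + (1 - prior) * cond[0][cell])

def wofe_posterior(prior, cond, cell):
    """Weights-of-evidence: prior logit plus one marginal weight per predictor."""
    z = logit(prior)
    for i, value in enumerate(cell):
        z += math.log(marginal(cond[1], i, value) / marginal(cond[0], i, value))
    return logistic(z)

def interaction_coefficients(prior, cond):
    """Coefficients of logit P = b0 + b1*x1 + b2*x2 + b12*x1*x2,
    solved exactly from the true logits (no fitting to data)."""
    L = {cell: logit(true_posterior(prior, cond, cell)) for cell in CELLS}
    b0 = L[0, 0]
    b1 = L[1, 0] - b0
    b2 = L[0, 1] - b0
    b12 = L[1, 1] - L[1, 0] - L[0, 1] + L[0, 0]
    return b0, b1, b2, b12

def interaction_posterior(coef, cell):
    b0, b1, b2, b12 = coef
    x1, x2 = cell
    return logistic(b0 + b1 * x1 + b2 * x2 + b12 * x1 * x2)

def max_abs_error(model, prior, cond):
    """Largest deviation of a model from the true posterior over all cells."""
    return max(abs(model(c) - true_posterior(prior, cond, c)) for c in CELLS)

# Hypothetical inputs: keys 1 and 0 index the target value T.
PRIOR = 0.1
CI = {1: independent_table(0.6, 0.5), 0: independent_table(0.2, 0.3)}
# Same marginals given T=1 as above, but B1 and B2 are now dependent given T=1.
DEP = {1: {(1, 1): 0.45, (1, 0): 0.15, (0, 1): 0.05, (0, 0): 0.35},
       0: independent_table(0.2, 0.3)}

for name, cond in (("CI holds", CI), ("CI violated", DEP)):
    coef = interaction_coefficients(PRIOR, cond)
    err_wofe = max_abs_error(lambda c: wofe_posterior(PRIOR, cond, c), PRIOR, cond)
    err_int = max_abs_error(lambda c: interaction_posterior(coef, c), PRIOR, cond)
    print(f"{name}: WofE error {err_wofe:.4f}, "
          f"interaction-model error {err_int:.1e}, b12 {coef[3]:+.4f}")
```

With only two binary predictors, the model with one interaction term is saturated, so its agreement with the true conditional probability is guaranteed in this toy setting. With more predictors, exact agreement depends on the additional assumptions mentioned in the abstract. Sorting the four cells by `wofe_posterior` and by `true_posterior` for the `DEP` tables is one way to check whether the ranking changes as well.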

BibTeX Reference

@inproceedings{SchaebenGM2014,
 abstract = { Weights-of-evidence used to be the most popular method of potential modeling. Its fundamental modeling assumption is conditional independence of all predictor variables given the target variable. Given this assumption, weights-of-evidence is ordinary logistic regression with parameters equal to the differences of the weights of evidence. The hypothesis of conditional independence can be tested in terms of log-linear models. If the assumption of conditional independence is violated, application of weights-of-evidence corrupts not only the predicted conditional probabilities but also their rank transform. One approach to account for a lack of conditional independence is to include corresponding interaction terms in a logistic regression model. Under mild additional assumptions, proper interaction terms compensate exactly for violations of conditional independence, i.e. the extended logistic regression model agrees with the true conditional probability. Replacing the link function of logistic regression, the logit transform, by the isometric log-ratio transform (ilr) of compositional statistics leads to compositional regression models. If required, they may include interaction terms, and they yield results very similar to those of logistic regression. Multilayer artificial neural nets may be seen as nested regression-like models with some sigmoidal activation function. Most often, the logistic function, the inverse of the logit transform, is used as the activation function. If the net topology, i.e. its control, is sufficiently versatile to allow for interaction terms, artificial neural nets are able to account for violations of conditional independence and yield very similar results. Thus, any method capable of considering interaction terms can compensate for a lack of conditional independence to some extent. In particular, logistic regression including interaction terms is the canonical generalization of weights-of-evidence. Subsequent modifications of the weights of evidence, as often suggested, cannot counterbalance any violation of conditional independence. },
 author = { Schaeben, Helmut },
 booktitle = { Proc. 34th Gocad Meeting, Nancy },
 title = { Regression and Regression-like Models for Potential Modeling and the Role of Conditional Independence },
 year = { 2014 }
}