These are difficult mathematical questions. They arise from real applications such as fraud detection, arbitrage and scoring systems. If you have interesting answers to any of the questions, feel free to email us your comments or solutions. The best answers will be published here. Companies and organizations interested in submitting problems should email us.
Scorecards: Logistic, Ridge and Logic Regression
In the context of credit scoring, one tries to develop a predictive model
using a regression formula such as Y = Σ wi Ri,
where Y is the logarithm of the odds ratio (fraud vs. non-fraud). In
a different but related framework, we are dealing with a logistic regression where Y is binary: Y = 1 means a
fraudulent transaction, Y = 0 a non-fraudulent one. The variables Ri, also referred to
as fraud rules, are binary flags, e.g.
- high dollar amount transaction
- high risk country
- high risk merchant category
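As a minimal sketch of the formula Y = Σ wi Ri with the three flags above, the snippet below scores a few transactions. The flag matrix `R`, the weights `w` and the helper names are made up for illustration, not fitted values. It assumes Y is read as log-odds, so the logistic function turns it into a probability, and it includes no intercept because the formula above has none.

```python
import numpy as np

# Hypothetical rule flags: one row per transaction, one column per rule
# (high dollar amount, high risk country, high risk merchant category).
R = np.array([
    [0, 0, 0],
    [1, 0, 0],
    [1, 1, 0],
    [1, 1, 1],
])

# Illustrative rule weights w_i; these are made up, not fitted.
w = np.array([0.8, 1.5, 0.6])


def score(R, w):
    """Y = sum_i w_i * R_i for each transaction (row of R)."""
    return R @ w


def to_probability(Y):
    """Map a log-odds score to a probability with the logistic function."""
    return 1.0 / (1.0 + np.exp(-Y))


Y = score(R, w)
p = to_probability(Y)
for flags, y_i, p_i in zip(R, Y, p):
    print(flags, round(float(y_i), 2), round(float(p_i), 3))
```

With positive weights, each additional rule that fires raises the score, which is the behaviour a scorecard is meant to have.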
This is the first order model. The second order model involves cross products Ri × Rj to correct
for rule interactions. The purpose of this question is to determine how best to compute the regression coefficients
wi, also referred to as rule weights. The issue is that rules substantially overlap (they are strongly collinear),
making the regression approach highly unstable. One approach consists of
constraining the weights, forcing them
to be binary (0/1) or to be of the same sign as the correlation between the associated rule and the
dependent variable Y. This approach is related to ridge regression.
We are wondering
what the best solutions and software are to handle this problem, given that the variables are
binary.
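One possible way to experiment with the sign constraint is sketched below. It is not a recommended solution to the question, only an illustration: an L2-penalized (ridge-style) logistic regression fitted by projected gradient descent, where any weight whose sign disagrees with the rule's correlation with Y is reset to zero after each step. The synthetic data, the intercept `b`, the function names and the hyperparameters (`l2`, `lr`, `n_iter`) are all assumptions made for this example.

```python
import numpy as np


def add_cross_products(R):
    """Append the columns R_i * R_j (i < j) used by the second order model."""
    k = R.shape[1]
    cross = [R[:, i] * R[:, j] for i in range(k) for j in range(i + 1, k)]
    return np.column_stack([R] + cross)


def correlation_signs(R, y):
    """Sign of the correlation between each rule and the dependent variable."""
    return np.sign([np.corrcoef(R[:, i], y)[0, 1] for i in range(R.shape[1])])


def fit_sign_constrained(R, y, signs, l2=1.0, lr=0.5, n_iter=2000):
    """L2-penalized logistic regression with sign-constrained weights.

    After each gradient step, weights with the wrong sign are projected
    back to zero. The intercept b is neither penalized nor constrained.
    """
    n, k = R.shape
    w = np.zeros(k)
    b = 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(R @ w + b)))
        w -= lr * (R.T @ (p - y) / n + l2 * w / n)
        b -= lr * np.mean(p - y)
        w[signs * w < 0] = 0.0  # projection onto the sign constraints
    return w, b


# Synthetic data (an assumption): rule 2 overlaps heavily with rule 1.
rng = np.random.default_rng(0)
n = 2000
r1 = (rng.random(n) < 0.3).astype(int)
r2 = np.where(rng.random(n) < 0.9, r1, 1 - r1)
r3 = (rng.random(n) < 0.2).astype(int)
R = np.column_stack([r1, r2, r3])
logit = -2.0 + 1.5 * r1 + 1.0 * r3
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

signs = correlation_signs(R, y)
w, b = fit_sign_constrained(R, y, signs)
print("signs:", signs, "weights:", np.round(w, 3), "intercept:", round(b, 3))
```

The same fit can be applied to `add_cross_products(R)` to include the second order terms. Binary (0/1) weights are a different, combinatorial problem and are not covered by this sketch.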
Note that when the weights are binary, this is a typical combinatorial optimization problem. When
the weights are constrained to be linearly independent over the set of integers, each
Σ wi Ri (sometimes called the unscaled score)
corresponds to one unique combination of rules. It also uniquely
represents a final node of the underlying decision tree defined by the rules.
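The uniqueness property can be checked on a small example. The sketch below assumes, purely for illustration, weights equal to the logarithms of distinct primes; these are linearly independent over the integers by unique factorization, so every rule combination gets its own unscaled score and the combination can be recovered from the score. The choice of primes and the function names are hypothetical.

```python
import itertools
import math

# Assumed weights: logs of distinct primes, one per rule.
PRIMES = [2, 3, 5, 7]
WEIGHTS = [math.log(p) for p in PRIMES]


def unscaled_score(flags, weights=WEIGHTS):
    """Sum of w_i * R_i for one combination of binary rule flags."""
    return sum(w * r for w, r in zip(weights, flags))


def decode(score, primes=PRIMES):
    """Recover the rule flags: exp(score) is the product of the fired primes."""
    product = round(math.exp(score))
    return tuple(1 if product % p == 0 else 0 for p in primes)


# Enumerate every rule combination, i.e. every final node of the tree.
combos = list(itertools.product([0, 1], repeat=len(PRIMES)))
scores = {combo: unscaled_score(combo) for combo in combos}

distinct = {round(s, 9) for s in scores.values()}
print(len(combos), "combinations,", len(distinct), "distinct scores")
```

With floating-point arithmetic, distinct scores can become indistinguishable when there are many rules, so this check is only meaningful for small rule sets.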
Contributions: