Data Mining, Quant, Statistics, Computer Science: Jobs, Resumes, Directory

These are difficult mathematical questions arising from real applications such as fraud detection, arbitrage and scoring systems. If you have an interesting answer to any question, feel free to email us your comments or solution. The best answers will be published here. Companies and organizations interested in submitting problems should email us.

Scorecards: Logistic, Ridge and Logic Regression

In the context of credit scoring, one tries to develop a predictive model using a regression formula such as Y = Σ w_i R_i, where Y is the logarithm of the odds ratio (fraud vs. non-fraud). In a different but related framework, one uses logistic regression, where Y is binary: Y = 1 means a fraudulent transaction and Y = 0 a non-fraudulent one. The variables R_i, also referred to as fraud rules, are binary flags, e.g.

high dollar amount transaction
high risk country
high risk merchant category

This is the first-order model. The second-order model involves cross products R_i × R_j to correct for rule interactions. The question is how best to compute the regression coefficients w_i, also referred to as rule weights. The difficulty is that rules overlap substantially, which makes the regression highly unstable. One approach consists of constraining the weights, forcing them to be binary (0/1) or to have the same sign as the correlation between the associated rule and the dependent variable Y. This approach is related to ridge regression. We would like to know the best solutions and software for this problem, given that the variables are binary.
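To make the instability and the ridge idea concrete, here is a minimal sketch of a first-order logistic model with an L2 (ridge) penalty on the rule weights, fitted by plain gradient descent. Everything specific in it is an assumption made for illustration: the aggregated counts in CELLS are invented, rules R1 and R2 are constructed to overlap heavily, and the names fit, lam, lr and steps are hypothetical. It does not implement the binary or sign constraints, and it is not a recommendation of any particular software.

```python
import math

# Hypothetical aggregated data: (R1, R2, R3) flags, transaction count, fraud count.
# R1 and R2 fire together on almost every transaction, so they overlap heavily.
CELLS = [
    ((0, 0, 0), 120, 6),
    ((1, 1, 0), 40, 12),
    ((1, 0, 0), 4, 0),
    ((0, 1, 0), 4, 2),
    ((0, 0, 1), 30, 6),
    ((1, 1, 1), 12, 8),
]

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit(cells, lam, lr=0.5, steps=5000):
    """Gradient descent on mean log-loss + lam / (2n) * sum(w_i^2).

    The intercept b is left unpenalized. Returns (weights, intercept).
    """
    k = len(cells[0][0])
    n = sum(count for _, count, _ in cells)
    w, b = [0.0] * k, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * k, 0.0
        for rules, count, fraud in cells:
            p = sigmoid(b + sum(wi * ri for wi, ri in zip(w, rules)))
            resid = count * p - fraud  # sum of (p - y) over the cell
            grad_b += resid
            for i, ri in enumerate(rules):
                grad_w[i] += resid * ri
        b -= lr * grad_b / n
        w = [wi - lr * (gi + lam * wi) / n for wi, gi in zip(w, grad_w)]
    return w, b

weak_w, _ = fit(CELLS, lam=0.01)   # almost no penalty
ridge_w, _ = fit(CELLS, lam=10.0)  # ridge penalty shrinks the weights
print("nearly unpenalized:", [round(x, 3) for x in weak_w])
print("ridge (lam=10):    ", [round(x, 3) for x in ridge_w])
```

With almost no penalty, the weights of the two overlapping rules are determined mostly by the handful of transactions where only one of them fires, which is where the instability comes from. The penalty pulls the weight vector toward zero, so the overlapping rules tend to share the effect instead. A sign constraint could be layered on top, for instance by clipping each weight after every update, but that is a separate design choice not shown here.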
Note that when the weights are binary, this is a typical combinatorial optimization problem. When the weights are constrained to be linearly independent over the integers, each Σ w_i R_i (sometimes called the unscaled score) corresponds to one unique combination of rules. It also uniquely identifies a final node of the underlying decision tree defined by the rules.
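The uniqueness property can be checked by brute force for a small number of rules. The sketch below enumerates every rule combination and groups combinations by their unscaled score. The function names score_table and scores_are_unique are hypothetical, and the example weights are chosen only for illustration. Note that the weights (1, 2, 4) are not linearly independent over the integers; they satisfy the weaker requirement that all subset sums differ, which is all that is needed when the R_i are 0/1 flags.

```python
from itertools import product

def score_table(weights):
    """Map each unscaled score to the rule combinations that produce it."""
    table = {}
    for combo in product((0, 1), repeat=len(weights)):
        score = sum(w * r for w, r in zip(weights, combo))
        table.setdefault(score, []).append(combo)
    return table

def scores_are_unique(weights):
    """True if every rule combination yields a different unscaled score."""
    return all(len(combos) == 1 for combos in score_table(weights).values())

print(scores_are_unique((1, 2, 4)))  # True: all 8 subset sums differ
print(scores_are_unique((1, 2, 3)))  # False: rules 1+2 and rule 3 both score 3
print(score_table((1, 2, 4))[5])     # [(1, 0, 1)]: score 5 means rules 1 and 3 fired
```

When the scores are unique, the table can be inverted to recover which rules fired from the score alone, which is the sense in which a score identifies a final node of the decision tree. The enumeration grows as 2^k in the number of rules, so this check is only practical for small k.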
Contributions:

From Mark Hansen:
When the rules are binary, the problem is known as logic regression.
