Approximate Solutions to Linear Regression Problems
Here we assume that we have a first order solution to a regression problem, in the form
Y = Σ wi Ri,
where Y is the response, wi are the regression coefficients, and Ri
are the independent variables. The number of variables is very high, and the independent variables are highly correlated.
We want to improve the model by considering a second order regression of the form
Y = Σ wi Ri + Σ wij cij mij Ri Rj ,
where
- cij = correlation between Ri and Rj
- wij = |wiwj|0.5 x sgn(wiwj)
- mij are arbitrary constants
In practice, some of the Ris are highly correlated and grouped into clusters. These clusters can be identified
by using a clustering algorithm on the cijs. For example, one could think of a model with two clusters A and B such as
Y = Σ wi Ri + mA ΣA wij cij Ri Rj
+ mB ΣB wij cij Ri Rj
where
- ΣA (resp. ΣB) are taken over all i < j belonging to A (resp. B)
- mij = mA (constant) if i, j belong to cluster A
- mij = mB (constant) if i, j belong to cluster B
An interesting case occurs when the cluster structure is
so strong that
- |cij| = 1 if i and j belong to the same cluster (either A or B)
- cij = 0 otherwise
This particular case results in
mA = 4 / [1 + (1+8kA)0.5]
mB = 4 / [1 + (1+8kB)0.5]
where
kA= ΣA |cij| and kB= ΣB |cij|.
Question
If the cluster structure is moderately strong, with the correlations cij close to 1, -1 or 0, how accurate is the above formula
involving kA and kB? Here we assume that the wis are known or approximated. Typically, wi is a constant or
wi is a simple function of the correlation between Y and Ri.
Alternate Approach
Let us consider a simplified model involving one cluster, with mij = constant = m. For instance, the unique cluster could
consist of all variables i, j with |cij| > 0.70. The model can be written as
Y = Σ wi Ri + m Σ wij cij Ri Rj.
We want to find m that provides the best improvement over the first order model, in terms of residual error. The first order model corresponds to
m = 0.
Let us introduce the following notations:
- W = Σ wij cij Ri Rj,
- V = W - u, where u = average(W) (Thus V is the centered W, with mean 0),
- S= Σ wi Ri. (average(S) = average(Y) by construction)
Without loss of generality, let us consider the slightly modified (centered) model
Y = S + m V.
Then
m = [ Transposed(V) x (Y-S) ] / [ Transposed(V) x V ],
where
- Y, S, and V are vectors with n rows,
- n is the number of observations.
Further Improvements
The alternate approach could be incorporated in an iterative algorithm, where at each step a new cluster is added. So at each step we would have the same computation for m, optimizing the residual error on
Y = S + m V.
However this time, S would contain all the clusters detected during the previous step, and V would contain the new cluster
being added to the model.

|