qlars

API documentation

1 The `rlars` module’s objective

The objective of the rlars module of the MizoPol package is described in a previous dedicated section. Nevertheless, let us summarize it briefly for the reader’s convenience.

1.1 General case

In its most general form, the rlars module characterizes the normality of a pair $(x,y)$, consisting of a feature vector $x$ and a label $y$, by the fact that the residual defined by:

\[ R(x,y) = y-\sum_{k=0}^{n_m}c_k(x)y^k \approx 0 \tag{1}\]

or equivalently, up to the sign of $R$:

\[ R(x,y) = \sum_{k=0}^{n_m}\bar c_k(x)y^k \approx 0\quad\vert \quad \bar c_k := \left\{\begin{array}{ll} c_k & \text{if $k\neq 1$}\\ c_k-1& \text{if $k=1$} \end{array}\right. \tag{2}\]

which can be simply stated as follows:

rlars normality characterization

Normality is characterized by $y$ being a root of a polynomial whose coefficients are polynomials in $x$. rlars attempts to find such a polynomial whenever possible.
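To make this characterization concrete, here is a minimal sketch of a degree-2 instance of Equation 2 built with NumPy. The coefficient functions `c0_bar`, `c1_bar` and `c2_bar` are hypothetical and chosen only for illustration; they are not part of MizoPol. The sketch only shows that a normal label is a root in $y$ of a polynomial whose coefficients depend polynomially on $x$.

```python
import numpy as np

# Hypothetical coefficient polynomials in x = (x1, x2); not taken from MizoPol.
def c0_bar(x):
    return x[0] * x[1] - 2.0

def c1_bar(x):
    return -(x[0] + 1.0)

def c2_bar(x):
    return 1.0 + x[1] ** 2

def residual(x, y):
    """Evaluate R(x, y) = sum_k c_bar_k(x) * y**k for n_m = 2."""
    return c0_bar(x) + c1_bar(x) * y + c2_bar(x) * y ** 2

x = np.array([0.5, -1.0])

# np.roots expects the coefficients ordered from the highest power of y down.
roots = np.roots([c2_bar(x), c1_bar(x), c0_bar(x)])

# A normal label coincides with one of the roots, so its residual vanishes.
y_normal = roots[0]
# Shifting the label away from the root gives a clearly non-zero residual.
y_abnormal = y_normal + 1.0

print(abs(residual(x, y_normal)), abs(residual(x, y_abnormal)))
```

For a fixed $x$, the residual is an ordinary univariate polynomial in $y$, which is why a generic root finder is enough to enumerate the candidate normal labels.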

1.2 Special case of rational representation

In the specific case where the degree $n_m=1$, the rlars module delivers a rational expression of the label as a function of the feature vector $x$, namely:

\[ y = -\dfrac{\bar c_0(x)}{\bar c_1(x)} \tag{3}\]

where both $\bar c_0$ and $\bar c_1$ are multivariate polynomials of potentially higher degree.

Meaning of $n_m=1$

Notice that $n_m=1$ in Equation 2 does not mean that the polynomials $\bar c_k(x)$ are of degree 1. It is the maximum power of $y$ that is equal to $1$; the degrees in $x$ of the polynomials $\bar c_k(\cdot)$ are not concerned.

As such, this special case is in itself a generalization of the structure sought by the plars module, which corresponds to $\bar c_1(\cdot) \equiv 1$.
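The rational case can be sketched numerically as follows. The two polynomials `c0_bar` and `c1_bar` are hypothetical examples (they are not generated by MizoPol); the sketch checks that the label given by Equation 3 cancels the residual of Equation 2 with $n_m=1$, and that forcing the denominator to 1 gives back a label that is purely polynomial in $x$.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 2))

# Hypothetical multivariate polynomials in x (illustration only).
def c0_bar(X):
    # Degree 3 in x.
    return -(1.0 + X[:, 0] ** 3 - 2.0 * X[:, 0] * X[:, 1])

def c1_bar(X):
    # Degree 2 in x, strictly positive so the ratio is always defined.
    return 2.0 + X[:, 0] ** 2 + X[:, 1] ** 2

# Equation 3: the label is a ratio of two polynomials in x.
y = -c0_bar(X) / c1_bar(X)

# Equation 2 with n_m = 1: the residual is linear in y and vanishes.
R = c0_bar(X) + c1_bar(X) * y

# Special case c1_bar == 1: the label reduces to a polynomial in x.
y_polynomial = -c0_bar(X) / np.ones(len(X))

print(np.abs(R).max())
```

Note that the degrees of `c0_bar` and `c1_bar` in $x$ (3 and 2 here) are unrelated to $n_m$, which only counts the highest power of $y$.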

2 The `fit` method

The good news is that the calling arguments for the rlars module are exactly the same as the ones used for the plars module (see the dedicated section).

The following script creates a rational function using the dedicated polynomial utilities available in the mizopol.utils module, and then calls rlars to fit a relationship:

import numpy as np

from mizopol.utils_api import generate_pol, Polynomial
from mizopol.rlars_api import fit

nx = 3
N = 100000
X = np.random.randn(N, nx)

# Generate numerator and denominator polynomials
Pnum = generate_pol(nx, deg=3, nModes=4, intercept=True)
Pden = generate_pol(nx, deg=2, nModes=2, intercept=False)

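# Compute gamma so that the shifted denominator stays positive
# on the sampled points (avoids division by zero)
den = Pden(X)
gamma = 2 * abs(den.min())

# Build the label as a rational function of the features
y = Pnum(X) / (gamma + den)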

dic = dict(deg=4, window=100, nModes=40, nModels=20, eps=1e-3, eta=90)

dic_fit = dict(
    colNames=[f's{i + 1}' for i in range(nx)],
    compute_contributions=True,
    nfeats=None,
    th_monomial=1e-4
)

sol, (cpu1, cpu2) = fit(X=X, y=y, dic_rlars=dic, dic_rlars_fit=dic_fit)

print(sol.keys())
print('dfe_train = \n', sol['dfe_train'])
print('------')
print('df_contrib = \n', sol['df_contrib'])
print('------')
print('df_sol = \n', sol['df_sol'])
print('------')
print('cardinality = \n', sol['card'])
print('------')
print(f'cpu all={cpu1:2.3f} | cpu-distant={cpu2:2.3f}')

Results

dict_keys(['nfeat', 'indices', 'powers', 'coefs', 'card', 'error', 'cpu', 'cols', 'bias_to_add', 'df_contrib', 'df_sol', 'dfe_train', 'eta', 'colNames', 'ymin', 'ymax'])
dfe_train = 
          Error
50%   0.000052
80%   0.000256
90%   0.000583
95%   0.001067
98%   0.002070
99%   0.002966
100%  0.015861
------
df_contrib = 
        Monomial  Contribution       std
0          (s2)     -0.357707  0.025013
1      (s1)(s2)      0.249449  0.021815
2    (s2)(s3)^2     -0.182015  0.029892
3        (s2)^3      0.170861  0.033580
4  (s1)(s3!*!y)     -0.029507  0.008883
5  (s2)(s2!*!y)      0.009927  0.001891
------
df_sol = 
    s1  s2  s3  s1!*!y  s2!*!y  s3!*!y  Contribution       std     coefs  y_powers
0   0   1   0       0       0       0     -0.357707  0.025013 -0.072478         0
1   1   1   0       0       0       0      0.249449  0.021815  0.061161         0
2   0   1   2       0       0       0     -0.182015  0.029892 -0.036096         0
3   0   3   0       0       0       0      0.170861  0.033580  0.016694         0
4   1   0   0       0       0       1     -0.029507  0.008883 -0.069995         1
5   0   1   0       0       1       0      0.009927  0.001891  0.014073         1
------
cardinality = 
 6
------
cpu all=0.681 | cpu-distant=0.536

Notice that the df_sol field of the returned solution sol gives, through its y_powers column, the degree $n_m$ of the polynomial in $y$, which is equal to one here. This means that a purely rational function has been identified. This outcome might be tightly related to the rather large value chosen for th_monomial.
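As a minimal sketch, the degree $n_m$ can be read programmatically from a dataframe shaped like df_sol. The rows below are copied from the printout above and reduced to two columns; reading $n_m$ as the maximum of the `y_powers` column is an assumption based on that printout, not a documented guarantee.

```python
import pandas as pd

# Rows copied from the df_sol printout above (only the columns needed here).
df_sol = pd.DataFrame({
    "coefs": [-0.072478, 0.061161, -0.036096, 0.016694, -0.069995, 0.014073],
    "y_powers": [0, 0, 0, 0, 1, 1],
})

# Assumption: the largest entry of y_powers is the degree n_m in y.
n_m = int(df_sol["y_powers"].max())
is_rational = n_m == 1

# Count the monomials attached to each power of y.
terms_per_power = df_sol.groupby("y_powers").size().to_dict()

print(n_m, is_rational, terms_per_power)
```

With these rows, four monomials carry $y^0$ and two carry $y^1$, which matches the cardinality of 6 reported by `sol['card']`.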

3 The `predict` method

Once a solution sol has been fitted using the fit method, it can be used to predict the residual corresponding to a new pair $(X,y)$ made of a feature matrix $X$ and a label vector $y$.

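# Assumption: predict is exposed next to fit in mizopol.rlars_api
from mizopol.rlars_api import predict

R, residual, df_res, (cpu1, cpu2) = predict(X, y, sol=sol, eta=50)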

print('df_res = \n', df_res)
print(f'cpu-total = {cpu1:2.3f} | cpu-distant = {cpu2:2.3f}')

Results

df_res = 
       per-Error
50%    0.000177
80%    0.000870
90%    0.001982
95%    0.003625
98%    0.007033
99%    0.010077
100%   0.053887
cpu-total = 1.093 | cpu-distant = 0.753
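For readers who want to build a comparable percentile summary from their own residual vector, here is a minimal sketch. The helper `percentile_table` is hypothetical and is not how MizoPol computes df_res; in particular, whatever normalization the library applies is not reproduced, and the residuals are synthetic.

```python
import numpy as np
import pandas as pd

def percentile_table(residual, percentiles=(50, 80, 90, 95, 98, 99, 100)):
    """Summarize absolute residuals by percentile (hypothetical helper)."""
    abs_res = np.abs(np.asarray(residual, dtype=float))
    values = np.percentile(abs_res, percentiles)
    return pd.DataFrame({"Error": values}, index=[f"{p}%" for p in percentiles])

# Synthetic residuals standing in for the output of predict.
rng = np.random.default_rng(0)
residual = 1e-3 * rng.standard_normal(10_000)

table = percentile_table(residual)
print(table)
```

The last row (100%) is the worst-case absolute residual, so a large gap between the 99% and 100% rows points to a few isolated outliers rather than a uniformly poor fit.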

3.1 Output arguments of `predict`

Table 1: output arguments of the predict method of the rlars module.

Output	Type	Description
`R`	`list[list[complex]]`	List of the roots in $y$ of Equation 2
`residual`	`list[float]`	Residual of Equation 2 evaluated at the provided $y$
`df_res`	`pandas dataframe`	Normalized residual dataframe
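Since `R` holds complex roots, a typical post-processing step is to keep only the real ones and compare them with the observed label. The sketch below assumes that `R` contains one list of roots per sample, which is an interpretation of the type given in Table 1; the helper `closest_real_root`, the tolerance and the sample values are hypothetical.

```python
def closest_real_root(roots, y_obs, tol=1e-8):
    """Return the real root closest to y_obs, or None if no root is real."""
    real = [r.real for r in roots if abs(r.imag) <= tol]
    if not real:
        return None
    return min(real, key=lambda r: abs(r - y_obs))

# Hypothetical roots in the shape list[list[complex]], one list per sample.
R = [
    [complex(1.2, 0.0), complex(-0.7, 0.0)],   # two real roots
    [complex(0.3, 0.9), complex(0.3, -0.9)],   # complex-conjugate pair
]
y = [1.0, 0.5]

y_hat = [closest_real_root(roots, y_obs) for roots, y_obs in zip(R, y)]
print(y_hat)
```

A sample for which no real root exists (the second one here) cannot satisfy Equation 2 for any real label, which can be treated as a signal of abnormality in its own right.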
