plars
API Documentation
Recall that the objective of plars is to fit a polynomial \(P\) such that
\[y\approx P(x)\]
where \(y\) is a label while \(x\) is a vector of features.
1 (Instance | Fit) dictionaries
Before detailing the parameters of the plars call, let us examine the simplest way to fit a plars model using the default parameters.
from mizopol.plars_api import fit
# Set the default parameters for the plars instance
dic_plars = dict()
# Set the default parameters for the fit method
dic_plars_fit = dict()
# run the fit method
sol, cpu = fit(X, y, dic_plars=dic_plars, dic_plars_fit=dic_plars_fit)
Because empty dictionaries are provided, the default values are used. Whenever an eligible (key, value) pair is defined in one of the dict() instructions, the provided value replaces the corresponding default.
Table 1 defines the possible (key, value) pairs that can be used in the setting of the dic_plars dictionary:
1.1 dic_plars dictionary’s arguments:
Table 1: Entries of the dic_plars dictionary.
| Parameter | Type | Used for | Default |
|---|---|---|---|
| deg | int | Degree of the polynomial to be identified | 1 |
| window | int | Number of samples per window (window width) | 200 |
| nModels | int | Number of sampled windows for alignment evaluation | 10 |
| nModes | int | Number of selected monomials per window | 10 |
| eps | float | Precision for the final least squares solution | 5e-2 |
| nBatch | int | Number of windows used to determine monomial contributions | 25 |
| eta | float | Quantile used to compute the error dataframe | 50 |
With regard to Table 1, the following comments are worth making:
- The maximum number of monomials is limited to nModels * nModes by construction of the algorithm.
- Note that eta is expected to represent a quantile value, so it is generally taken from the set eta \(\in \{50, 80, 90, 95, 98, 99, 100\}\).
- While the deg parameter may be any integer, keep in mind that when the number of sensors is high, large values of deg can lead to long computation times because of the resulting unreasonably high number of candidate monomials.
- In the interface modules that are defined in the publicly available GitHub repository, and which serve as an intermediary with the core modules developed locally (see the deployment figure in the previous section), dic_plars and dic_plars_fit are dictionaries. The type of their entries is enforced in the distant fast-api through pydantic model declarations. This helps return meaningful error messages when the type of a parameter is not legal. Some parameters also have bounds on their values which, when violated, trigger a comprehensive error message.
- Increasing nModels increases the chance of capturing all the important modes (monomials) that contribute here and there in the dataset to construct the correct label vector. Taking too small a value therefore risks skipping important correlations, while taking too high a value needlessly increases the computation time.
- As for nModes, it is linked to the presumed complexity of the model. It can be chosen once the two parameters nModels and window have been set.
- The last comments regarding nModels and nModes might suggest that the choice is difficult and requires a high level of expertise. In fact, the default values give good results in the majority of cases and, unless extreme values are taken, the results are not very sensitive to this choice. A typical approach is to try the default values first, then increase or decrease them and see whether the quality of the fit changes significantly.
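To get a feel for the orders of magnitude mentioned in these comments, the following sketch counts candidate monomials. It assumes that the candidates are all monomials of total degree at most deg in the features, constant term included, which gives the binomial count C(n + deg, deg); this assumption is consistent with the nfeat value of 330 reported in Section 2.5 for 7 features and deg = 4. The helper names are illustrative and not part of mizopol.

```python
from math import comb


def n_candidate_monomials(n_features: int, deg: int) -> int:
    """Number of monomials of total degree <= deg in n_features variables.

    Assumption: the constant monomial is counted, as in a standard
    polynomial-features expansion.
    """
    return comb(n_features + deg, deg)


def max_selected_monomials(n_models: int = 10, n_modes: int = 10) -> int:
    """Upper bound nModels * nModes on retained monomials (defaults from Table 1)."""
    return n_models * n_modes


if __name__ == "__main__":
    for n_features in (7, 50, 200):
        for deg in (2, 4):
            print(n_features, deg, n_candidate_monomials(n_features, deg))
    print("bound with defaults:", max_selected_monomials())
```

With 7 features and deg = 4 this gives 330 candidates, while 200 features at the same degree already give tens of millions, which is the situation where the nfeats entry of Section 1.2 becomes relevant.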
1.2 dic_plars_fit dictionary’s arguments:
Table 2: Entries of the dic_plars_fit dictionary.
| Parameter | Type | Used for | Default |
|---|---|---|---|
| th_monomial | float | Threshold to keep a candidate monomial | 1e-4 |
| colNames | list[str] | Names attributed to the \(x\)-components when creating some of the resulting dataframes | None |
| decouple | boolean | Whether to avoid updating the coefficients of the previously selected modes when new ones are selected | False |
| compute_contributions | boolean | Whether to compute the contributions of the monomials for later display | False |
| nfeats | int | Maximum number of sensors to involve in the solution | None |
With regard to Table 2, the following comments help to better understand the impact of the dictionary entries:
- If compute_contributions is set to False, the parameter th_monomial has no effect: the threshold can only be applied once the contributions of the monomials have been computed.
- colNames is used to refer to the columns of the features matrix X. If the default value None is used, the standard names x1, x2, …, xn are used. Giving meaningful names that speak to end users familiar with the sensors can therefore be important when presenting the results.
- The nfeats parameter can be helpful when the number of sensors involved in the problem is very high, making the number of monomials impractically high for relatively high deg. In such cases, setting nfeats to a reasonable value forces the solver to first select the most important sensors before applying the polynomial transformation. Note however that this selection process comes with a cost, so a value of nfeats different from the default None should be used only when necessary.
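The first two comments can be turned into a small client-side check. The sketch below is hypothetical helper code, not part of mizopol: it builds the default names x1, …, xn described above and flags a th_monomial entry that would have no effect because compute_contributions is not enabled.

```python
import warnings


def default_col_names(n_features: int) -> list[str]:
    """Names used when colNames is None, following the x1, x2, ..., xn convention."""
    return [f"x{i + 1}" for i in range(n_features)]


def check_fit_options(dic_plars_fit: dict, n_features: int) -> dict:
    """Return a copy of dic_plars_fit with colNames filled in (hypothetical helper).

    Warns when th_monomial is given although compute_contributions is False,
    since the threshold is then ignored according to the comments on Table 2.
    """
    options = dict(dic_plars_fit)
    if options.get("colNames") is None:
        options["colNames"] = default_col_names(n_features)
    elif len(options["colNames"]) != n_features:
        raise ValueError("colNames must contain one name per column of X")
    if "th_monomial" in options and not options.get("compute_contributions", False):
        warnings.warn("th_monomial has no effect unless compute_contributions=True")
    return options
```

Filling in colNames on the client is only a convenience for building reports; the fit method already applies the same default when None is passed.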
1.3 Using non default dictionaries
Based on the previous sections, the script of Section 1 can now be rewritten with non-default values for some entries of the two dictionaries:
from mizopol.plars_api import fit
# Override some default parameters of the plars instance
dic_plars = dict(deg=3, window=1000)
# Override some default parameters of the fit method
dic_plars_fit = dict(th_monomial=1e-3, compute_contributions=True)
# run the fit method
sol, cpu = fit(X, y, dic_plars=dic_plars, dic_plars_fit=dic_plars_fit)
In this way, the corresponding default values are replaced by the ones provided by the user.
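The override behaviour can be pictured as a dictionary merge. The sketch below only mirrors the documented semantics using the defaults of Table 1; how mizopol actually applies defaults on the server side is not specified here, and the names PLARS_DEFAULTS and effective_params are illustrative.

```python
# Defaults copied from Table 1 (dic_plars entries).
PLARS_DEFAULTS = {
    "deg": 1, "window": 200, "nModels": 10, "nModes": 10,
    "eps": 5e-2, "nBatch": 25, "eta": 50,
}


def effective_params(user: dict, defaults: dict = PLARS_DEFAULTS) -> dict:
    """Merge user-supplied entries over the defaults.

    Assumption: keys that are not listed in the defaults are rejected,
    in the spirit of the "eligible (key, value) pair" wording.
    """
    unknown = set(user) - set(defaults)
    if unknown:
        raise KeyError(f"unknown parameter(s): {sorted(unknown)}")
    return {**defaults, **user}


if __name__ == "__main__":
    print(effective_params(dict(deg=3, window=1000)))
```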
2 Fitting a plars model
2.1 Importing the fit method
The fit method can be imported via:
from mizopol.plars_api import fit
2.2 Input arguments
Arguments of the fit method of the plars_api module.
| Parameter | Type | Used for | Default |
|---|---|---|---|
| X | list[list[float]] | Features matrix | user-defined |
| y | list[float] | Vector of labels | user-defined |
| dic_plars | dict | Parameters of the plars instance (see Table 1) | user-defined |
| dic_plars_fit | dict | Parameters of the fit method (see Table 2) | user-defined |
- X and y are NumPy ndarray variables (matrix and vector respectively).
- For dic_plars and dic_plars_fit, see Table 1 and Table 2 of Section 1.
- All the input arguments of fit are mandatory, although the user may pass the default (empty) dictionaries. This choice was made intentionally to remind the user that these dictionaries exist and that their default values are not necessarily the ones to use.
2.3 Example of use
import numpy as np
from mizopol.plars_api import predict, fit, monomials_contrib
nt = 20000
nx = 7
X = 1.0 + np.random.randn(nt, nx)
y = 12 * X[:, 0] + 10.3 * X[:, 1] * X[:, 2] - 2 * X[:, 1] ** 4 - 12.0
dic_plars = dict(deg=4, nModes=10, nModels=6, window=100, eps=5e-2)
# Variations to try: comment out the colNames argument, or set nfeats = 5
dic_plars_fit = dict(
    compute_contributions=False,
    colNames=[f'S{i + 1}' for i in range(X.shape[1])],
    nfeats=None,
)
sol, cpu = fit(X, y, dic_plars=dic_plars, dic_plars_fit=dic_plars_fit)
The following section details the arguments sol and cpu returned by the fit method.
2.4 Returned arguments
The fit method returns two arguments:
- sol: a dictionary containing the solution and some corresponding fitting results that are detailed below.
- cpu: a tuple (cpu[0], cpu[1]) such that:
  - cpu[0] is the computation time from the user’s perspective. Namely, this includes the communication with the endpoint, potentially the time needed by the server to warm up the Docker image, and the time needed to serialize and send back the results.
  - cpu[1] is the computation time needed on the server side, which gives faithful information about the efficiency of the algorithm, leaving aside all the delays induced by the cloud and the distant deployment. This information can be useful for evaluating local use of the MizoPol package, which might be possible under certain circumstances.
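Assuming that cpu[0] covers the whole round trip and therefore includes cpu[1], their difference gives a rough estimate of the overhead due to communication and deployment. A minimal sketch, with a hypothetical helper name and made-up timing values used only for illustration:

```python
def remote_overhead(cpu: tuple[float, float]) -> tuple[float, float]:
    """Return (overhead in seconds, overhead as a fraction of the total time).

    Assumes cpu[0] is the end-to-end time seen by the user and cpu[1] the
    server-side computation time, so that cpu[0] >= cpu[1].
    """
    total, server = cpu
    overhead = max(total - server, 0.0)
    fraction = overhead / total if total > 0 else 0.0
    return overhead, fraction


if __name__ == "__main__":
    # Illustrative values, not measured ones.
    overhead, fraction = remote_overhead((0.85, 0.12))
    print(f"overhead = {overhead:.2f} s ({fraction:.0%} of the total)")
```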
As for the first argument sol, it is a dictionary whose content is detailed in the following table:
2.4.1 The sol dictionary
Content of the sol dictionary returned by the fit method of the mizopol.plars_api module.
| Parameter | Type | Description |
|---|---|---|
| nfeat | int | The number of candidate monomials before selection |
| indices | list[int] | The indices of the selected monomials among those generated by the polynomial-features expansion |
| powers | list[list[int]] | The matrix of powers that defines the selected monomials |
| coefs | list[float] | The associated vector of coefficients |
| card | int | The cardinality of the solution (number of retained monomials) |
| error | float | The value of the eta quantile of the error |
| cpu | float | The computation time (in sec) |
| cols | list[str] | Column names in accordance with the matrix of powers |
| df_contrib | pandas dataframe | Dataframe showing the normalized contributions of the selected monomials in reconstructing the label (available only if compute_contributions is set to True in the dic_plars_fit dictionary) |
| df_sol | pandas dataframe | Summary dataframe showing more detailed statistics of the monomial contributions together with their associated coefficients in the solution (available only if compute_contributions is set to True in the dic_plars_fit dictionary) |
| dfe_train | pandas dataframe | Dataframe showing the percentiles of error |
| eta | int | Recalls the eta used to produce the normalized percentiles of errors |
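To make the roles of powers, coefs and cols concrete, the sketch below evaluates a polynomial from these fields and builds monomial labels in the style of the df_contrib output printed in Section 2.5. It assumes that the model is the sum over k of coefs[k] times the product of x[j] ** powers[k][j]; the role of other fields such as bias_to_add is not covered, and the predict method remains the reference way to compute predictions. The coefficients are rounded from those printed in Section 2.5, and the function names are illustrative.

```python
from math import prod


def evaluate(x, powers, coefs):
    """Evaluate sum_k coefs[k] * prod_j x[j] ** powers[k][j] for one sample x."""
    return sum(
        c * prod(xj ** p for xj, p in zip(x, row))
        for c, row in zip(coefs, powers)
    )


def monomial_label(row, cols):
    """Readable name of a monomial, e.g. '(S2)^4', '(S2)(S3)' or '1'."""
    parts = [
        f"({name})" if p == 1 else f"({name})^{p}"
        for name, p in zip(cols, row)
        if p > 0
    ]
    return "".join(parts) or "1"


cols = [f"S{i + 1}" for i in range(7)]
powers = [
    [0, 4, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
coefs = [-2.0, 10.3, 12.0, -12.0]  # rounded from the printed solution

print([monomial_label(row, cols) for row in powers])
print(evaluate([1.0, 2.0, 3.0, 0.0, 0.0, 0.0, 0.0], powers, coefs))
```

The four labels correspond to the Monomial column of df_contrib, and the resulting polynomial is the one used to generate y in Section 2.3.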
2.5 Example of returned results
Let us now examine the sol and cpu returned by the script of Section 2.3 by executing the following script:
for k, value in sol.items():
    print(k)
    print(value)
    print('---')
print(f'cpu all = {cpu[0]:1.3} | cpu distant = {cpu[1]:1.3}')
Below are the printed messages:
nfeat
330
---
indices
[0, 1, 204, 16]
---
powers
[[0, 4, 0, 0, 0, 0, 0], [0, 1, 1, 0, 0, 0, 0], [1, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0]]
---
coefs
[-1.9999998958050205, 10.299963148684332, 11.999881943977648, -11.999786769721386]
---
card
4
---
error
3.671939849565719e-06
---
cpu
0.11779189109802246
---
cols
['S1', 'S2', 'S3', 'S4', 'S5', 'S6', 'S7']
---
bias_to_add
0.0
---
df_contrib
Monomial Contribution std
0 (S2)^4 -0.333409 0.088181
1 (S2)(S3) 0.236548 0.029214
2 (S1) 0.232246 0.016285
3 1 -0.197798 0.000000
---
df_sol
S1 S2 S3 S4 S5 S6 S7 Contribution std coefs
0 0 4 0 0 0 0 0 -0.333409 0.088181 -2.000000
1 0 1 1 0 0 0 0 0.236548 0.029214 10.299963
2 1 0 0 0 0 0 0 0.232246 0.016285 11.999882
3 0 0 0 0 0 0 0 -0.197798 0.000000 -11.999787
---
dfe_train
Error
50% 0.000001
80% 0.000002
90% 0.000003
95% 0.000004
98% 0.000004
99% 0.000005
100% 0.000008
---
eta
95
3 Computing monomial contributions
Previously, it was shown that when a plars model is fitted on a training dataset with the field compute_contributions set to True in the dic_plars_fit dictionary, the contributions of the monomials retained in the solution are automatically computed.
Now, given a fitted solution sol returned by the fit method, it may be useful to compute the contributions of the monomials contained in the solution on a new dataframe. This can provide several kinds of information:
- If the contributions of the monomials on the new data are far from their contributions on the training data, this might indicate a change of context between the training data and the new data.
- Sometimes, when the residual of the relationship is higher over a period of time within the new data, computing the change in the contributions of the monomials over the incriminated period can say a lot about the kind of fault behind the increase in the residual.
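The first point can be sketched as a comparison between two contribution tables. The helper below is hypothetical and works on plain dictionaries mapping monomial names to contributions (for instance built from the Monomial and Contribution columns of a df_contrib dataframe); the 0.05 threshold and the new-data values are arbitrary choices for illustration.

```python
def contribution_shift(train: dict, new: dict, threshold: float = 0.05) -> dict:
    """Return {monomial: new - train} for monomials whose contribution moved
    by more than `threshold` in absolute value.

    Monomials missing from one side are treated as having zero contribution there.
    """
    shifts = {}
    for name in sorted(set(train) | set(new)):
        delta = new.get(name, 0.0) - train.get(name, 0.0)
        if abs(delta) > threshold:
            shifts[name] = delta
    return shifts


# Training contributions taken from the df_contrib printed in Section 2.5.
train = {"(S2)^4": -0.333409, "(S2)(S3)": 0.236548, "(S1)": 0.232246, "1": -0.197798}
# Made-up contributions on new data, for illustration only.
new = {"(S2)^4": -0.20, "(S2)(S3)": 0.24, "(S1)": 0.36, "1": -0.20}

print(contribution_shift(train, new))
```

In this made-up case only (S2)^4 and (S1) are flagged, which would point the investigation towards the sensors S2 and S1.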
3.1 Importing the monomials_contrib method
This is done using the monomials_contrib method, which can be imported using
from mizopol.plars_api import monomials_contrib
3.2 Input arguments
Arguments of the monomials_contrib method of the plars_api module.
| Parameter | Type | Used for | Default |
|---|---|---|---|
| df | pandas dataframe | The working dataframe | user-defined |
| sol | dict | Solution returned by the fit method | user-defined |
| win | int | The window used in evaluating the contributions by random sampling | 200 |
| nBatch | int | Number of sampled windows used in the evaluation | 25 |
df_contrib, (cpu1, cpu2) = monomials_contrib(df, sol, win=200, nBatch=25)
3.3 Returned arguments
The monomials_contrib method returns:
- df_contrib: a pandas dataframe taking the same form as the one contained in the fitted sol (provided that the compute_contributions field is set to True in the dic_plars_fit dictionary).
- The tuple cpu containing the computation times (user and distant cpu), as previously explained in Section 2.4.
4 The predict method
Once a solution dictionary, say sol, is returned by the fit method, it can be used to predict the label for a given features matrix X. This is done by the predict method, as shown in the following script:
from mizopol.plars_api import predict
ypred, (cpu1, cpu2) = predict(X, sol)
where the returned arguments are self-explanatory: the predicted labels ypred and a tuple of computation times.
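Once ypred is available, a natural follow-up is to summarise the prediction error on the new data in the same spirit as the dfe_train table. The sketch below computes percentiles of the absolute residuals with a simple nearest-rank rule; the exact error definition and normalization used by plars are not documented here, so the numbers are not guaranteed to be comparable with dfe_train. The function name is illustrative.

```python
def error_percentiles(y, ypred, etas=(50, 80, 90, 95, 98, 99, 100)):
    """Nearest-rank percentiles of |y - ypred| for each integer eta in `etas`."""
    if len(y) != len(ypred) or len(y) == 0:
        raise ValueError("y and ypred must be non-empty and of equal length")
    errors = sorted(abs(a - b) for a, b in zip(y, ypred))
    n = len(errors)
    result = {}
    for eta in etas:
        # Integer ceiling of eta * n / 100, clipped to at least 1.
        rank = max((eta * n + 99) // 100, 1)
        result[eta] = errors[rank - 1]
    return result


if __name__ == "__main__":
    y = [1.0, 2.0, 3.0, 4.0]
    ypred = [1.1, 1.8, 3.0, 4.4]
    print(error_percentiles(y, ypred))
```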