plars
Detailed analysis of a use-case
Recall that the exhaustive and precise documentation of the API is given in the API-documentation section and the sections that follow it. Here, a single use case is presented to give a feel for the capabilities of the MizoPol package.
1 The problem
To better understand the parameters involved in instantiating and using plars, it is worth working through a specific illustrative example, so that the effect of changing each parameter value can be easily observed and explained.
Let us therefore consider a dataset generated from a known polynomial, so that we can examine how well plars recovers the hidden truth from the pair \((X,y)\) of feature matrix and label vector. The hidden relationship is:
\[ y = x_0^2-30x_1x_3^3+4x_5^5-1 \tag{1}\]
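The scripts in the results section draw the features uniformly in \([0,1)\) with np.random.rand and no seed, so each run produces a slightly different dataset and slightly different numbers. The following is a minimal sketch of a reproducible generator for Equation 1; the helper name make_dataset and its seed argument are illustrative assumptions, not part of the MizoPol API. Only the columns \(x_0\), \(x_1\), \(x_3\) and \(x_5\) enter \(y\); every other column is a distractor that plars has to discard.

```python
import numpy as np

def make_dataset(nx, nt, seed=0):
    """Hypothetical helper (not part of MizoPol): sample (X, y) from Equation 1.

    X is uniform in [0, 1)^nx. Only columns 0, 1, 3 and 5 enter y,
    so any other column acts as an irrelevant feature.
    """
    if nx < 6:
        raise ValueError("Equation 1 involves x0..x5, so nx must be >= 6")
    rng = np.random.default_rng(seed)
    X = rng.random((nt, nx))
    y = X[:, 0]**2 - 30 * X[:, 1] * X[:, 3]**3 + 4 * X[:, 5]**5 - 1
    return X, y

# The same (nx, nt, seed) always yields the same dataset
X, y = make_dataset(nx=6, nt=100_000, seed=42)
```

The resulting pair can be passed to fit exactly as X and y are in the scripts below.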
2 The settings
We consider three different settings that illustrate some of the ways plars can be used to orient the solution of the problem:
Setting 1: we use \(n_x=6\) variables, i.e. exactly the columns \(x_0,\dots,x_5\) needed to express Equation 1 (\(x_2\) and \(x_4\) do not appear in it). Moreover, we instantiate the solver with a slightly higher degree than the hidden one in Equation 1, namely deg=6 instead of \(5\).
Notice that in this case, the number of eligible monomials is equal to 924.
Setting 2: we increase the number of variables to \(n_x=9\), adding the columns \(x_6\), \(x_7\) and \(x_8\), which do not enter Equation 1. Moreover, we instantiate the solver with a higher degree than the hidden one, namely deg=7.
Notice that in this case, the number of eligible monomials is equal to 11440.
Setting 3: we reuse the previous setting, but ask plars to select only nfeats=4 among the \(n_x=9\) variables to be involved in the polynomial expansion.
Notice how this reduces the computation time. Moreover, the nfeat attribute of the solution reflects the reduced problem: it reports 330 eligible monomials, i.e. the monomials of degree at most 7 in the nfeats=4 selected variables, instead of 11440 for \(n_x=9\). Since internally only the 4 selected variables are used in the polynomial expansion, the candidate dictionary is much smaller, which explains the shorter computation time.
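The counts of eligible monomials quoted for the three settings are the number of monomials of total degree at most deg in \(n\) variables, constant term included, namely \(\binom{n+\text{deg}}{\text{deg}}\). The sketch below, which uses only the Python standard library and is independent of MizoPol, checks this formula by brute-force enumeration for the first setting and then evaluates it for all three.

```python
from itertools import product
from math import comb

def count_by_formula(n, deg):
    # Monomials x_0^a_0 ... x_{n-1}^a_{n-1} with a_0 + ... + a_{n-1} <= deg,
    # the constant term (all a_i = 0) included
    return comb(n + deg, deg)

def count_by_enumeration(n, deg):
    # Brute force: enumerate every exponent tuple with entries in 0..deg
    return sum(1 for a in product(range(deg + 1), repeat=n) if sum(a) <= deg)

# First setting: 7**6 = 117649 tuples, small enough to enumerate
assert count_by_enumeration(6, 6) == count_by_formula(6, 6) == 924

settings = {
    "n_x=6, deg=6": (6, 6),
    "n_x=9, deg=7": (9, 7),
    "nfeats=4, deg=7": (4, 7),
}
for name, (n, deg) in settings.items():
    print(f"{name}: {count_by_formula(n, deg)} eligible monomials")
```

This prints 924, 11440 and 330 respectively: restricting the expansion to nfeats=4 variables makes the candidate dictionary about 35 times smaller than with \(n_x=9\).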
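3 The results
The three scripts below correspond, in order, to the three settings described above. Each one prints the number of eligible monomials (sol['nfeat']), the percentiles of the training error (sol['dfe_train']), the number of selected monomials (sol['card']), the table of selected monomials with their exponents and coefficients (sol['df_sol']) and the computation time.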
import numpy as np
from mizopol.plars_api import fit
# create the data (X,y)
nx = 6
nt = 100000
X = np.random.rand(nt, nx)
y = X[:, 0]**2 - 30 * X[:, 1] * X[:, 3]**3 + 4 * X[:, 5]**5 - 1
# instantiation (dic_plars) and fit (dic_plars_fit) parameters
dic_plars = dict(window=500, deg=6, nModels=5, nModes=10, eps=1e-2)
dic_plars_fit = dict(compute_contributions=True)
# solve the problem
sol, cpu = fit(X, y, dic_plars=dic_plars, dic_plars_fit=dic_plars_fit)
# print the results
print('number of eligible parameters', sol['nfeat'])
print(sol['dfe_train'])
print(sol['card'])
print(sol['df_sol'])
print(f'cpu = {sol["cpu"]:3.2} sec')
which results in
number of eligible parameters 924
Error
50% 0.003069
80% 0.005694
90% 0.006973
95% 0.008930
98% 0.010817
99% 0.011353
100% 0.012464
13
x0 x1 x2 x3 x4 x5 Contribution std coefs
0 0 1 0 3 0 0 -0.522762 0.038242 -29.997015
1 0 0 0 0 0 0 -0.136881 0.000000 -0.977973
2 0 0 0 0 0 5 0.092488 0.006196 3.948401
3 5 0 0 0 0 0 -0.088545 0.005112 -3.756293
4 3 0 0 0 0 0 0.084649 0.003616 2.447522
5 6 0 0 0 0 0 0.045957 0.003128 2.288520
6 3 0 0 0 0 2 -0.007943 0.000618 -0.680418
7 2 0 0 0 0 3 0.007027 0.000576 0.609813
8 4 0 0 0 0 2 0.006844 0.000667 0.749974
9 3 0 0 0 0 3 -0.005838 0.000494 -0.651726
10 0 0 0 0 0 6 0.000618 0.000035 0.031106
11 0 0 0 1 0 0 -0.000247 0.000005 -0.003558
12 2 0 0 1 0 0 0.000152 0.000008 0.006449
cpu = 0.3 sec
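Each row of df_sol encodes one monomial: the x-columns hold the exponents and the coefs column holds its coefficient. The sketch below turns such a table into a readable formula. It assumes only the layout visible in the printed output (exponent columns named x0, x1, ... and a coefs column); the helper to_formula is hypothetical, and the three rows used are copied from the first table above.

```python
import pandas as pd

def to_formula(df, digits=3):
    """Hypothetical helper: render a df_sol-like table as a polynomial string."""
    exp_cols = [c for c in df.columns if c.startswith("x")]
    terms = []
    for _, row in df.iterrows():
        # keep only the variables with a nonzero exponent
        factors = [c if int(row[c]) == 1 else f"{c}^{int(row[c])}"
                   for c in exp_cols if int(row[c]) > 0]
        coef = f"{row['coefs']:+.{digits}f}"
        terms.append(" ".join([coef] + factors))
    return " ".join(terms)

# First three rows of the first solution table (exponents and coefs only)
df = pd.DataFrame(
    {"x0": [0, 0, 0], "x1": [1, 0, 0], "x2": [0, 0, 0], "x3": [3, 0, 0],
     "x4": [0, 0, 0], "x5": [0, 0, 5],
     "coefs": [-29.997015, -0.977973, 3.948401]})
print(to_formula(df))
# -> -29.997 x1 x3^3 -0.978 +3.948 x5^5
```

Columns that do not start with x, such as Contribution and std, are simply ignored.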
import numpy as np
from mizopol.plars_api import fit
# create the data (X,y)
nx = 9
nt = 100000
X = np.random.rand(nt, nx)
y = X[:, 0]**2 - 30 * X[:, 1] * X[:, 3]**3 + 4 * X[:, 5]**5 - 1
# instantiation (dic_plars) and fit (dic_plars_fit) parameters
dic_plars = dict(window=500, deg=7, nModels=5, nModes=10, eps=1e-2)
dic_plars_fit = dict(compute_contributions=True)
# solve the problem
sol, cpu = fit(X, y, dic_plars=dic_plars, dic_plars_fit=dic_plars_fit)
# print the results
print('number of eligible parameters', sol['nfeat'])
print(sol['dfe_train'])
print(sol['card'])
print(sol['df_sol'])
print(f'cpu = {sol["cpu"]:3.2} sec')
which results in
number of eligible parameters 11440
Error
50% 0.000585
80% 0.000912
90% 0.001099
95% 0.001274
98% 0.001481
99% 0.001661
100% 0.003145
11
x0 x1 x2 x3 x4 x5 x6 x7 x8 Contribution std coefs
0 0 1 0 3 0 0 0 0 0 -0.630226 4.183679e-02 -29.994346
1 0 0 0 0 0 0 0 0 0 -0.166666 1.853559e-17 -0.998278
2 0 0 0 0 0 5 0 0 0 0.072542 4.441163e-03 2.634459
3 2 0 0 0 0 0 0 0 0 0.055356 2.311301e-03 0.985651
4 0 0 0 0 0 4 0 0 0 0.046204 2.754222e-03 1.379038
5 0 0 0 0 0 3 0 0 0 -0.015762 6.486132e-04 -0.385597
6 0 0 0 0 0 7 0 0 0 0.008070 7.056783e-04 0.375179
7 3 0 0 0 0 0 0 0 0 0.002117 1.182423e-04 0.050505
8 4 0 0 0 0 0 0 0 0 -0.002028 1.001263e-04 -0.061360
9 5 0 0 0 0 0 0 0 0 0.000697 4.547199e-05 0.025390
10 0 1 0 1 0 0 0 0 0 -0.000191 7.361118e-06 -0.004540
cpu = 0.93 sec
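A quick way to compare a solution with the hidden truth is to check which exponent vectors of Equation 1 appear among the selected monomials. The sketch below does this with plain tuples copied from the first two solution tables above, keeping only the exponents of \(x_0,\dots,x_5\) (the columns x6 to x8 are zero in every row of the second table); the helper name recovered is hypothetical.

```python
# Exponent vectors (x0, ..., x5) of the four monomials of Equation 1
TRUTH = {
    (2, 0, 0, 0, 0, 0),  # x0^2
    (0, 1, 0, 3, 0, 0),  # x1 x3^3
    (0, 0, 0, 0, 0, 5),  # x5^5
    (0, 0, 0, 0, 0, 0),  # constant term
}

# Selected monomials copied from the printed df_sol tables (x0..x5 exponents)
SETTING_1 = [(0, 1, 0, 3, 0, 0), (0, 0, 0, 0, 0, 0), (0, 0, 0, 0, 0, 5),
             (5, 0, 0, 0, 0, 0), (3, 0, 0, 0, 0, 0), (6, 0, 0, 0, 0, 0),
             (3, 0, 0, 0, 0, 2), (2, 0, 0, 0, 0, 3), (4, 0, 0, 0, 0, 2),
             (3, 0, 0, 0, 0, 3), (0, 0, 0, 0, 0, 6), (0, 0, 0, 1, 0, 0),
             (2, 0, 0, 1, 0, 0)]
SETTING_2 = [(0, 1, 0, 3, 0, 0), (0, 0, 0, 0, 0, 0), (0, 0, 0, 0, 0, 5),
             (2, 0, 0, 0, 0, 0), (0, 0, 0, 0, 0, 4), (0, 0, 0, 0, 0, 3),
             (0, 0, 0, 0, 0, 7), (3, 0, 0, 0, 0, 0), (4, 0, 0, 0, 0, 0),
             (5, 0, 0, 0, 0, 0), (0, 1, 0, 1, 0, 0)]

def recovered(selected, truth=TRUTH):
    """Hypothetical helper: true monomials found / missed, number of extras."""
    sel = set(selected)
    return sorted(truth & sel), sorted(truth - sel), len(sel - truth)

for name, sel in [("Setting 1", SETTING_1), ("Setting 2", SETTING_2)]:
    found, missed, extra = recovered(sel)
    print(f"{name}: {len(found)}/4 true monomials found, "
          f"missed={missed}, {extra} additional monomials")
```

Running it shows that the \(x_0^2\) monomial is absent from the first solution, where the dependence on \(x_0\) is carried by other powers of \(x_0\), while all four true monomials are selected in the second one.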
import numpy as np
from mizopol.plars_api import fit
# create the data (X,y)
nx = 9
nt = 100000
X = np.random.rand(nt, nx)
y = X[:, 0]**2 - 30 * X[:, 1] * X[:, 3]**3 + 4 * X[:, 5]**5 - 1
# instantiation (dic_plars) and fit (dic_plars_fit) parameters
dic_plars = dict(window=500, deg=7, nModels=5, nModes=10, eps=1e-2)
dic_plars_fit = dict(compute_contributions=True, nfeats=4)
# solve the problem
sol, cpu = fit(X, y, dic_plars=dic_plars, dic_plars_fit=dic_plars_fit)
# print the results
print('number of eligible parameters', sol['nfeat'])
print(sol['dfe_train'])
print(sol['card'])
print(sol['df_sol'])
print(f'cpu = {sol["cpu"]:3.2} sec')
which results in
number of eligible parameters 330
Error
50% 0.000446
80% 0.000795
90% 0.001035
95% 0.001326
98% 0.001644
99% 0.001769
100% 0.002796
12
x3 x1 x5 x0 Contribution std coefs
0 3 1 0 0 -0.652083 0.050207 -29.998752
1 0 0 0 0 -0.172005 0.000000 -1.001214
2 0 0 5 0 0.106215 0.006788 3.722983
3 0 0 0 2 0.056125 0.002432 0.984052
4 0 0 4 0 0.005643 0.000364 0.164724
5 0 0 0 3 0.002525 0.000092 0.058017
6 0 0 7 0 0.002488 0.000244 0.117669
7 0 0 0 4 -0.001779 0.000088 -0.051696
8 0 0 0 6 0.000367 0.000022 0.014886
9 0 0 2 3 -0.000363 0.000032 -0.025200
10 0 0 3 4 0.000195 0.000017 0.023137
11 0 0 0 5 -0.000111 0.000007 -0.003900
cpu = 0.23 sec
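With nfeats=4, the exponent columns of df_sol appear in the order of the selected features (x3, x1, x5, x0) rather than in the column order of X, so any code that evaluates the returned polynomial on new data has to map each column name back to its column in X. The sketch below does this for the first four rows of the last table; predict is a hypothetical helper that assumes only the naming visible in the printed output (xi refers to column i of X).

```python
import numpy as np
import pandas as pd

def predict(df_sol, X):
    """Hypothetical helper: evaluate a df_sol-like polynomial on the rows of X."""
    exp_cols = [c for c in df_sol.columns if c.startswith("x")]
    idx = [int(c[1:]) for c in exp_cols]  # "x3" -> column 3 of X
    y_hat = np.zeros(X.shape[0])
    for _, row in df_sol.iterrows():
        powers = np.array([row[c] for c in exp_cols], dtype=float)
        # product over the selected variables of X[:, i] ** exponent
        y_hat += row["coefs"] * np.prod(X[:, idx] ** powers, axis=1)
    return y_hat

# First four rows of the last solution table, columns in the printed order
df_sol = pd.DataFrame({"x3": [3, 0, 0, 0], "x1": [1, 0, 0, 0],
                       "x5": [0, 0, 5, 0], "x0": [0, 0, 0, 2],
                       "coefs": [-29.998752, -1.001214, 3.722983, 0.984052]})

rng = np.random.default_rng(0)
X = rng.random((5, 9))
y_true = X[:, 0]**2 - 30 * X[:, 1] * X[:, 3]**3 + 4 * X[:, 5]**5 - 1
print(np.c_[y_true, predict(df_sol, X)])  # truth vs. truncated model
```

Mapping by column name rather than by position keeps the evaluation correct whatever order the selected features are reported in.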
The next section describes a simple GUI that makes it easy to use the plars algorithm by simply uploading the dataframes.