g2sys

Detecting invariant relationships in the NASA Turbofan Jet Engine Dataset.


1 Dataset

This use case involves the dataset provided by NASA, which captures the run-to-failure degradation history of turbofan engines. Four datasets are provided, corresponding to different combinations of operating conditions and fault modes.

2 Objective


The original purpose of the NASA datasets is to estimate the Remaining Useful Life (RUL). This is not the objective of this use case: we rather aim to show how the sensors relate to each other and which invariant relationships hold despite aging, so that one can focus only on non-redundant information.

The following movie shows how some of the invariant relationships present in the NASA dataset can be rapidly obtained using the g2sys module.

Readers interested in the preprocessing of the \(\texttt{kaggle}\) dataset may find the following section helpful.

3 Dataset preparation for g2sys

Since the datasets labelled train (in the NASA repository) contain the whole lifecycle up to final degradation, we only use these. The datasets labelled test are intended for RUL-related studies (see above).

The following script is used to create the four datasets. Notice that the columns have been relabelled more simply (c1, c2, ...) and that the first two columns are removed, as they represent, respectively, the engine unit number and the time in cycles. The last two columns, which are empty, are removed as well.

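import pandas as pd
import numpy as np
import plotly.graph_objects as go

for iexp in [1, 2, 3, 4]:

    # The raw files have no header row, and each line ends with trailing
    # spaces, which produces two extra empty columns at the end.
    df = pd.read_csv(f'train_FD00{iexp}.txt', sep=' ', header=None)

    df.reset_index(drop=True, inplace=True)
    # Simplified labels: c1, c2, ..., cN
    df.columns = [f'c{i+1}' for i in range(len(df.columns))]

    # Quick visual check of one randomly chosen column
    fig = go.Figure()
    t = np.arange(0, len(df))
    i = np.random.randint(0, len(df.columns))
    fig.add_trace(go.Scatter(x=t, y=df.loc[:, f'c{i+1}'].values, name=f'c{i+1}'))
    fig.show()

    # Drop the first two columns (unit number, time in cycles)
    # and the last two (empty trailing columns)
    df = df[[f'c{i+1}' for i in np.arange(2, len(df.columns)-2)]]
    df.to_csv(f'nasa_{iexp}.csv')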
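Two details of the raw files matter for this preparation step: they have no header row (so they should be read with header=None, otherwise the first record is consumed as column names), and each line ends with two trailing spaces, which yields two empty columns when splitting on single spaces. The sketch below illustrates both points on two synthetic lines that only mimic the raw layout (unit number, time in cycles, then 24 further values); the numbers themselves are made up.

```python
import io

import numpy as np
import pandas as pd

# Two synthetic rows mimicking the raw layout (values are made up):
# unit number, time in cycles, 24 further values, two trailing spaces.
n_measurements = 24  # 3 operational settings + 21 sensors
rng = np.random.default_rng(0)
lines = []
for cycle in (1, 2):
    values = " ".join(f"{v:.4f}" for v in rng.normal(size=n_measurements))
    lines.append(f"1 {cycle} {values}  ")
raw = "\n".join(lines) + "\n"

# header=None keeps the first row as data instead of column names.
df = pd.read_csv(io.StringIO(raw), sep=" ", header=None)
df.columns = [f"c{i+1}" for i in range(len(df.columns))]
print(df.shape)  # (2, 28): 26 real fields + 2 empty trailing ones
print(df.iloc[:, -2:].isna().all().all())  # True

# Same slicing as the preparation script: keep c3 ... c26.
kept = df[[f"c{i+1}" for i in np.arange(2, len(df.columns) - 2)]]
print(kept.shape)  # (2, 24)
```

Without header=None, the same call would return a single data row, since the first line would be used as the header.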
The dataset used for illustration

This leads to four datasets, of which only the second one, namely nasa_2.csv, is used here for the sake of illustration.

4 Results & Discussion

4.1 System’s graph

The following graph of connections between sensors is obtained (see the section dedicated to the presentation of this graph for more details):

Figure 1: Graph representing the relationships discovered by the g2sys module from the nasa_2.csv dataset. Nodes with thick boundaries refer to sensors described by dynamic relationships¹, while the others refer to sensors that can be represented through static relationships (i.e., expressed as functions of the set of sensors that send arrows to them).

As shown in the screenshot below, only 15% of the data is used to discover the relationships, while the residuals shown afterwards are computed over the whole dataset.

Figure 2: Screenshot of the g2sys module showing the amount of training data (15%) used to discover the relationships. Since successive portions of the dataset represent recordings from different turbofans, the fact that the residuals remain small throughout (as shown hereafter) supports the relevance of these invariant relationships over the whole set of engines.
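The validation protocol described here (discover a relationship on 15% of the rows, then check that the residual stays small on the remaining rows) can be mimicked as follows. This is a minimal sketch under explicit assumptions: the g2sys discovery step is replaced by ordinary least squares on a synthetic static relationship, the training portion is assumed to be the first 15% of the rows, and the helper names are hypothetical. The residual uses the normalized expression of Section 4.2.

```python
import numpy as np

def normalized_residual(y, y_hat):
    """95th percentile of y - y_hat divided by median(|y|)."""
    return np.percentile(y - y_hat, 95) / np.median(np.abs(y))

def fit_on_head(X, y, frac=0.15):
    """Hypothetical stand-in for the discovery step (NOT the g2sys algorithm):
    least squares with an intercept, fitted on the first `frac` of the rows."""
    n_train = max(1, int(frac * len(y)))
    A = np.column_stack([X[:n_train], np.ones(n_train)])
    coef, *_ = np.linalg.lstsq(A, y[:n_train], rcond=None)
    return coef, n_train

def predict(X, coef):
    """Evaluate the fitted linear relationship on any rows of X."""
    return np.column_stack([X, np.ones(len(X))]) @ coef

# Synthetic data: y is a static function of two other "sensors" plus noise.
rng = np.random.default_rng(1)
n = 2000
X = rng.normal(loc=[640.0, 1590.0], scale=[1.0, 5.0], size=(n, 2))
y = 0.5 * X[:, 0] + 0.2 * X[:, 1] + rng.normal(scale=0.1, size=n)

coef, n_train = fit_on_head(X, y)   # fitted on the first 15% of rows only
y_hat = predict(X, coef)            # evaluated on all rows
r_train = normalized_residual(y[:n_train], y_hat[:n_train])
r_unseen = normalized_residual(y[n_train:], y_hat[n_train:])
print(f"train: {r_train:.5f}  unseen: {r_unseen:.5f}")
```

Comparing r_train with r_unseen is the point of the exercise: when the relationship is genuinely invariant, the residual on rows never seen during fitting stays of the same order as on the training rows.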

4.2 Viewing residuals

Recall that the residual is expressed by the normalized expression:

\[ \dfrac{\texttt{percentile}(y-\hat y, 95)}{\texttt{median}(\vert y\vert )} \]

This means that, for the plots below, 95% of the errors are lower than 8% of the median of the label's absolute value. In fact, for all but two relationships, the precision is such that 95% of the errors are lower than 5% of this median.
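The normalized residual can be computed directly from a label y and its prediction. The sketch below implements the expression literally as written (95th percentile of the signed error y - ŷ, divided by the median of |y|) and applies it to made-up numbers chosen so the result can be checked by hand.

```python
import numpy as np

def normalized_residual(y, y_hat):
    """95th percentile of the signed error y - y_hat, divided by median(|y|)."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    return np.percentile(y - y_hat, 95) / np.median(np.abs(y))

# Worked example with made-up numbers: the label is constant at 100 and the
# errors y - y_hat are spread evenly from -5 to 5 in steps of 0.1.
y = np.full(101, 100.0)
errors = np.linspace(-5.0, 5.0, 101)
y_hat = y - errors
r = normalized_residual(y, y_hat)
print(round(r, 4))  # 0.045
```

Here the 95th percentile of the errors is 4.5 and the median of |y| is 100, so 95% of the errors are below 4.5% of the median, which would fall within the 5% regime mentioned above.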

About the residual’s legend

The syntax of the residuals' legend is as follows: \[ \texttt{sensor}|\texttt{card}|\texttt{deg}|\texttt{d}|\texttt{nd} \]

where

  • \(\texttt{sensor}\) is the sensor’s name.
  • \(\texttt{card}\) is the number of active monomials (number of active coefficients)
  • \(\texttt{deg}\) is the degree of the polynomial
  • \(\texttt{d}\) is the elementary delay used in the solution²
  • \(\texttt{nd}\) is the number of delays
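The legend format lends itself to simple parsing, for instance to collect the discovered relationships into a table. The sketch below splits a legend string on "|" into its five fields. The example string "c7|12|2|1|3" is made up for illustration, the class and function names are hypothetical, and the lags() helper encodes one possible reading of d and nd (delays d, 2d, ..., nd·d) that is merely consistent with the footnote stating that nd has no effect when d = 0; it is an assumption, not the documented g2sys behavior.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ResidualLegend:
    sensor: str  # sensor name
    card: int    # number of active monomials (active coefficients)
    deg: int     # third field of the legend (assumed: polynomial degree)
    d: int       # elementary delay
    nd: int      # number of delays

    def lags(self) -> List[int]:
        """Assumed reading: delays d, 2d, ..., nd*d; no delays when d == 0."""
        if self.d == 0:
            return []
        return [k * self.d for k in range(1, self.nd + 1)]


def parse_legend(text: str) -> ResidualLegend:
    """Split 'sensor|card|deg|d|nd' into typed fields."""
    sensor, card, deg, d, nd = text.split("|")
    return ResidualLegend(sensor.strip(), int(card), int(deg), int(d), int(nd))


# Made-up legend string, for illustration only.
leg = parse_legend("c7|12|2|1|3")
print(leg)
print(leg.lags())  # [1, 2, 3]
```

Keeping the parsed fields typed makes it straightforward to, for example, sort relationships by card to spot the sparsest ones.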

Footnotes

  1. See the Overview section for a description of these concepts.

  2. If \(\texttt{d}=0\), the parameter \(\texttt{nd}\) has no effect.