How to use?¶
Note that pyWATTS is no longer actively maintained, however the main graph pipeline functionality of pyWATTS has been integrated into the open source python package [sktime](https://www.sktime.net/en/stable/).**
Where should pyWATTS be used?¶
pyWATTS is not a machine learning framework or library. Instead, pyWATTS’ aim is to support users in performing data analysis tasks and hereby to avoid a script nightmare. For this reason, pyWATTS is designed to automate frequently occurring tasks and allows to define the sequence of necessary step in one pipeline.
How to use the modules?¶
Modules in pyWATTS contain the functionality to perform data analysis tasks with pyWATTS.
How to use pyWATTS?¶
Split your workflow in separate steps. For example, filtering of columns, scaling of values, aggregating multiple timesteps, using a SKLearn estimator, and evaluating the result with the RMSE.
Create a pipeline object. Thereby, you can specify the
path: The path where the plots and csvs of the pipeline should be saved.
batch: The timedelta that specifies the interval at which data should be processed. Note that this is currently only possible for subpipelines.
For each step, create a module that contains the functionality. This functionality is either
implemented in a predefined module,
implemented in your own module (see below)
a wrapped functions in a FunctionModule,
a wrapped SKLearn model in a SKLearnWrapper,
etc.
All of these approaches have their own strengths and weaknesses. The following table gives you an overview of which approach should be used when.
Approach
Use Case
Predefined Module
If there exists already a predefined module for the specified step
Own Module
If there is no module that meets your requirements and you will need this functionality frequently.
Function Module
If there is no module that meets your requirements and you will probably need this functionality only once, the functionality is very simple, or you simply want to try out the functionality.
SKLearnWrapper
If you want to use an existing SKLearn module
KerasWrapper or pyTorchWrapper
If you want to use a Keras or pyTorch model Note that you can write an own module for wrapping Deep Learning Models.
Add each module to the pipeline by calling the module object with:
inputs: Inputs can be the following:
the pipeline, if there is no previous step
StepInformation or a list of StepInformation if the module has a previous step, and the previous step has to provide a result
a tuple of StepInformation or a list of tuples of StepInformation if only one step in a tuple should has to provide a result. If multiple modules provide an output, the first one is chosen.
targets: The step against which the module should be fitted
computation_mode: Train, Transform, FitTransform
to_csv: To create a csv file with the used data
plot: To plot the used data
batch_size: Chunk of past data that should be used for fitting. [Necessary for Online Learning]
condition: Condition that describes to execute this module only if the condition is True
train_if: Train only if the condition is True
use_prob_transform
use_inverse_transform
Calling a module will return a StepInformation, which can be used as input for the next step. If the added module has no successors, you can ignore the returned StepInformation.
Call pipeline.run(). This starts the pipeline and executes all added modules in the defined order.
Optionally, load and save your whole pipeline by calling from_folder or to_folder. Both methods take as argument the path where the pipeline should be saved or from which the pipeline should be loaded.
The to_folder method will create a JSON containing the structure of the pipeline. Moreover, this JSON contains the parameters of all modules and information about the steps. Furthermore, in the folder specified by the path argument, modules might create further data. For example, the SKLearnWrapper creates pickle files and the KerasWrapper creates h5 files.
How to write an own modules?¶
The simplest way of writing own modules is to make use of the template.py. Nevertheless, you normally have to perform the following steps:
Decide whether your module is a Transformer or Estimator. Accordingly, your module has to inherit from BaseTransformer or BaseEstimator. The difference between both is that the Estimator has a fit method in contrast to the Transformer.
Implement the following methods:
__init__: method to create the module and its specific parameters
transform: executes the transformation defined in the module. Note that the first dimension of the datavariable has to be a time dimension since we work with time series.
get_params: method to get the module specific parameters
set_params: method set the module specific parameters
fit: method to fit the model if you module is an estimator
save and load: method to save and load if your module has to save and load more than the parameters that are returned by get_params. Moreover, if you save your module, you have to use the file_manager to get a path where you can save your module. The pipeline will pass a filemanager to the save function.
Please note that some util methods exist in pywatts.utils. These methods are often used by modules to handle xarray datasets.
After completing these two steps, you can use your created module in your pipeline.
# How can I get results? Currently, there are four possibilities to get the results of a pyWATTS pipeline.
Pipeline.train and pipeline.test returns a xarray Dataset and a summary string:
The dataset contains the results of all steps without successing steps.
The summary string contains all information of summary modules and the training
If you have a step information, you get its buffer with:
step_information.step.buffer
With callbacks you can perform operations on the output of steps including, writing them into a file.
You can use summary modules for example for calculating metrics. The results are the saved in summardy.md which is placed in the pipelines results folder.