pywatts.modules.preprocessing package

Submodules

pywatts.modules.preprocessing.average module

class pywatts.modules.preprocessing.average.Average(weights: List[float] = None, name: str = 'Average')

Bases: pywatts.core.base.BaseTransformer

Aggregation step to average the given time series, either by simple or weighted averaging. By default, simple averaging is applied.

get_params() → Dict[str, object]

Get parameters for the Average object.

Returns:Parameters as dict object.
Return type:Dict[str, object]

set_params(weights: List[float] = None)

Set or change Average object parameters.

Parameters:weights (list, optional) – List of individual weights of each given time series for weighted averaging.

transform(**kwargs) → xarray.core.dataarray.DataArray

Aggregate the given time series by simple or weighted averaging.

Returns:xarray DataArray aggregated by simple or weighted averaging.
Return type:xr.DataArray
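The following is a minimal NumPy sketch of simple versus weighted averaging across several equally shaped series; it is not the pywatts implementation. The helper name `average` is hypothetical, and the sketch assumes that weights are normalised by their sum, as `np.average` does.

```python
import numpy as np


def average(series, weights=None):
    """Element-wise (weighted) mean of equally shaped series (hypothetical helper).

    Assumption: weights are normalised by their sum, as np.average does.
    """
    stacked = np.stack([np.asarray(s, dtype=float) for s in series])
    # axis=0 averages across the given series, not across time
    return np.average(stacked, axis=0, weights=weights)


a = [1.0, 2.0, 3.0]
b = [3.0, 4.0, 5.0]
print(average([a, b]).tolist())                  # simple mean: [2.0, 3.0, 4.0]
print(average([a, b], weights=[3, 1]).tolist())  # weighted mean: [1.5, 2.5, 3.5]
```

With weights [3, 1], the first series contributes three times as much as the second to every timestamp.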

pywatts.modules.preprocessing.change_direction module

class pywatts.modules.preprocessing.change_direction.ChangeDirection(name='change_direction')

Bases: pywatts.core.base.BaseTransformer

This module calculates a time series that indicates whether the next value is higher, lower, or the same.

Parameters:name (str) – The name of the ChangeDirection module
get_params() → Dict[str, object]

Returns a dict of all parameters (note that this module has no parameters)

Returns:Dict of parameters
Return type:Dict
set_params()

Sets the parameters (note that this module has no parameters)

transform(x: xarray.core.dataarray.DataArray) → xarray.core.dataarray.DataArray

Transforms the time series into a time series that indicates whether the next value is higher, lower, or the same.

Parameters:x (xr.DataArray) – The time series that should be transformed
Returns:A time series, where 1 indicates that the next value is higher, -1 that the next value is lower, and 0 that the next value is the same
Return type:xr.DataArray
Raises:WrongParameterException – If not all indexes are part of x
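A minimal NumPy sketch of the direction encoding (not the pywatts code) is the sign of the step from each value to the next. The helper name `change_direction` is hypothetical. Because `np.diff` returns one value fewer than its input and the handling of the last timestamp is not documented above, the sketch simply omits it.

```python
import numpy as np


def change_direction(x):
    """Sign of the step to the next value: 1 higher, -1 lower, 0 same (hypothetical helper).

    The output has one element fewer than x; how pywatts handles the
    final timestamp is not documented, so it is left out here.
    """
    return np.sign(np.diff(np.asarray(x, dtype=float)))


print(change_direction([1.0, 3.0, 3.0, 2.0]).tolist())  # [1.0, 0.0, -1.0]
```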

pywatts.modules.preprocessing.clock_shift module

class pywatts.modules.preprocessing.clock_shift.ClockShift(lag: int, name: str = 'ClockShift', indexes: List[str] = None)

Bases: pywatts.core.base.BaseTransformer

This module shifts the data by a certain offset.

Parameters:
  • lag (int) – The offset for shifting the time series. Please note: the relative time of the shift is determined by the current temporal resolution of the arrays in the pipeline.
  • name (str) – The name of the shift module
  • indexes (List) – The list of indexes that determine the dimension in which the time should be shifted. If the list is None or empty, the time is shifted in all temporal dimensions.

get_min_data()
get_params() → Dict[str, object]

Returns a dict of all defined parameters

Returns:Dict of parameters
Return type:Dict
set_params(lag: int = None, indexes: List[str] = None)

Sets parameters for clock shifting

Parameters:
  • lag (int) – The offset for shifting the time series. Please note: the relative time of the shift is determined by the current temporal resolution of the arrays in the pipeline.
  • indexes (List) – The list of indexes that determine the dimension in which the time should be shifted. If the list is None or empty, the time is shifted in all temporal dimensions.

transform(x: xarray.core.dataarray.DataArray) → xarray.core.dataarray.DataArray

Shifts the given time series x by the defined lag

Parameters:x (xr.DataArray) – the time series to be shifted
Returns:The shifted time series
Return type:xr.DataArray
Raises:WrongParameterException – If not all indexes are part of x
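The shift can be sketched for a single 1-D series with NumPy. This is an illustration under assumptions, not the ClockShift code: the helper `clock_shift` is hypothetical, a positive lag is assumed to move values to later timestamps (as xarray's `DataArray.shift` does), and vacated positions are assumed to be filled with NaN.

```python
import numpy as np


def clock_shift(x, lag):
    """Shift a 1-D series by `lag` steps (hypothetical helper).

    Assumptions: positive lag moves values to later timestamps and
    vacated slots are filled with NaN.
    """
    x = np.asarray(x, dtype=float)
    out = np.full_like(x, np.nan)
    if lag > 0:
        out[lag:] = x[:-lag]   # value at t appears at t + lag
    elif lag < 0:
        out[:lag] = x[-lag:]   # value at t appears at t + lag (earlier)
    else:
        out[:] = x
    return out


print(clock_shift([1.0, 2.0, 3.0, 4.0], lag=1).tolist())  # [nan, 1.0, 2.0, 3.0]
```

Since the lag is counted in array steps, a lag of 24 means 24 hours only if the arrays in the pipeline have an hourly resolution.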

pywatts.modules.preprocessing.custom_scaler module

class pywatts.modules.preprocessing.custom_scaler.CustomScaler(multiplier: float = 1.0, bias: float = 0.0, name: str = 'CustomScaler')

Bases: pywatts.core.base.BaseTransformer

Scaling step to scale a time series individually by a multiplier and a bias. By default the scaling does not affect the time series, i.e., the multiplier is 1.0 and the bias 0.0.

get_params() → Dict[str, object]

Get parameters for the CustomScaler object.

Returns:Parameters as dict object.
Return type:Dict[str, object]

inverse_transform(x: xarray.core.dataarray.DataArray) → xarray.core.dataarray.DataArray

Apply the inverse scaling to the xarray DataArray.

Parameters:x (xr.DataArray) – xarray DataArray to apply the inverse scaling on.
Returns:xarray DataArray with the scaling reverted.
Return type:xr.DataArray

set_params(multiplier: float = None, bias: float = None)

Set or change CustomScaler object parameters.

Parameters:
  • multiplier (float, optional) – Value by which every value in the time series is multiplied.
  • bias (float, optional) – Value that is added to every value in the time series.

transform(x: xarray.core.dataarray.DataArray) → xarray.core.dataarray.DataArray

Apply the scaling to the xarray DataArray.

Parameters:x (xr.DataArray) – xarray DataArray to apply scaling on.
Returns:xarray DataArray scaled according to the specified multiplier and bias.
Return type:xr.DataArray

validate_multiplier(multiplier)
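The scaling and its inverse can be sketched as follows. The class `CustomScalerSketch` is hypothetical, and the order of operations y = multiplier · x + bias is an assumption, since the text above only says that the multiplier is multiplied and the bias added. The sketch rejects a zero multiplier because the inverse would divide by zero; whether `validate_multiplier` does the same is not documented here.

```python
import numpy as np


class CustomScalerSketch:
    """Hypothetical stand-in for CustomScaler, assuming y = multiplier * x + bias."""

    def __init__(self, multiplier=1.0, bias=0.0):
        if multiplier == 0:
            # Assumption: a zero multiplier cannot be inverted, so it is rejected.
            raise ValueError("multiplier must be non-zero")
        self.multiplier = multiplier
        self.bias = bias

    def transform(self, x):
        return self.multiplier * np.asarray(x, dtype=float) + self.bias

    def inverse_transform(self, y):
        # Undo the bias first, then the multiplier.
        return (np.asarray(y, dtype=float) - self.bias) / self.multiplier


scaler = CustomScalerSketch(multiplier=2.0, bias=1.0)
y = scaler.transform([0.0, 1.0, 2.0])
print(y.tolist())                            # [1.0, 3.0, 5.0]
print(scaler.inverse_transform(y).tolist())  # [0.0, 1.0, 2.0]
```

With the defaults (multiplier 1.0, bias 0.0) both methods return the input unchanged, matching the documented default behaviour.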

pywatts.modules.preprocessing.differentiate module

class pywatts.modules.preprocessing.differentiate.Differentiate(target_index: Union[str, List[str]] = None, name: str = 'Differentiate', n: Union[int, List[int]] = 1, axis: int = -1, pad: bool = False, pad_args: Dict[str, object] = {})

Bases: pywatts.core.base.BaseTransformer

Differentiation step to calculate the n-th order difference of a time series. By default, the difference does not have the same size as the input time series; optional padding is implemented via np.pad, and specific arguments can be passed via pad_args.

get_params() → Dict[str, object]

Get parameters for Differentiate object.

Returns:Parameters as dict object.
Return type:Dict[str, object]
set_params(target_index: Union[str, List[str], None] = None, n: Union[int, List[int], None] = None, axis: Optional[int] = None, pad: Optional[bool] = None, pad_args: Optional[Dict[str, object]] = None)

Set or change Differentiate object parameters.

Parameters:
  • target_index (Optional[Union[str, List[str]]], optional) – Target index or indices for the xarray input to calculate the difference for.
  • n (Optional[Union[int, List[int]]], optional) – N-th order difference specification (default 1). Can also be a list if multiple differences should be calculated.
  • axis (Optional[int], optional) – Axis to calculate the difference of (default -1 [last axis]).
  • pad (Optional[bool], optional) – Enable or disable padding (default disabled).
  • pad_args (Optional[Dict[str, object]], optional) – Padding arguments for np.pad method (default zero leading padding).
transform(x: xarray.core.dataarray.DataArray) → xarray.core.dataarray.DataArray

Add the n-th order difference(s) to the xarray DataArray.

Parameters:x (xr.DataArray) – Xarray dataset to apply differentiation on.
Returns:Xarray dataset containing the n-th order differentiations.
Return type:xr.DataArray
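The difference and the optional padding can be sketched with `np.diff` and `np.pad`. The helper `differentiate` is hypothetical; the leading zero padding of n values mirrors the documented default of pad_args, but the exact keyword handling in pywatts is an assumption.

```python
import numpy as np


def differentiate(x, n=1, axis=-1, pad=False, pad_args=None):
    """n-th order difference along `axis`, optionally padded back to the input size.

    Hypothetical helper; by default pads n leading zeros on `axis`.
    """
    d = np.diff(np.asarray(x, dtype=float), n=n, axis=axis)
    if pad:
        # np.pad needs one (before, after) pair per dimension; pad only `axis`, at the front.
        pad_width = [(0, 0)] * d.ndim
        pad_width[axis] = (n, 0)
        kwargs = pad_args or {"mode": "constant", "constant_values": 0}
        d = np.pad(d, pad_width, **kwargs)
    return d


x = [1.0, 4.0, 9.0, 16.0]
print(differentiate(x).tolist())            # [3.0, 5.0, 7.0]
print(differentiate(x, n=2).tolist())       # [2.0, 2.0]
print(differentiate(x, pad=True).tolist())  # [0.0, 3.0, 5.0, 7.0]
```

Without padding, each order of differencing shortens the series by one value, which is why padding is needed when the result must align with the original timestamps.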

pywatts.modules.preprocessing.linear_interpolation module

class pywatts.modules.preprocessing.linear_interpolation.LinearInterpolater(name: str = 'LinearInterpolater', method: str = 'linear', dim: str = 'time', fill_value: str = 'extrapolate')

Bases: pywatts.core.base.BaseTransformer

This module creates a linear interpolator.

Parameters:
  • name (str) – Name of the linear interpolator
  • method (str) – The method used for interpolation (e.g. linear)
  • dim (str) – The dimension used
  • fill_value (str) – Handling of missing values (see https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.interp1d.html)

get_params() → Dict[str, object]

Returns a dict of parameters used for the linear interpolation

Returns:Parameters set for the linear interpolation
Return type:Dict
set_params(method: str = None, dim: str = None, fill_value: str = None)

Sets the parameters for the linear interpolation

Parameters:
  • method (str) – The method used for interpolation (e.g. linear)
  • dim (str) – The dimension used
  • fill_value (str) – Handling of missing values (see https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.interp1d.html)

transform(x=<class 'xarray.core.dataarray.DataArray'>) → xarray.core.dataarray.DataArray

Interpolates the input along the dimension dim.

Parameters:x (xr.DataArray) – Input xarray dataset
Returns:Interpolated dataset
Return type:xr.DataArray
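Assuming the module fills missing (NaN) values along dim, as xarray's `DataArray.interpolate_na` does, the linear case can be sketched with `np.interp`. The helper `fill_gaps_linear` is hypothetical. Note one deliberate difference: `np.interp` holds the first or last valid value constant outside the known range, whereas fill_value="extrapolate" extends the line.

```python
import numpy as np


def fill_gaps_linear(t, y):
    """Fill NaN entries of y by linear interpolation over sample points t (hypothetical helper).

    Outside the range of valid points np.interp repeats the edge values;
    it does not extrapolate like fill_value="extrapolate".
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    missing = np.isnan(y)
    filled = y.copy()
    # Interpolate only the missing positions from the valid (t, y) pairs.
    filled[missing] = np.interp(t[missing], t[~missing], y[~missing])
    return filled


print(fill_gaps_linear([0, 1, 2, 3], [0.0, np.nan, 4.0, 6.0]).tolist())  # [0.0, 2.0, 4.0, 6.0]
```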

pywatts.modules.preprocessing.missing_value_detection module

class pywatts.modules.preprocessing.missing_value_detection.MissingValueDetector(name: str = 'missingValueDetector')

Bases: pywatts.core.base.BaseTransformer

Module to detect missing values (NaN, NA)

get_params() → Dict[str, object]

Get parameters (note that this module has no parameters)

set_params()

Set parameters (note that this module has no parameters)

transform(dataset: xarray.core.dataarray.DataArray) → xarray.core.dataarray.DataArray

Detects the indexes at which the input has missing values.

Parameters:dataset (xr.DataArray) – Dataset in which missing values should be detected
Returns:A dataset with boolean values: True where a value is missing and False otherwise
Return type:xr.DataArray
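The kind of mask described above can be illustrated with NumPy; on an xarray DataArray, `DataArray.isnull()` yields the equivalent boolean mask. Whether pywatts uses either call internally is not stated, and the helper `detect_missing` is hypothetical.

```python
import numpy as np


def detect_missing(values):
    """Boolean mask that is True where a value is missing (NaN). Hypothetical helper."""
    return np.isnan(np.asarray(values, dtype=float))


mask = detect_missing([1.0, np.nan, 3.0, np.nan])
print(mask.tolist())                  # [False, True, False, True]
print(np.flatnonzero(mask).tolist())  # indexes of the missing values: [1, 3]
```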

pywatts.modules.preprocessing.resampler module

class pywatts.modules.preprocessing.resampler.Resampler(name: str = 'Resampler', time_index: str = 'time', target_time: str = '1H', method: str = 'mean', method_args: Optional[Dict[str, Any]] = None)

Bases: pywatts.core.base.BaseTransformer

Module to resample time-series data to a given target time resolution (both upsampling and downsampling). The module builds on xarray’s resample method, which is in turn based on pandas’ resampling implementation, so all methods offered by pandas’ resample are available. See http://xarray.pydata.org/en/stable/generated/xarray.Dataset.resample.html for more details.

Parameters:
  • name (str) – Name of this processing step (default: “Resampler”).
  • time_index (str) – Index of the dataset specifying the time series to be resampled (default: “time”).
  • target_time (str) – Target time after the resampling, given as a frequency string (default: “1H”). For example, “6H”/“6h” for 6 hours, “s”/“S” for seconds, “m”/“M” for months.
  • method (str) – Method to use for down- or upsampling the data (default: “mean”). For example, “mean”, “min”, “sum”, “median”, “reduce”, “map”. http://xarray.pydata.org/en/stable/generated/xarray.core.resample.DatasetResample.html
  • method_args (Optional[Dict[str, Any]]) – Optional parameters for the selected method as a dict (default: None). Note: Some methods like reduce or map require parameters!

Example

# downsample dataset to 30 minutes (1800 s) by using the mean method
Resampler(target_time="1800s", method="mean")

# downsample dataset to 1 day by summing up all data for one day
Resampler(target_time="1d", method="sum")

# upsample the "time_series" index of the dataset to 1 minute by using the interpolate method
Resampler(time_index="time_series", target_time="60s", method="interpolate")

# resample the dataset index to 1 hour by using a custom function (in this case the mean)
Resampler(target_time="1h", method="map", method_args={"func": lambda x: x.mean()})

get_min_data()
get_params() → Dict[str, object]

Get the parameters of this object as a dict.

Returns:Object parameters as a JSON dict
set_params(time_index: Optional[str] = None, target_time: Optional[str] = None, method: Optional[str] = None, method_args: Optional[Dict[str, Any]] = None)

Set parameters for this object.

Parameters:
  • time_index (Optional[str]) – Index of the dataset specifying the time series to resample (default: “time”).
  • target_time (Optional[str]) – Target time after the resampling, given as a frequency string (default: “1H”). For example, “6H”/“6h” for 6 hours, “s”/“S” for seconds, “m”/“M” for months.
  • method (Optional[str]) – Method to use for down- or upsampling the data (default: “mean”). For example, “mean”, “min”, “sum”, “median”, “reduce”, “map”. http://xarray.pydata.org/en/stable/generated/xarray.core.resample.DatasetResample.html
  • method_args (Optional[Dict[str, Any]]) – Optional parameters for the selected method as a dict (default: None). Note: Some methods like reduce or map require parameters!
transform(x: xarray.core.dataarray.DataArray) → xarray.core.dataarray.DataArray

Resamples the dataset x as specified in the constructor.

Parameters:x (xr.DataArray) – dataset which should be resampled.
Returns:The resampled DataArray.
Return type:xr.DataArray
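The two downsampling examples can be reproduced with plain pandas, on which xarray's resample is built. This is a pandas illustration of the resampling semantics, not the Resampler code; the two hours of minute data and the variable names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical input: two hours of minute values 0, 1, ..., 119 starting at midnight.
index = pd.date_range("2021-01-01", periods=120, freq="1min")
series = pd.Series(np.arange(120, dtype=float), index=index)

# Like Resampler(target_time="1800s", method="mean"): one mean per 30-minute bin.
per_30min = series.resample("1800s").mean()

# Like Resampler(target_time="1d", method="sum"): all values of a day summed up.
per_day = series.resample("1D").sum()

print(per_30min.tolist())  # [14.5, 44.5, 74.5, 104.5]
print(per_day.tolist())    # [7140.0]
```

Each 30-minute bin is labelled with its left edge (00:00, 00:30, ...), which is the pandas default for this kind of frequency.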

pywatts.modules.preprocessing.sampler module

class pywatts.modules.preprocessing.sampler.Sampler(sample_size: int, name: str = 'SampleModule', indexes: List[str] = None)

Bases: pywatts.core.base.BaseTransformer

This module creates samples whose size is specified by sample_size. For example, if sample_size is 24 and the data has an hourly resolution, it creates for each timestamp a vector containing all values of the past 24 hours. This is useful, e.g., if a forecasting algorithm needs the values of the past 24 hours as input.

Parameters:
  • sample_size (int) – The size of each sample, i.e., the number of values collected per timestamp
  • indexes (List[str]) – The indexes which should be shifted through time
get_min_data()
get_params() → Dict[str, object]

Returns a dict of all hyperparameters/user-defined parameters

Returns:Dict with params
Return type:dict
set_params(sample_size: int = None, indexes: List[str] = None)

Set parameters.

Parameters:
  • sample_size (int) – The size of each sample, i.e., the number of values collected per timestamp
  • indexes (List[str]) – The indexes which should be shifted through time
transform(x: xarray.core.dataarray.DataArray) → xarray.core.dataarray.DataArray

Sample the given time series x according to sample_size.

Parameters:x (xr.DataArray) – the input
Returns:The sampled time series, containing one vector of past values per timestamp.
Return type:xr.DataArray
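The sampling idea can be sketched by stacking lagged copies of a 1-D series. The helper `make_samples` is hypothetical, and several details are assumptions that the text above does not settle: the current value is part of the window, the most recent value comes first, and timestamps without enough history are padded with NaN.

```python
import numpy as np


def make_samples(x, sample_size):
    """Row t holds [x[t], x[t-1], ..., x[t-sample_size+1]] (hypothetical helper).

    Assumptions: current value included, most recent first, NaN where
    there is not enough history.
    """
    x = np.asarray(x, dtype=float)
    out = np.full((len(x), sample_size), np.nan)
    for lag in range(min(sample_size, len(x))):
        # Column `lag` holds the value observed `lag` steps earlier.
        out[lag:, lag] = x[: len(x) - lag]
    return out


samples = make_samples([1.0, 2.0, 3.0, 4.0], sample_size=3)
print(samples[-1].tolist())  # [4.0, 3.0, 2.0]
```

With hourly data and sample_size=24, each row would then hold the last 24 hourly values, which is the input shape a forecasting model typically expects.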

pywatts.modules.preprocessing.slicer module

class pywatts.modules.preprocessing.slicer.Slicer(start: Optional[int] = None, end: Optional[int] = None, name: str = 'Slicer')

Bases: pywatts.core.base.BaseTransformer

This module slices the input data array from the start index up to the end index, similar to NumPy slicing with a[start:end].

Parameters:
  • start (int, optional) – Start index of the slicing operation, defaults to None
  • end (int, optional) – End index of the slicing operation, defaults to None
get_params() → Dict[str, object]

Return dict of module parameters.

Returns:Dict of module parameters.
Return type:Dict[str, object]
set_params(start: Optional[int] = None, end: Optional[int] = None)

Set module parameters.

Parameters:
  • start (int, optional) – Start index of the slicing operation, defaults to None
  • end (int, optional) – End index of the slicing operation, defaults to None
transform(x: xarray.core.dataarray.DataArray) → xarray.core.dataarray.DataArray

Perform the slicing operation on the input array.

Parameters:x (xr.DataArray) – Input array which should be sliced.
Returns:Sliced array like in numpy a[start:end].
Return type:xr.DataArray
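Since Slicer is described as behaving like a[start:end], its parameters can be illustrated with NumPy slicing. The helper `slice_like_slicer` is hypothetical, and it assumes slicing along the first axis with NumPy semantics: end is exclusive, None means "from the beginning" or "to the end", and negative indices count from the end.

```python
import numpy as np


def slice_like_slicer(x, start=None, end=None):
    """Return x[start:end] along the first axis (hypothetical helper)."""
    return np.asarray(x)[start:end]


a = np.arange(10)
print(slice_like_slicer(a, 2, 5).tolist())      # [2, 3, 4]
print(slice_like_slicer(a, end=3).tolist())     # [0, 1, 2]
print(slice_like_slicer(a, start=-3).tolist())  # [7, 8, 9]
```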

Module contents