pywatts.modules.preprocessing package

Submodules

pywatts.modules.preprocessing.average module

class pywatts.modules.preprocessing.average.Average(weights: List[float] = None, name: str = 'Average')

Bases: pywatts_pipeline.core.transformer.base.BaseTransformer

Aggregation step to average the given time series, either by simple or weighted averaging. By default, simple averaging is applied.

transform(**kwargs) → xarray.core.dataarray.DataArray

Aggregate the given time series by simple or weighted averaging.

Returns:xarray DataArray aggregated by simple or weighted averaging.
Return type:xr.DataArray
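The arithmetic behind simple and weighted averaging can be sketched in plain Python. This is only an illustration, not the pyWATTS implementation: the helper name average_series is hypothetical, and dividing by the sum of the weights is an assumption about how the weights are normalised.

```python
def average_series(series, weights=None):
    """Element-wise average of several equally long series."""
    if weights is None:
        weights = [1.0] * len(series)  # simple averaging
    if len(weights) != len(series):
        raise ValueError("one weight per series is required")
    total = sum(weights)  # assumption: weights are normalised by their sum
    # zip(*series) walks through the series time step by time step.
    return [
        sum(w * v for w, v in zip(weights, step)) / total
        for step in zip(*series)
    ]


load_a = [1.0, 2.0, 3.0]
load_b = [3.0, 4.0, 5.0]
simple = average_series([load_a, load_b])
weighted = average_series([load_a, load_b], weights=[3.0, 1.0])
```

With weights of 3 and 1, the first series contributes three quarters of each averaged value.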

pywatts.modules.preprocessing.change_direction module

class pywatts.modules.preprocessing.change_direction.ChangeDirection(name='change_direction')

Bases: pywatts_pipeline.core.transformer.base.BaseTransformer

This module calculates a time series that indicates whether the next value is higher, lower, or the same.

Parameters:name (str) – The name of the ChangeDirection module
transform(x: xarray.core.dataarray.DataArray) → xarray.core.dataarray.DataArray

Transforms the time series into a time series that indicates whether the next value is higher, lower, or the same.

Parameters:x (xr.DataArray, optional) – The time series that should be transformed
Returns:A time series, where 1 indicates that the next value is higher, -1 that the next value is lower, and 0 that the next value is the same
Return type:xr.DataArray
Raises:WrongParameterException – If not all indexes are part of x
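The 1 / -1 / 0 encoding described above can be illustrated with a small plain-Python sketch. The function below is hypothetical and not the module's code. In particular, the last value has no successor, so this sketch returns one element fewer than the input; how the module itself treats that position is not stated here.

```python
def change_direction(values):
    """Return 1, -1 or 0 for each value, depending on its successor."""
    directions = []
    # Pair every value with the value that follows it.
    for current, following in zip(values, values[1:]):
        if following > current:
            directions.append(1)
        elif following < current:
            directions.append(-1)
        else:
            directions.append(0)
    return directions


signal = change_direction([1.0, 3.0, 3.0, 2.0])
```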

pywatts.modules.preprocessing.clock_shift module

class pywatts.modules.preprocessing.clock_shift.ClockShift(lag: int, name: str = 'ClockShift', indexes: List[str] = None)

Bases: pywatts_pipeline.core.transformer.base.BaseTransformer

This module shifts the data by a certain offset.

Parameters:
  • lag (int) – The offset for shifting the time series. Please note: the relative time of the shift is determined by the current temporal resolution of the arrays in the pipeline.
  • name (str) – The name of the shift module
  • indexes (List) – The list of indexes that determine the dimension in which the time should be shifted. If the list is None or empty, the time is shifted in all temporal dimensions.

get_min_data()

Returns the minimum amount of data that this transformer needs.

transform(x: xarray.core.dataarray.DataArray) → xarray.core.dataarray.DataArray

Shifts the given time series x by the defined lag

Parameters:x (xr.DataArray) – the time series to be shifted
Returns:The shifted time series
Return type:xr.DataArray
Raises:WrongParameterException – If not all indexes are part of x
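Because the lag is counted in steps of the current resolution, a lag of 1 means one hour for hourly data but fifteen minutes for quarter-hourly data. The following plain-Python sketch shows a positional shift. The function name, the direction of the shift (a positive lag moves values to later positions) and the None fill value are assumptions made for illustration.

```python
def shift(values, lag, fill=None):
    """Shift values by lag positions and fill the vacated slots."""
    n = len(values)
    if lag == 0:
        return list(values)
    if abs(lag) >= n:
        return [fill] * n  # everything is shifted out of range
    if lag > 0:
        # Assumption: a positive lag moves each value to a later position.
        return [fill] * lag + list(values[:n - lag])
    return list(values[-lag:]) + [fill] * (-lag)


delayed = shift([1, 2, 3, 4], 1)
advanced = shift([1, 2, 3, 4], -1)
```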

pywatts.modules.preprocessing.custom_scaler module

class pywatts.modules.preprocessing.custom_scaler.CustomScaler(multiplier: float = 1.0, bias: float = 0.0, name: str = 'CustomScaler')

Bases: pywatts_pipeline.core.transformer.base.BaseTransformer

Scaling step to scale a time series individually by a multiplier and a bias. By default the scaling does not affect the time series, i.e., the multiplier is 1.0 and the bias 0.0.

inverse_transform(x: xarray.core.dataarray.DataArray) → xarray.core.dataarray.DataArray

Apply the inverse scaling to the xarray DataArray.

Parameters:x (xr.DataArray) – xarray DataArray to apply the inverse scaling on.
Returns:xarray DataArray with the scaling reverted.
Return type:xr.DataArray

set_params(multiplier: float = None, **kwargs)

Set or change CustomScaler object parameters.

Parameters:
  • multiplier (float, optional) – Value by which every value in the time series is multiplied.
  • bias (float, optional) – Value that is added to every value in the time series.

transform(x: xarray.core.dataarray.DataArray) → xarray.core.dataarray.DataArray

Apply the scaling to the xarray DataArray.

Parameters:x (xr.DataArray) – xarray DataArray to apply scaling on.
Returns:xarray DataArray scaled according to the specified multiplier and bias.
Return type:xr.DataArray

validate_multiplier(multiplier)
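As a rough illustration of scaling and its inverse, the sketch below assumes that the multiplier is applied first and the bias is added afterwards. That order, the function names and the rejection of a zero multiplier are assumptions and are not taken from the pyWATTS source.

```python
def scale(values, multiplier=1.0, bias=0.0):
    """Scale each value: multiply first, then add the bias (assumed order)."""
    return [v * multiplier + bias for v in values]


def inverse_scale(values, multiplier=1.0, bias=0.0):
    """Undo scale() by removing the bias and dividing by the multiplier."""
    if multiplier == 0:
        # Assumption: a zero multiplier is rejected because it is not invertible.
        raise ValueError("a multiplier of 0 cannot be inverted")
    return [(v - bias) / multiplier for v in values]


raw = [0.0, 10.0, 20.0]
scaled = scale(raw, multiplier=0.5, bias=1.0)
restored = inverse_scale(scaled, multiplier=0.5, bias=1.0)
```

With the default multiplier of 1.0 and bias of 0.0, both functions return the input unchanged, which matches the default behaviour described above.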

pywatts.modules.preprocessing.differentiate module

class pywatts.modules.preprocessing.differentiate.Differentiate(target_index: Union[str, List[str]] = None, name: str = 'Differentiate', n: Union[int, List[int]] = 1, axis: int = -1, pad: bool = False, pad_args: Dict[str, object] = {})

Bases: pywatts_pipeline.core.transformer.base.BaseTransformer

Differentiation step to calculate the n-th order difference of a time series. By default, the difference does not have the same size as the input time series. Padding is available through np.pad, and specific arguments can be passed via pad_args.

transform(x: xarray.core.dataarray.DataArray) → xarray.core.dataarray.DataArray

Adds the n-th order differences to the xarray dataset.

Parameters:x (xr.DataArray) – Xarray dataset to apply differentiation on.
Returns:Xarray dataset containing the n-th order differentiations.
Return type:xr.DataArray
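The size change caused by differencing is easiest to see on a short list. The sketch below is plain Python with a hypothetical helper. Its pad option simply prepends a constant so that the output has the input's length, which is only one possible padding choice; the module itself delegates padding to np.pad and pad_args.

```python
def difference(values, n=1, pad=False, pad_value=0.0):
    """n-th order difference; each order shortens the series by one."""
    result = list(values)
    for _ in range(n):
        result = [b - a for a, b in zip(result, result[1:])]
    if pad:
        # Assumption: restore the original length by padding at the front.
        result = [pad_value] * (len(values) - len(result)) + result
    return result


series = [1.0, 4.0, 9.0, 16.0]
first = difference(series)
second = difference(series, n=2)
padded = difference(series, n=2, pad=True)
```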

pywatts.modules.preprocessing.linear_interpolation module

class pywatts.modules.preprocessing.linear_interpolation.LinearInterpolater(name: str = 'LinearInterpolater', method: str = 'linear', dim: str = 'time', fill_value: str = 'extrapolate')

Bases: pywatts_pipeline.core.transformer.base.BaseTransformer

This module creates a linear interpolator.

Parameters:
  • name (str) – Name of the linear interpolator
  • method (str) – The method used for interpolation (e.g. linear)
  • dim (str) – The dimension used
  • fill_value (str) – Handling of missing values (see https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.interp1d.html)

transform(x=<class 'xarray.core.dataarray.DataArray'>) → xarray.core.dataarray.DataArray

Transforms the input

Parameters:x (xr.DataArray) – Input xarray dataset
Returns:Interpolated dataset
Return type:xr.DataArray
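To make the effect of linear interpolation with extrapolation concrete, here is a minimal plain-Python sketch. It interpolates over the positional index rather than over real timestamps, and the helper name is hypothetical; the module itself works on xarray data along the configured dim.

```python
import math


def interpolate_linear(values):
    """Fill NaN entries linearly, extrapolating at both ends."""
    known = [(i, v) for i, v in enumerate(values) if not math.isnan(v)]
    if len(known) < 2:
        raise ValueError("need at least two known values")
    result = list(values)
    for i, v in enumerate(values):
        if not math.isnan(v):
            continue
        # Find the first known point to the right of i. At the edges, reuse
        # the two nearest known points, which gives linear extrapolation.
        right = next(
            (k for k, (pos, _) in enumerate(known) if pos > i),
            len(known) - 1,
        )
        right = max(right, 1)
        (x0, y0), (x1, y1) = known[right - 1], known[right]
        result[i] = y0 + (y1 - y0) * (i - x0) / (x1 - x0)
    return result


nan = float("nan")
filled = interpolate_linear([nan, 2.0, nan, 4.0, nan])
```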

pywatts.modules.preprocessing.missing_value_detection module

class pywatts.modules.preprocessing.missing_value_detection.MissingValueDetector(name: str = 'missingValueDetector')

Bases: pywatts_pipeline.core.transformer.base.BaseTransformer

Module to detect missing values (NaN, NA)

transform(dataset: xarray.core.dataarray.DataArray) → xarray.core.dataarray.DataArray

Detects the indexes that correspond to the input having missing values

Parameters:dataset (xr.DataArray) – Dataset in which missing values should be detected
Returns:A dataset with boolean values: True where a value is missing and False otherwise
Return type:xr.DataArray
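The boolean mask can be pictured with the following plain-Python sketch. Treating both None and NaN as missing is an assumption made for this illustration, and the helper name is hypothetical.

```python
import math


def is_missing(values):
    """Return True for every entry that is None or NaN."""
    return [
        v is None or (isinstance(v, float) and math.isnan(v))
        for v in values
    ]


mask = is_missing([1.0, float("nan"), None, 4.0])
```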

pywatts.modules.preprocessing.resampler module

class pywatts.modules.preprocessing.resampler.Resampler(name: str = 'Resampler', time_index: str = 'time', target_time: str = '1H', method: str = 'mean', method_args: Optional[Dict[str, Any]] = None)

Bases: pywatts_pipeline.core.transformer.base.BaseTransformer

Module to resample time series based data to a given target time (both up and down sampling). All methods given by pandas’ resample method are provided because of xarray’s data set structure. See http://xarray.pydata.org/en/stable/generated/xarray.Dataset.resample.html for more details.

This class resamples time series data based on xarray’s resample method which is in turn based on pandas’ resampling implementation.

Parameters:
  • name (str) – Name of this processing step (default: “Resampler”).
  • time_index (str) – Index of the dataset specifying the time series to be resampled (default: “time”).
  • target_time (str) – Target time after the resampling given in string datetime format (default: “1H”). For example, “6H”/“6h” for 6 hours, “s”/“S” for seconds, “m”/“M” for months.
  • method (str) – Method to use for down- or upsampling the data (default: “mean”). For example, “mean”, “min”, “sum”, “median”, “reduce”, “map”. http://xarray.pydata.org/en/stable/generated/xarray.core.resample.DatasetResample.html
  • method_args (Optional[Dict[str, Any]]) – Optional parameters for the selected method as a dict (default: “None”). Note: Some methods like reduce or map require parameters!

Example

from pywatts.modules.preprocessing.resampler import Resampler

# Downsample the dataset to 30 minutes (1800 s) using the mean method
Resampler(target_time="1800s", method="mean")

# Downsample the dataset to 1 day by summing up all data for each day
Resampler(target_time="1d", method="sum")

# Upsample the "time_series" index of the dataset to 1 minute using the interpolate method
Resampler(time_index="time_series", target_time="60s", method="interpolate")

# Resample the dataset index to 1 hour using a custom function (in this case the mean of each group)
Resampler(target_time="1h", method="map", method_args={"func": lambda x: x.mean()})

get_min_data()

Returns the minimum amount of data that this transformer needs.

transform(x: xarray.core.dataarray.DataArray) → xarray.core.dataarray.DataArray

Resamples the dataset x as specified in the constructor.

Parameters:x (xr.DataArray) – dataset which should be resampled.
Returns:The resampled xarray dataset.
Return type:xr.DataArray
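What downsampling with the mean method does can be sketched without xarray or pandas. The helper below is hypothetical. It groups timestamps into fixed-width bins and averages each bin, and aligning the bins to the Unix epoch is an assumption of this sketch rather than a statement about xarray's behaviour.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean


def downsample_mean(timestamps, values, bin_seconds):
    """Average all values that fall into the same fixed-width time bin."""
    epoch = datetime(1970, 1, 1)  # assumption: bins are aligned to the epoch
    bins = defaultdict(list)
    for ts, v in zip(timestamps, values):
        seconds = int((ts - epoch).total_seconds())
        bin_start = epoch + timedelta(seconds=seconds // bin_seconds * bin_seconds)
        bins[bin_start].append(v)
    return {start: mean(vs) for start, vs in sorted(bins.items())}


start = datetime(2021, 1, 1, 0, 0)
timestamps = [start + timedelta(minutes=15 * i) for i in range(5)]
values = [1.0, 2.0, 3.0, 4.0, 10.0]
half_hourly = downsample_mean(timestamps, values, bin_seconds=1800)
```

Here five quarter-hourly values collapse into three half-hourly bins, which corresponds to the target_time="1800s", method="mean" configuration.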

pywatts.modules.preprocessing.sampler module

class pywatts.modules.preprocessing.sampler.Sampler(sample_size: int, name: str = 'SampleModule', indexes: List[str] = None)

Bases: pywatts_pipeline.core.transformer.base.BaseTransformer

This module creates samples with a size specified by sample_size. For example, if sample_size corresponds to 24 hours, it creates for each timestamp a vector containing all values of the past 24 hours. This is useful if a forecasting algorithm needs the values of the past 24 hours as input.

Parameters:
  • sample_size (int) – The size of the samples to create
  • indexes (List[str]) – The indexes which should be shifted through time
get_min_data()

Returns the minimum amount of data that this transformer needs.

set_params(sample_size: int = None, indexes: List[str] = None)

Set params.

Parameters:
  • sample_size (int) – The size of the samples to create
  • indexes (List[str]) – The indexes which should be shifted through time
transform(x: xarray.core.dataarray.DataArray) → xarray.core.dataarray.DataArray

Sample the given time series x using the configured sample_size.

Parameters:x (xr.DataArray) – the input
Returns:A shifted time series.
Return type:xr.DataArray
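The idea of one vector of past values per timestamp can be sketched as a sliding window in plain Python. The helper is hypothetical. It assumes that each window ends at, and includes, the current value, and it only emits windows with a complete history; how the module fills the first incomplete windows is not covered here.

```python
def sample_windows(values, sample_size):
    """Build one window of sample_size values per position with full history."""
    if sample_size < 1:
        raise ValueError("sample_size must be positive")
    # Assumption: a window covers the current value and the ones before it.
    return [
        list(values[end - sample_size:end])
        for end in range(sample_size, len(values) + 1)
    ]


windows = sample_windows([10, 20, 30, 40, 50], sample_size=3)
```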

pywatts.modules.preprocessing.slicer module

class pywatts.modules.preprocessing.slicer.Slicer(start: Optional[int] = None, end: Optional[int] = None, name: str = 'Slicer')

Bases: pywatts_pipeline.core.transformer.base.BaseTransformer

This module slices the input data array from the start index up to the end index, similar to NumPy array slicing with a[start:end].

Parameters:
  • start (int, optional) – Start index of the slicing operation, defaults to None
  • end (int, optional) – End index of the slicing operation, defaults to None
transform(x: xarray.core.dataarray.DataArray) → xarray.core.dataarray.DataArray

Perform the slicing operation on the input array.

Parameters:x (xr.DataArray) – Input array which should be sliced.
Returns:Sliced array like in numpy a[start:end].
Return type:xr.DataArray
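The a[start:end] semantics, including the None defaults, behave the same way on a plain Python list. The sketch below uses a hypothetical helper to show them; it does not touch xarray.

```python
def slice_array(values, start=None, end=None):
    """Positional slice; None means no bound on that side."""
    return values[start:end]


data = [0, 1, 2, 3, 4, 5]
middle = slice_array(data, start=1, end=4)
tail = slice_array(data, start=2)
without_last = slice_array(data, end=-1)
```

As with NumPy slicing, the end index is exclusive and negative indexes count from the end.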

Module contents