pywatts.modules.preprocessing package
Submodules
pywatts.modules.preprocessing.average module
-
class pywatts.modules.preprocessing.average.Average(weights: List[float] = None, name: str = 'Average')
Bases: pywatts.core.base.BaseTransformer
Aggregation step to average the given time series, either by simple or weighted averaging. By default, simple averaging is applied.
-
get_params() → Dict[str, object]
Get parameters for the Average object.
Returns: Parameters as dict object.
Return type: Dict[str, object]
-
set_params(weights: List[float] = None)
Set or change Average object parameters.
Parameters: weights (list, optional) – List of individual weights of each given time series for weighted averaging.
-
transform(**kwargs) → xarray.core.dataarray.DataArray
Aggregate the given time series by simple or weighted averaging.
Returns: xarray DataArray aggregated by simple or weighted averaging.
Return type: xr.DataArray
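The difference between simple and weighted averaging can be sketched in plain Python. This is an illustration of the arithmetic only, not the pyWATTS implementation: the helper name average_series is hypothetical, and dividing by the sum of the weights is an assumption.

```python
from typing import List, Optional, Sequence


def average_series(series: Sequence[Sequence[float]],
                   weights: Optional[List[float]] = None) -> List[float]:
    """Average several equally long series element by element (hypothetical helper)."""
    if weights is None:
        # Simple averaging: every series gets the same weight.
        weights = [1.0] * len(series)
    if len(weights) != len(series):
        raise ValueError("need exactly one weight per series")
    total = sum(weights)  # assumption: weights are normalized by their sum
    return [
        sum(w * s[i] for w, s in zip(weights, series)) / total
        for i in range(len(series[0]))
    ]


a = [1.0, 2.0, 3.0]
b = [3.0, 4.0, 5.0]
simple = average_series([a, b])                       # [2.0, 3.0, 4.0]
weighted = average_series([a, b], weights=[3.0, 1.0])  # [1.5, 2.5, 3.5]
```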
-
pywatts.modules.preprocessing.change_direction module
-
class pywatts.modules.preprocessing.change_direction.ChangeDirection(name='change_direction')
Bases: pywatts.core.base.BaseTransformer
This module calculates a time series that indicates whether the next value is higher, lower, or the same.
Parameters: name (str) – The name of the ChangeDirection module
-
get_params() → Dict[str, object]
Returns a dict of all parameters (note that this module has no parameters).
Returns: Dict of parameters
Return type: Dict
-
set_params()
Sets the parameters (note that this module has no parameters).
-
transform(x: xarray.core.dataarray.DataArray) → xarray.core.dataarray.DataArray
Transforms the time series into a time series that indicates whether the next value is higher, lower, or the same.
Parameters: x (xr.DataArray) – The time series that should be transformed
Returns: A time series, where 1 indicates that the next value is higher, -1 that the next value is lower, and 0 that the next value is the same
Return type: xr.DataArray
Raises: WrongParameterException – If not all indexes are part of x
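The 1 / -1 / 0 encoding described above can be sketched in plain Python. The helper change_direction is hypothetical and is not the pyWATTS code. It returns one value fewer than the input, because the last value has no successor. How the module itself treats that last position is not stated here.

```python
from typing import List, Sequence


def change_direction(values: Sequence[float]) -> List[int]:
    """Encode whether the next value is higher (1), lower (-1), or equal (0)."""
    out = []
    for current, nxt in zip(values, values[1:]):
        if nxt > current:
            out.append(1)
        elif nxt < current:
            out.append(-1)
        else:
            out.append(0)
    return out


directions = change_direction([1.0, 3.0, 3.0, 2.0])  # [1, 0, -1]
```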
-
pywatts.modules.preprocessing.clock_shift module
-
class pywatts.modules.preprocessing.clock_shift.ClockShift(lag: int, name: str = 'ClockShift', indexes: List[str] = None)
Bases: pywatts.core.base.BaseTransformer
This module shifts the data by a certain offset.
Parameters: - lag (int) – The offset for shifting the time series. Please note: The relative time of the shift is determined by the current temporal resolution of the arrays in the pipeline.
- name (str) – The name of the shift module
- indexes (List) – The list of indexes that determine the dimension in which the time should be shifted. If the list is None or empty, the time is shifted in all temporal dimensions.
-
get_min_data()
-
get_params() → Dict[str, object]
Returns a dict of all defined parameters.
Returns: Dict of parameters
Return type: Dict
-
set_params(lag: int = None, indexes: List[str] = None)
Sets parameters for clock shifting.
Parameters: - lag (int) – The offset for shifting the time series. Please note: The relative time of the shift is determined by the current temporal resolution of the arrays in the pipeline.
- indexes (List) – The list of indexes that determine the dimension in which the time should be shifted. If the list is None or empty, the time is shifted in all temporal dimensions.
-
transform(x: xarray.core.dataarray.DataArray) → xarray.core.dataarray.DataArray
Shifts the given time series x by the defined lag.
Parameters: x (xr.DataArray) – The time series to be shifted
Returns: The shifted time series
Return type: xr.DataArray
Raises: WrongParameterException – If not all indexes are part of x
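Because lag is a number of steps rather than a duration, the effect of a shift depends on the resolution of the data. The sketch below shows shifting by steps in plain Python. The helper shift is hypothetical, and two points are assumptions rather than documented pyWATTS behaviour: a positive lag moves values towards later positions, and vacated positions are filled with None.

```python
from typing import List, Optional, Sequence


def shift(values: Sequence[float], lag: int) -> List[Optional[float]]:
    """Shift values by `lag` steps, filling vacated positions with None."""
    n = len(values)
    if lag == 0:
        return list(values)
    if abs(lag) >= n:
        return [None] * n
    if lag > 0:
        # Assumption: positive lag moves each value `lag` steps later.
        return [None] * lag + list(values[: n - lag])
    return list(values[-lag:]) + [None] * (-lag)


hourly = [1.0, 2.0, 3.0, 4.0]
later = shift(hourly, 1)     # [None, 1.0, 2.0, 3.0]
earlier = shift(hourly, -1)  # [2.0, 3.0, 4.0, None]
```

With hourly data a lag of 1 corresponds to one hour. With 15-minute data the same lag corresponds to 15 minutes.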
-
pywatts.modules.preprocessing.custom_scaler module
-
class pywatts.modules.preprocessing.custom_scaler.CustomScaler(multiplier: float = 1.0, bias: float = 0.0, name: str = 'CustomScaler')
Bases: pywatts.core.base.BaseTransformer
Scaling step to scale a time series individually by a multiplier and a bias. By default, the scaling does not affect the time series, i.e., the multiplier is 1.0 and the bias is 0.0.
-
get_params() → Dict[str, object]
Get parameters for the CustomScaler object.
Returns: Parameters as dict object.
Return type: Dict[str, object]
-
inverse_transform(x: xarray.core.dataarray.DataArray) → xarray.core.dataarray.DataArray
Apply the inverse scaling to an xarray DataArray.
Parameters: x (xr.DataArray) – xarray DataArray to apply the inverse scaling on.
Returns: xarray DataArray with the scaling by multiplier and bias reversed.
Return type: xr.DataArray
-
set_params(multiplier: float = None, bias: float = None)
Set or change CustomScaler object parameters.
Parameters: - multiplier (float, optional) – Value by which every value in the time series is multiplied.
- bias (float, optional) – Value that is added to every value in the time series.
-
transform(x: xarray.core.dataarray.DataArray) → xarray.core.dataarray.DataArray
Apply the scaling to an xarray DataArray.
Parameters: x (xr.DataArray) – xarray DataArray to apply scaling on.
Returns: xarray DataArray scaled according to the specified multiplier and bias.
Return type: xr.DataArray
-
validate_multiplier(multiplier)
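The relationship between transform and inverse_transform can be sketched in plain Python. The functions scale and inverse_scale are hypothetical and not the pyWATTS code. The order of operations (multiply first, then add the bias) and the rejection of a zero multiplier are assumptions based on the parameter descriptions above.

```python
from typing import List, Sequence


def scale(values: Sequence[float], multiplier: float = 1.0,
          bias: float = 0.0) -> List[float]:
    """Scale each value: multiply first, then add the bias (assumed order)."""
    if multiplier == 0:
        # Assumption: a zero multiplier is rejected because it cannot be inverted.
        raise ValueError("multiplier must not be zero")
    return [v * multiplier + bias for v in values]


def inverse_scale(values: Sequence[float], multiplier: float = 1.0,
                  bias: float = 0.0) -> List[float]:
    """Undo scale(): subtract the bias, then divide by the multiplier."""
    if multiplier == 0:
        raise ValueError("multiplier must not be zero")
    return [(v - bias) / multiplier for v in values]


original = [1.0, 2.0, 3.0]
scaled = scale(original, multiplier=2.0, bias=1.0)           # [3.0, 5.0, 7.0]
restored = inverse_scale(scaled, multiplier=2.0, bias=1.0)   # [1.0, 2.0, 3.0]
```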
-
pywatts.modules.preprocessing.differentiate module
-
class pywatts.modules.preprocessing.differentiate.Differentiate(target_index: Union[str, List[str]] = None, name: str = 'Differentiate', n: Union[int, List[int]] = 1, axis: int = -1, pad: bool = False, pad_args: Dict[str, object] = {})
Bases: pywatts.core.base.BaseTransformer
Differentiation step to calculate the n-th order difference of a time series. By default, the difference does not have the same size as the input time series. Padding is implemented with np.pad and can be enabled with pad; specific arguments can be passed via pad_args.
-
get_params() → Dict[str, object]
Get parameters for the Differentiate object.
Returns: Parameters as dict object.
Return type: Dict[str, object]
-
set_params(target_index: Union[str, List[str], None] = None, n: Union[int, List[int], None] = None, axis: Optional[int] = None, pad: Optional[bool] = None, pad_args: Optional[Dict[str, object]] = None)
Set or change Differentiate object parameters.
Parameters: - target_index (Optional[Union[str, List[str]]], optional) – Target index or indexes of the xarray input to calculate the difference for.
- n (Optional[Union[int, List[int]]], optional) – N-th order difference specification (default 1). Can also be a list if multiple differences should be calculated.
- axis (Optional[int], optional) – Axis along which to calculate the difference (default -1 [last axis]).
- pad (Optional[bool], optional) – Enable or disable padding (default disabled).
- pad_args (Optional[Dict[str, object]], optional) – Padding arguments for the np.pad method (default zero leading padding).
-
transform(x: xarray.core.dataarray.DataArray) → xarray.core.dataarray.DataArray
Add the n-th order difference to the xarray dataset.
Parameters: x (xr.DataArray) – Xarray dataset to apply differentiation on.
Returns: Xarray dataset containing the n-th order differences.
Return type: xr.DataArray
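The size change caused by differencing, and the effect of leading zero padding, can be sketched in plain Python. The helper nth_difference is hypothetical and only mirrors the idea. The module itself is documented as using np.pad, which this sketch replaces with simple list concatenation.

```python
from typing import List, Sequence


def nth_difference(values: Sequence[float], n: int = 1,
                   pad: bool = False) -> List[float]:
    """Return the n-th order difference, optionally padded with leading zeros."""
    diff = list(values)
    for _ in range(n):
        # Each pass replaces the series by the differences of neighbours,
        # so the result loses one element per order.
        diff = [b - a for a, b in zip(diff, diff[1:])]
    if pad:
        # Leading zeros restore the original length.
        diff = [0.0] * (len(values) - len(diff)) + diff
    return diff


squares = [1.0, 4.0, 9.0, 16.0]
first = nth_difference(squares)                        # [3.0, 5.0, 7.0]
second = nth_difference(squares, n=2)                  # [2.0, 2.0]
second_padded = nth_difference(squares, n=2, pad=True)  # [0.0, 0.0, 2.0, 2.0]
```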
-
pywatts.modules.preprocessing.linear_interpolation module
-
class pywatts.modules.preprocessing.linear_interpolation.LinearInterpolater(name: str = 'LinearInterpolater', method: str = 'linear', dim: str = 'time', fill_value: str = 'extrapolate')
Bases: pywatts.core.base.BaseTransformer
This module creates a linear interpolator.
Parameters: - name (str) – Name of the linear interpolator
- method (str) – The method used for interpolation (e.g. linear)
- dim (str) – The dimension used
- fill_value (str) – Handling of missing values (see https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.interp1d.html)
-
get_params() → Dict[str, object]
Returns a dict of the parameters used for the linear interpolation.
Returns: Parameters set for the linear interpolation
Return type: Dict
-
set_params(method: str = None, dim: str = None, fill_value: str = None)
Sets the parameters for the linear interpolation.
Parameters: - method (str) – The method used for interpolation (e.g. linear)
- dim (str) – The dimension used
- fill_value (str) – Handling of missing values (see https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.interp1d.html)
-
transform(x=<class 'xarray.core.dataarray.DataArray'>) → xarray.core.dataarray.DataArray
Transforms the input.
Parameters: x (xr.DataArray) – Input xarray dataset
Returns: Interpolated dataset
Return type: xr.DataArray
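What linear interpolation does to gaps can be sketched in plain Python. The helper interpolate_linear is hypothetical and is not the pyWATTS code. It assumes equally spaced samples and only fills gaps that have a known value on both sides. Leading and trailing gaps stay NaN here, whereas the module's default fill_value of 'extrapolate' suggests that it also fills the edges.

```python
import math
from typing import List, Sequence


def interpolate_linear(values: Sequence[float]) -> List[float]:
    """Fill interior NaN gaps by linear interpolation between known neighbours."""
    result = list(values)
    known = [i for i, v in enumerate(result) if not math.isnan(v)]
    for left, right in zip(known, known[1:]):
        # Constant slope between two neighbouring known points.
        step = (result[right] - result[left]) / (right - left)
        for i in range(left + 1, right):
            result[i] = result[left] + step * (i - left)
    return result


nan = float("nan")
filled = interpolate_linear([1.0, nan, nan, 4.0, nan])
# The interior gap becomes 2.0 and 3.0; the trailing NaN is left untouched.
```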
pywatts.modules.preprocessing.missing_value_detection module
-
class pywatts.modules.preprocessing.missing_value_detection.MissingValueDetector(name: str = 'missingValueDetector')
Bases: pywatts.core.base.BaseTransformer
Module to detect missing values (NaN, NA).
-
get_params() → Dict[str, object]
Get params.
-
set_params()
Set params.
-
transform(dataset: xarray.core.dataarray.DataArray) → xarray.core.dataarray.DataArray
Detects the indexes at which the input has missing values.
Parameters: dataset (xr.DataArray) – Dataset in which missing values should be detected
Returns: A dataset with boolean values: True where a value is missing and False otherwise
Return type: xr.DataArray
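The boolean mask returned by transform can be sketched in plain Python. The helper detect_missing is hypothetical and is not the pyWATTS code. Treating both None and float NaN as missing is an assumption made for this illustration.

```python
import math
from typing import List, Optional, Sequence


def detect_missing(values: Sequence[Optional[float]]) -> List[bool]:
    """Return True for every position that holds a missing value."""
    return [
        v is None or (isinstance(v, float) and math.isnan(v))
        for v in values
    ]


mask = detect_missing([1.0, float("nan"), None, 4.0])
# mask is [False, True, True, False]
```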
-
pywatts.modules.preprocessing.resampler module
-
class pywatts.modules.preprocessing.resampler.Resampler(name: str = 'Resampler', time_index: str = 'time', target_time: str = '1H', method: str = 'mean', method_args: Optional[Dict[str, Any]] = None)
Bases: pywatts.core.base.BaseTransformer
Module to resample time series data to a given target time (both up- and downsampling). All methods offered by pandas' resample method are available, because xarray's resampling builds on it. See http://xarray.pydata.org/en/stable/generated/xarray.Dataset.resample.html for more details.
This class resamples time series data based on xarray's resample method, which is in turn based on pandas' resampling implementation.
Parameters: - name (str) – Name of this processing step (default: "Resampler").
- time_index (str) – Index of the dataset specifying the time series to be resampled (default: "time").
- target_time (str) – Target time after the resampling given in string datetime format (default: "1H"). For example, "6H"/"6h" for 6 hours, "s"/"S" for seconds, "m"/"M" for months.
- method (str) – Method to use for down- or upsampling the data (default: "mean"). For example, "mean", "min", "sum", "median", "reduce", "map". http://xarray.pydata.org/en/stable/generated/xarray.core.resample.DatasetResample.html
- method_args (Optional[Dict[str, Any]]) – Optional parameters for the selected method as a dict (default: None). Note: Some methods like reduce or map require parameters!
Example
# Downsample the dataset to 30 minutes (1800 s) using the mean method
Resampler(target_time="1800s", method="mean")
# Downsample the dataset to 1 day by summing up all data for one day
Resampler(target_time="1d", method="sum")
# Upsample the "time_series" index of the dataset to 1 minute using the interpolate method
Resampler(time_index="time_series", target_time="60s", method="interpolate")
# Resample the dataset index to 1 hour using a custom function (in this case, the mean of each group)
Resampler(target_time="1h", method="map", method_args={"func": lambda x: x.mean()})
-
get_min_data()
-
get_params() → Dict[str, object]
Get the parameters for this object as a dict.
Returns: Object parameters as json dict
-
set_params(time_index: Optional[str] = None, target_time: Optional[str] = None, method: Optional[str] = None, method_args: Optional[Dict[str, Any]] = None)
Set the parameters for this object.
Parameters: - time_index (Optional[str]) – Index of the dataset specifying the time series to resample (default: "time").
- target_time (Optional[str]) – Target time after the resampling given in string datetime format (default: "1H"). For example, "6H"/"6h" for 6 hours, "s"/"S" for seconds, "m"/"M" for months.
- method (Optional[str]) – Method to use for down- or upsampling the data (default: "mean"). For example, "mean", "min", "sum", "median", "reduce", "map". http://xarray.pydata.org/en/stable/generated/xarray.core.resample.DatasetResample.html
- method_args (Optional[Dict[str, Any]]) – Optional parameters for the selected method as a dict (default: None). Note: Some methods like reduce or map require parameters!
-
transform(x: xarray.core.dataarray.DataArray) → xarray.core.dataarray.DataArray
Resamples the dataset x as specified in the constructor.
Parameters: x (xr.DataArray) – Dataset which should be resampled.
Returns: Resampled xarray dataset.
Return type: xr.DataArray
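What downsampling with the "mean" method computes can be sketched with the standard library alone. The helper downsample_mean is hypothetical and does not call xarray or pandas. It assumes naive timestamps and buckets aligned to multiples of the bucket width since 1970-01-01, which may differ from how pandas anchors its bins.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean
from typing import List, Sequence, Tuple

EPOCH = datetime(1970, 1, 1)


def downsample_mean(samples: Sequence[Tuple[datetime, float]],
                    bucket: timedelta) -> List[Tuple[datetime, float]]:
    """Group (timestamp, value) pairs into fixed-width buckets and average each."""
    buckets = defaultdict(list)
    for timestamp, value in samples:
        # Integer number of whole buckets since the epoch.
        index = (timestamp - EPOCH) // bucket
        buckets[EPOCH + index * bucket].append(value)
    return [(start, mean(values)) for start, values in sorted(buckets.items())]


start = datetime(2021, 1, 1)
quarter_hourly = [
    (start + timedelta(minutes=15 * i), v)
    for i, v in enumerate([1.0, 3.0, 5.0, 7.0])
]
half_hourly = downsample_mean(quarter_hourly, timedelta(minutes=30))
# Two buckets: 00:00 with mean 2.0 and 00:30 with mean 6.0
```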
pywatts.modules.preprocessing.sampler module
-
class pywatts.modules.preprocessing.sampler.Sampler(sample_size: int, name: str = 'SampleModule', indexes: List[str] = None)
Bases: pywatts.core.base.BaseTransformer
This module creates samples with a size specified by sample_size. For example, if sample_size corresponds to 24 hours, it creates for each timestamp a vector containing all values of the past 24 hours. This is useful if a forecasting algorithm needs the values of the past 24 hours as input.
Parameters: - sample_size (int) – The size of the samples to create
- indexes (List[str]) – The indexes which should be shifted through time
-
get_min_data()
-
get_params() → Dict[str, object]
Returns a dict of all hyperparameters / user-defined parameters.
Returns: Dict with params
Return type: dict
-
set_params(sample_size: int = None, indexes: List[str] = None)
Set params.
Parameters: - sample_size (int) – The size of the samples to create
- indexes (List[str]) – The indexes which should be shifted through time
-
transform(x: xarray.core.dataarray.DataArray) → xarray.core.dataarray.DataArray
Sample the given time series x with the configured sample size.
Parameters: x (xr.DataArray) – The input
Returns: The sampled time series.
Return type: xr.DataArray
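The idea of building one vector of past values per timestamp can be sketched as a sliding window in plain Python. The helper sample_windows is hypothetical and is not the pyWATTS code. It assumes that each window ends at, and includes, the current value, and it skips the first timestamps that lack a full history. The module may handle both points differently.

```python
from typing import List, Sequence


def sample_windows(values: Sequence[float], sample_size: int) -> List[List[float]]:
    """Return, for each position with enough history, the last sample_size values."""
    if sample_size < 1:
        raise ValueError("sample_size must be at least 1")
    return [
        list(values[i - sample_size + 1: i + 1])
        for i in range(sample_size - 1, len(values))
    ]


windows = sample_windows([1.0, 2.0, 3.0, 4.0, 5.0], sample_size=3)
# [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]]
```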
pywatts.modules.preprocessing.slicer module
-
class pywatts.modules.preprocessing.slicer.Slicer(start: Optional[int] = None, end: Optional[int] = None, name: str = 'Slicer')
Bases: pywatts.core.base.BaseTransformer
This module slices the input data array from the start index up to the end index, similar to numpy array slicing, where an array is filtered with a[start:end].
Parameters: - start (int, optional) – Start index of the slicing operation, defaults to None
- end (int, optional) – End index of the slicing operation, defaults to None
-
get_params() → Dict[str, object]
Return dict of module parameters.
Returns: Dict of module parameters.
Return type: Dict[str, object]
-
set_params(start: Optional[int] = None, end: Optional[int] = None)
Set module parameters.
Parameters: - start (int, optional) – Start index of the slicing operation, defaults to None
- end (int, optional) – End index of the slicing operation, defaults to None
-
transform(x: xarray.core.dataarray.DataArray) → xarray.core.dataarray.DataArray
Perform the slicing operation on the input array.
Parameters: x (xr.DataArray) – Input array which should be sliced.
Returns: Sliced array like in numpy a[start:end].
Return type: xr.DataArray
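The a[start:end] semantics, including the None defaults, can be sketched with a plain Python list. The helper slice_values is hypothetical. It only illustrates the indexing rule that the description compares the module to, not the module's handling of xarray dimensions.

```python
from typing import List, Optional, Sequence


def slice_values(values: Sequence[float], start: Optional[int] = None,
                 end: Optional[int] = None) -> List[float]:
    """Slice like a[start:end]; None means 'from the beginning' or 'to the end'."""
    return list(values[start:end])


data = [10.0, 20.0, 30.0, 40.0, 50.0]
middle = slice_values(data, start=1, end=4)  # [20.0, 30.0, 40.0]
head = slice_values(data, end=2)             # [10.0, 20.0]
tail = slice_values(data, start=-2)          # [40.0, 50.0]
```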