ScalerAndEncoder

class falcon.tabular.processors.ScalerAndEncoder(mask: List[ColumnTypes])

Applies OneHotEncoder to low-cardinality categorical features, OrdinalEncoder to high-cardinality categorical features, and StandardScaler to numerical features.

__init__(mask: List[ColumnTypes]) None
Parameters

mask (List[ColumnTypes]) – the type of each column, listed in column order (entry i describes column i)
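To make the role of the mask concrete, the following is a minimal pure-Python sketch of mask-driven preprocessing. It is not falcon's implementation: the ColType enum and its member names are hypothetical stand-ins for ColumnTypes, encode_columns is an illustrative helper, and the split between low and high cardinality is assumed to be already recorded in the mask.

```python
from enum import Enum
from statistics import mean, pstdev
from typing import List, Sequence


class ColType(Enum):
    """Hypothetical stand-in for ColumnTypes (member names are assumed)."""
    NUMERIC = "numeric"
    CAT_LOW = "cat_low"    # low cardinality: one-hot encoded
    CAT_HIGH = "cat_high"  # high cardinality: ordinal encoded


def encode_columns(rows: Sequence[Sequence[object]],
                   mask: List[ColType]) -> List[List[float]]:
    """Scale numeric columns and encode categorical ones, column by column."""
    out: List[List[float]] = [[] for _ in rows]
    for idx, col_type in enumerate(mask):
        column = [row[idx] for row in rows]
        if col_type is ColType.NUMERIC:
            values = [float(v) for v in column]
            mu = mean(values)
            # Population standard deviation; fall back to 1.0 for constant columns.
            sigma = pstdev(values) or 1.0
            for dest, v in zip(out, values):
                dest.append((v - mu) / sigma)
        else:
            categories = sorted({str(v) for v in column})
            for dest, v in zip(out, column):
                code = categories.index(str(v))
                if col_type is ColType.CAT_LOW:
                    # One indicator column per category.
                    dest.extend(1.0 if i == code else 0.0
                                for i in range(len(categories)))
                else:
                    # A single integer-valued code per category.
                    dest.append(float(code))
    return out


rows = [[1.0, "red", "u1"], [3.0, "blue", "u2"], [5.0, "red", "u3"]]
mask = [ColType.NUMERIC, ColType.CAT_LOW, ColType.CAT_HIGH]
encoded = encode_columns(rows, mask)
```

Here the three input columns become four output columns: one scaled value, two one-hot indicators, and one ordinal code.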

fit(X: ndarray[Any, dtype[ScalarType]], y: Optional[Any] = None, *args: Any, **kwargs: Any) None

Fits the encoder.

Parameters
  • X (npt.NDArray) – data to encode

  • y (Any, optional) – dummy argument, ignored; kept for compatibility with pipeline training

fit_pipe(X: Any, y: Any, *args: Any, **kwargs: Any) Any

Equivalent of the fit method, used for chaining elements inside a pipeline during training.

Parameters
  • X (Any) – features

  • y (Any) – targets

Returns

usually None

Return type

Any
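One way such chaining could work is sketched below. This is an assumption about the general pattern, not falcon's pipeline code: MeanCenterer, Doubler and train_chain are hypothetical names, and only the fit_pipe/forward pairing is taken from this page.

```python
from typing import Any, List, Optional


class MeanCenterer:
    """Hypothetical element: learns a mean in fit, subtracts it in forward."""

    def __init__(self) -> None:
        self.mean_: Optional[float] = None

    def fit(self, X: List[float], y: Any = None) -> None:
        self.mean_ = sum(X) / len(X)

    def fit_pipe(self, X: List[float], y: Any) -> Any:
        # Same work as fit; the return value is usually None.
        return self.fit(X, y)

    def forward(self, X: List[float]) -> List[float]:
        return [x - self.mean_ for x in X]


class Doubler:
    """Hypothetical stateless element: nothing to learn, doubles its input."""

    def fit_pipe(self, X: List[float], y: Any) -> Any:
        return None

    def forward(self, X: List[float]) -> List[float]:
        return [2 * x for x in X]


def train_chain(elements: List[Any], X: List[float], y: Any) -> List[float]:
    """Fit each element on the output of the previous one."""
    for element in elements:
        element.fit_pipe(X, y)
        X = element.forward(X)
    return X


centerer = MeanCenterer()
out = train_chain([centerer, Doubler()], [1.0, 2.0, 3.0], [0, 1, 0])
```

Each element is fitted on data that has already passed through the earlier elements, which is why training needs both a fit-style call and a forward call per step.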

forward(X: ndarray[Any, dtype[object_]], *args: Any, **kwargs: Any) ndarray[Any, dtype[ScalarType]]

Equivalent to .predict() or .transform().

Parameters

X (npt.NDArray[object]) – data to process

Returns

processed data

Return type

npt.NDArray

get_input_type() Type
Returns

object

Return type

Type

get_output_type() Type
Returns

Float32Array

Return type

Type

predict(X: ndarray[Any, dtype[ScalarType]], *args: Any, **kwargs: Any) ndarray[Any, dtype[ScalarType]]

Applies the encoder.

Parameters

X (npt.NDArray) – input data

Returns

encoded data

Return type

npt.NDArray

to_onnx() SerializedModelRepr

Serializes the encoder to ONNX. Each feature in the original dataset is mapped to its own input node (float32 for numerical features, string for categorical features).

Return type

SerializedModelRepr
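Because every feature gets its own input node, a caller of the exported model has to split each row into per-feature inputs. The sketch below shows that split in pure Python. The input names ("feature_0", "feature_1", ...) and the split_into_inputs helper are assumptions for illustration; the actual node names should be read from the serialized model. In practice the values would typically be arrays of dtype float32 or string rather than nested lists.

```python
from typing import Dict, List, Sequence


def split_into_inputs(rows: Sequence[Sequence[object]],
                      is_numeric: Sequence[bool],
                      prefix: str = "feature_") -> Dict[str, List[List[object]]]:
    """Build one named input per column, each with one value per row."""
    feed: Dict[str, List[List[object]]] = {}
    for idx, numeric in enumerate(is_numeric):
        # Numerical features are passed as floats, categorical ones as strings.
        cast = float if numeric else str
        feed[f"{prefix}{idx}"] = [[cast(row[idx])] for row in rows]
    return feed


feed = split_into_inputs([[1, "a"], [2, "b"]], is_numeric=[True, False])
```

Each entry holds a column of shape (n_rows, 1), so the number of inputs equals the number of original features, not the number of encoded output columns.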

transform(X: ndarray[Any, dtype[ScalarType]], *args: Any, **kwargs: Any) ndarray[Any, dtype[ScalarType]]

Equivalent to self.predict(X).

Parameters

X (npt.NDArray) – features

Returns

transformed features

Return type

npt.NDArray