SimpleTabularPipeline

class falcon.tabular.pipelines.SimpleTabularPipeline(task: str, mask: List[ColumnTypes], learner: Type[Learner] = SuperLearner, learner_kwargs: Optional[Dict] = None, preprocessor: str = 'MultiModalEncoder')

Default tabular pipeline.

__init__(task: str, mask: List[ColumnTypes], learner: Type[Learner] = SuperLearner, learner_kwargs: Optional[Dict] = None, preprocessor: str = 'MultiModalEncoder')

Default tabular pipeline. At a high level, it chains a preprocessor and a model learner (SuperLearner by default). For classification tasks, the labels are encoded as integers before training, and predictions are decoded back to strings. Internally, all numerical features are scaled to zero mean and unit standard deviation, and all categorical features are one-hot encoded (this approach might not be suitable for features with very high cardinality).

Parameters
  • task (str) – either ‘tabular_classification’ or ‘tabular_regression’

  • mask (List[int]) – list of ints where 0 indicates a numerical feature, 1 a low-cardinality categorical feature, and 2 a high-cardinality categorical feature

  • learner (Learner, optional) – learner class to be used, by default SuperLearner

  • learner_kwargs (Optional[Dict], optional) – arguments to be passed to the learner, by default None

  • preprocessor (str) – defines which preprocessor to use, one of {‘MultiModalEncoder’, ‘ScalerAndEncoder’}, by default ‘MultiModalEncoder’
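The preprocessing described above (standardizing numerical columns, one-hot encoding categorical ones, selected by the mask) can be illustrated with a minimal NumPy sketch. This is not falcon's implementation: the function name `preprocess` is hypothetical, and the sketch treats mask values 1 and 2 identically, which the actual preprocessors may not do.

```python
import numpy as np

def preprocess(X, mask):
    """Hypothetical helper: standardize numerical columns, one-hot encode the rest.

    mask[i] == 0 marks column i as numerical; 1 or 2 marks it as categorical.
    """
    blocks = []
    for i, kind in enumerate(mask):
        col = X[:, i]
        if kind == 0:
            values = col.astype(float)
            std = values.std()
            # Guard against constant columns to avoid division by zero.
            scaled = (values - values.mean()) / (std if std > 0 else 1.0)
            blocks.append(scaled.reshape(-1, 1))
        else:
            # np.unique returns the sorted categories and an integer code per row.
            categories, codes = np.unique(col, return_inverse=True)
            blocks.append(np.eye(len(categories))[codes])
    return np.hstack(blocks)

# One numerical column and one categorical column with two categories.
X = np.array([[1.0, "a"], [2.0, "b"], [3.0, "a"]], dtype=object)
encoded = preprocess(X, mask=[0, 1])
```

Here the result has three columns: one scaled numerical column followed by two one-hot columns. With many distinct categories, the one-hot block grows by one column per category, which is why this approach can be unsuitable for very high cardinality.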

add_element(element: PipelineElement) → None

Adds an element to the end of the pipeline. The input type of the added element should match the output type of the last element in the pipeline.

Parameters

element (PipelineElement) – element to be added to the end of the pipeline
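The type-matching rule can be sketched with a toy pipeline. The classes `Element` and `ToyPipeline` and the string type tags below are hypothetical stand-ins, not falcon's classes, and raising `TypeError` on a mismatch is an assumption made for illustration; this page does not state how falcon handles a mismatch.

```python
class Element:
    """Hypothetical stand-in for a pipeline element with declared I/O types."""

    def __init__(self, name, input_type, output_type):
        self.name = name
        self.input_type = input_type
        self.output_type = output_type


class ToyPipeline:
    """Hypothetical pipeline that only accepts type-compatible elements."""

    def __init__(self):
        self.elements = []

    def add_element(self, element):
        # The new element must consume what the current last element produces.
        if self.elements and self.elements[-1].output_type != element.input_type:
            raise TypeError(
                f"{element.name} expects {element.input_type}, "
                f"but {self.elements[-1].name} produces {self.elements[-1].output_type}"
            )
        self.elements.append(element)


pipeline = ToyPipeline()
pipeline.add_element(Element("encoder", "object", "float32"))
pipeline.add_element(Element("learner", "float32", "int64"))
```

Adding a further element whose input type is not `"int64"` would be rejected by this sketch, because it could not consume the learner's output.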

fit(X: ndarray[Any, dtype[ScalarType]], y: ndarray[Any, dtype[ScalarType]], *args: Any, **kwargs: Any) → None

Fits the pipeline by consecutively calling the .fit_pipe() method of each element in the pipeline. For tabular classification, LabelDecoder is applied to the targets before the actual training occurs.

Parameters
  • X (npt.NDArray) – training features

  • y (npt.NDArray) – training targets
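The label handling for classification (string labels encoded as integers before training, integer predictions decoded back to strings) can be illustrated with plain NumPy. This is a sketch of the general technique under the assumption of a sorted class vocabulary, not falcon's LabelDecoder; `predicted_ids` is made-up model output.

```python
import numpy as np

# String targets as they would be passed to fit().
y = np.array(["cat", "dog", "cat", "bird"])

# Encode: np.unique returns the sorted classes and an integer id per sample.
classes, y_encoded = np.unique(y, return_inverse=True)

# Decode: index the class vocabulary with integer predictions.
predicted_ids = np.array([2, 0, 1])  # hypothetical learner output
decoded = classes[predicted_ids]
```

The learner only ever sees `y_encoded`; the stored `classes` array is what makes the integer predictions reversible.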

predict(X: ndarray[Any, dtype[ScalarType]], *args: Any, **kwargs: Any) → ndarray[Any, dtype[ScalarType]]

Predicts the labels of the passed data points.

Parameters

X (npt.NDArray) – features

Returns

predicted labels

Return type

npt.NDArray

save() → ModelProto

Exports the pipeline to an ONNX ModelProto.

Returns

Pipeline as ONNX ModelProto

Return type

ModelProto