SimpleTabularPipeline

class falcon.tabular.pipelines.SimpleTabularPipeline(task: str, mask: List[ColumnTypes], learner: Type[Learner] = SuperLearner, learner_kwargs: Optional[Dict] = None, preprocessor: str = 'MultiModalEncoder')

Default tabular pipeline.

__init__(task: str, mask: List[ColumnTypes], learner: Type[Learner] = SuperLearner, learner_kwargs: Optional[Dict] = None, preprocessor: str = 'MultiModalEncoder')

Default tabular pipeline. At a high level, it chains a preprocessor and a model learner (SuperLearner by default). For classification tasks, the labels are encoded as integers before training, and predictions are decoded back to strings. Internally, all numerical features are scaled to zero mean and unit standard deviation, and all categorical features are one-hot encoded (an approach that may be unsuitable for features with very high cardinality).

Parameters

task (str) – either 'tabular_classification' or 'tabular_regression'
mask (List[int]) – list of ints, one per feature column, where 0 indicates a numerical feature, 1 a low-cardinality categorical feature, and 2 a high-cardinality categorical feature
learner (Learner, optional) – learner class to be used, by default SuperLearner
learner_kwargs (Optional[Dict], optional) – arguments to be passed to the learner, by default None
preprocessor (str) – defines which preprocessor to use, one of {'MultiModalEncoder', 'ScalerAndEncoder'}, by default 'MultiModalEncoder'
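
The preprocessing described above can be illustrated with a minimal, standard-library-only sketch. It does not use falcon or reproduce its preprocessors: the helper names (standardize, one_hot, preprocess) are hypothetical, the use of the population standard deviation is an assumption, and mask values 1 and 2 are treated identically here because only one-hot encoding is described for categorical features.

```python
from statistics import fmean, pstdev


def standardize(column):
    """Scale a numerical column to zero mean and unit standard deviation."""
    mean = fmean(column)
    std = pstdev(column)  # population standard deviation (assumption)
    if std == 0:
        # A constant column carries no variance; map it to zeros.
        return [0.0 for _ in column]
    return [(value - mean) / std for value in column]


def one_hot(column):
    """One-hot encode a categorical column; categories are sorted for a stable order."""
    categories = sorted(set(column))
    return [[1.0 if value == category else 0.0 for category in categories]
            for value in column]


def preprocess(rows, mask):
    """Encode each column according to mask: 0 = numerical, 1 or 2 = categorical."""
    columns = list(zip(*rows))
    encoded_columns = []
    for column, column_type in zip(columns, mask):
        if column_type == 0:
            scaled = standardize([float(value) for value in column])
            encoded_columns.append([[value] for value in scaled])
        else:
            encoded_columns.append(one_hot(column))
    # Concatenate the per-column blocks back into one flat feature row per sample.
    return [[value for block in row_blocks for value in block]
            for row_blocks in zip(*encoded_columns)]


rows = [[1.0, "red"], [2.0, "blue"], [3.0, "red"]]
mask = [0, 1]  # first column numerical, second column categorical
encoded = preprocess(rows, mask)
```

Each encoded row holds one scaled value followed by one indicator per category, which is why one-hot encoding grows quickly for features with many distinct values.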

add_element(element: PipelineElement) → None

Adds an element to the pipeline. The input type of the added element should match the output type of the last element in the pipeline.

Parameters: element (PipelineElement) – element to be added to the end of the pipeline
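
The type-matching rule can be sketched with a toy pipeline. Element and ToyPipeline are hypothetical stand-ins, not falcon classes, and raising TypeError on a mismatch is an assumption: the documentation only states that the types should match.

```python
class Element:
    """Hypothetical stand-in for a pipeline element with declared input/output types."""

    def __init__(self, name, input_type, output_type):
        self.name = name
        self.input_type = input_type
        self.output_type = output_type


class ToyPipeline:
    """Hypothetical pipeline that only accepts type-compatible elements."""

    def __init__(self):
        self.elements = []

    def add_element(self, element):
        # The first element is always accepted; later ones must consume
        # exactly what the current last element produces.
        if self.elements and self.elements[-1].output_type is not element.input_type:
            raise TypeError(
                f"{element.name} expects {element.input_type.__name__}, "
                f"but the pipeline currently outputs "
                f"{self.elements[-1].output_type.__name__}"
            )
        self.elements.append(element)


pipeline = ToyPipeline()
pipeline.add_element(Element("encoder", input_type=str, output_type=float))
pipeline.add_element(Element("learner", input_type=float, output_type=int))

try:
    # The pipeline now outputs int, so an element expecting str is rejected.
    pipeline.add_element(Element("mismatched", input_type=str, output_type=str))
    rejected = False
except TypeError:
    rejected = True
```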

fit(X: ndarray[Any, dtype[ScalarType]], y: ndarray[Any, dtype[ScalarType]], *args: Any, **kwargs: Any) → None

Fits the pipeline by consecutively calling the .fit_pipe() method of each element in the pipeline. For tabular classification, LabelDecoder is applied to the targets before the actual training occurs.

Parameters

X (npt.NDArray) – training features
y (npt.NDArray) – training targets
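
For classification, the targets are encoded as integers before training and predictions are decoded back to strings. The round trip can be sketched as follows; ToyLabelCodec is a hypothetical illustration and not falcon's LabelDecoder, and assigning integer codes by sorted label order is an assumption.

```python
class ToyLabelCodec:
    """Hypothetical label encoder/decoder mapping string labels to integer codes."""

    def fit(self, labels):
        # Sort the distinct labels so the label-to-code mapping is deterministic.
        self.classes_ = sorted(set(labels))
        self._index = {label: code for code, label in enumerate(self.classes_)}
        return self

    def encode(self, labels):
        """Map string labels to integer codes (applied to targets before training)."""
        return [self._index[label] for label in labels]

    def decode(self, codes):
        """Map integer codes back to string labels (applied to predictions)."""
        return [self.classes_[code] for code in codes]


y = ["cat", "dog", "cat", "bird"]
codec = ToyLabelCodec().fit(y)
y_encoded = codec.encode(y)          # integer targets the learner would train on
y_decoded = codec.decode(y_encoded)  # string labels recovered from integer codes
```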

predict(X: ndarray[Any, dtype[ScalarType]], *args: Any, **kwargs: Any) → ndarray[Any, dtype[ScalarType]]

Predicts the labels of the passed data points.

Parameters: X (npt.NDArray) – features
Returns: predicted label
Return type: npt.NDArray

save() → ModelProto

Exports the pipeline to an ONNX ModelProto.

Returns: Pipeline as ONNX ModelProto
Return type: ModelProto