TabularTaskManager
- class falcon.tabular.TabularTaskManager(task: str, data: Union[str, ndarray[Any, dtype[ScalarType]], DataFrame, Tuple], pipeline: Optional[Type[Pipeline]] = None, pipeline_options: Optional[Dict] = None, extra_pipeline_options: Optional[Dict] = None, features: Optional[Union[List[str], List[int]]] = None, target: Optional[Union[str, int]] = None, **options: Any)
 Default task manager for tabular data.
- __init__(task: str, data: Union[str, ndarray[Any, dtype[ScalarType]], DataFrame, Tuple], pipeline: Optional[Type[Pipeline]] = None, pipeline_options: Optional[Dict] = None, extra_pipeline_options: Optional[Dict] = None, features: Optional[Union[List[str], List[int]]] = None, target: Optional[Union[str, int]] = None, **options: Any) → None
 - Parameters
 task (str) – tabular_classification or tabular_regression
data (Union[str, npt.NDArray, pd.DataFrame, Tuple]) – path to data file or pandas dataframe or numpy array or tuple (X,y)
pipeline (Optional[Type[Pipeline]]) – class to be used as pipeline, by default None. If None, SimpleTabularPipeline will be used
pipeline_options (Optional[Dict], optional) – arguments to be passed to the pipeline, by default None. These options will overwrite the ones from default_pipeline_options attribute.
extra_pipeline_options (Optional[Dict], optional) – arguments to be passed to the pipeline, by default None. These options will be passed in addition to the ones from default_pipeline_options attribute. This argument is ignored if pipeline_options is not None
features (Optional[ft.ColumnsList], optional) – names or indices of columns to be used as features, by default None. If None, all columns except the last one will be used. If the target argument is not None, features should be specified explicitly as well
target (Optional[Union[str, int]], optional) – name or index of the column to be used as target, by default None. If None, the last column will be used as the target. If the features argument is not None, target should be specified explicitly as well
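Example (a minimal sketch; the import path follows the class path above, and the file and column names are hypothetical):

  from falcon.tabular import TabularTaskManager

  # Either rely on the default "last column is the target" convention,
  # or pass features and target together, as required above.
  manager = TabularTaskManager(
      task="tabular_classification",
      data="titanic.csv",               # path, DataFrame, ndarray, or (X, y) tuple
      features=["age", "fare", "sex"],  # hypothetical column names
      target="survived",
  )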
- _create_pipeline(pipeline: Optional[Type[Pipeline]], options: Optional[Dict]) → None
 Initializes the pipeline.
- Parameters
 pipeline (Optional[Type[Pipeline]]) – pipeline class
options (Optional[Dict]) – pipeline options
- _prepare_data(data: Union[str, ndarray[Any, dtype[ScalarType]], DataFrame, Tuple], training: bool = True) → Tuple[ndarray[Any, dtype[ScalarType]], ndarray[Any, dtype[ScalarType]], List[ColumnTypes]]
 Initial data preparation: 1) optional: read data from the specified location; 2) split into features and targets. By default it is assumed that the last column is the target; 3) clean data; 4) determine numerical and categorical features (create categorical mask).
- Parameters
data (Union[str, npt.NDArray, pd.DataFrame, Tuple]) – path to data file or pandas dataframe or numpy array or tuple (X, y)
- Returns
 tuple of features, target and type mask for features
- Return type
 Tuple[npt.NDArray, npt.NDArray, List[ColumnTypes]]
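The steps above can be pictured roughly as follows (an illustrative pandas sketch of the described behaviour, not the actual implementation; the file name is hypothetical):

  import pandas as pd

  df = pd.read_csv("titanic.csv")               # 1) optionally read from disk
  X, y = df.iloc[:, :-1], df.iloc[:, -1]        # 2) last column is the target
  keep = X.notna().all(axis=1) & y.notna()      # 3) clean the data
  X, y = X[keep], y[keep]
  cat_mask = [dt == object for dt in X.dtypes]  # 4) categorical mask for the features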
- property default_pipeline_options: Dict
 Default options for pipeline.
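For example (a sketch; the option key is hypothetical), extra_pipeline_options is merged into these defaults, while pipeline_options replaces them entirely:

  defaults = manager.default_pipeline_options     # Dict of default pipeline arguments
  manager2 = TabularTaskManager(
      task="tabular_classification",
      data="titanic.csv",
      extra_pipeline_options={"some_option": 1},  # hypothetical key, added on top of defaults
  )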
- evaluate(test_data: Union[str, ndarray[Any, dtype[ScalarType]], DataFrame, Tuple], silent: bool = False) → Dict
 Performs and prints the evaluation report on the given dataset.
- Parameters
 test_data (Union[str, npt.NDArray, pd.DataFrame, Tuple]) – dataset to be used for evaluation
silent (bool) – controls whether the metrics are printed on screen, by default False
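Example (a sketch; "test.csv" is a hypothetical path and manager is the instance from the constructor example):

  metrics = manager.evaluate("test.csv")               # prints the report and returns a Dict
  metrics = manager.evaluate("test.csv", silent=True)  # same metrics, nothing printed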
- performance_summary(test_data: Optional[Union[str, ndarray[Any, dtype[ScalarType]], DataFrame, Tuple]] = None) → dict
 Prints a performance summary of the model. The summary always includes metrics calculated on the train set. If a train/eval split was made during training, the summary also includes metrics calculated on the eval set. If a test set is provided as an argument, the summary also includes metrics calculated on the test set.
- Parameters
 test_data (Optional[Union[str, npt.NDArray, pd.DataFrame, Tuple]]) – data to be used as test set, by default None
- Returns
 metrics for each subset
- Return type
 dict
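Example (a sketch; the subset names follow the description above, the exact dictionary layout may differ):

  summary = manager.performance_summary("test.csv")
  # Expect entries for the train set, for the eval set (if a train/eval
  # split was made during training), and for the test set passed here.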
- predict(data: Union[str, ndarray[Any, dtype[ScalarType]], DataFrame]) → ndarray[Any, dtype[ScalarType]]
 Performs prediction on new data.
- Parameters
 data (Union[str, npt.NDArray, pd.DataFrame]) – path to data file or pandas dataframe or numpy array
- Returns
 predictions
- Return type
 npt.NDArray
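Example (a sketch; the path and feature values are hypothetical):

  import numpy as np

  preds = manager.predict("new_passengers.csv")  # same formats as training data, minus the target
  preds = manager.predict(np.array([[29.0, 7.25, "male"]], dtype=object))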
- predict_stored_subset(subset: str = 'train') → ndarray[Any, dtype[ScalarType]]
 Makes a prediction on a stored subset (train or eval).
- Parameters
 subset (str, optional) – subset to predict on (train or eval), by default ‘train’
- Returns
 predicted values
- Return type
 npt.NDArray
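Example (a sketch; the eval subset exists only if an eval split was made during training):

  train_preds = manager.predict_stored_subset("train")
  eval_preds = manager.predict_stored_subset("eval")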
- save_model(filename: Optional[str] = None, **kwargs: Any) → ModelProto
 Serializes and saves the model.
- Parameters
filename (Optional[str], optional) – filename for the model file, by default None. If no filename is specified, the model is not saved to disk and is only returned
- Returns
 ONNX ModelProto of the model
- Return type
 ModelProto
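Example (a sketch; since the result is a standard ONNX ModelProto, it can presumably be served outside falcon, e.g. with onnxruntime, which is not covered by this reference, so verify the model's input names first):

  proto = manager.save_model("model.onnx")  # writes the file and returns the ModelProto
  proto = manager.save_model()              # nothing written to disk, model only returned

  import onnxruntime as rt
  session = rt.InferenceSession(proto.SerializeToString())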
- train(make_eval_subset: bool = True, pre_eval: bool = False, **kwargs: Any) → TabularTaskManager
 Invokes the training procedure of the underlying pipeline. Prints the expected model performance if available.
- Parameters
pre_eval (bool) – if True, model performance is first estimated via 10-fold CV for small datasets or a 25% test split for large datasets, by default False. Setting pre_eval = True is not recommended: it pre-evaluates the pipeline as a whole, which contains many random elements, so the results might not be reproducible
make_eval_subset (bool) – controls whether a dedicated eval set should be allocated for the performance report, by default True. If True, overrides the value of pre_eval to False
- Returns
 self
- Return type
 TabularTaskManager
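A typical end-to-end flow combining the methods documented above (a sketch; file names are hypothetical):

  manager = TabularTaskManager(task="tabular_regression", data="prices.csv")
  manager.train(make_eval_subset=True)      # trains and reports expected performance; returns self
  manager.performance_summary("test.csv")
  manager.save_model("prices.onnx")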