Utils subpackage melusine.utils
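The melusine.utils subpackage gathers helper tools: TransformerScheduler, for building scikit-learn-compatible transformers from a sequence of row-wise functions, and apply_by_multiprocessing, for applying a function along an axis of a DataFrame using multiprocessing.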
List of submodules
TransformerScheduler melusine.utils.transformer_scheduler
Useful class for defining your own transformer from specific functions applied in a specific order along each row of a DataFrame (axis=1).
It is compatible with the scikit-learn API (i.e., it implements the fit and transform methods).
- class melusine.utils.transformer_scheduler.TransformerScheduler(functions_scheduler, mode='apply', n_jobs=1, progress_bar=True, copy=True, verbose=0)
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
This class provides a convenient way to define your own transformer. It takes a list of functions, given in the order in which they are applied along each row of a DataFrame (axis=1). The returned transformer is compatible with the scikit-learn API (i.e., it implements the fit and transform methods).
- Parameters
- functions_scheduler : list of tuples (function, tuple, list)
List of functions to be applied in a specific order. Each element of the list must be defined as follows: (function, argument(s) used by the function (optional), column name(s) returned (optional)).
- mode : str {‘apply’, ‘apply_by_multiprocessing’}, optional
Mode used to apply the functions along the row axis (axis=1). Default value, ‘apply’. If set to ‘apply_by_multiprocessing’, the computation is parallelized with multiprocessing.
- n_jobs : int, optional
Number of cores used for computation. Default value, 1.
- progress_bar : boolean, optional
Whether to display a progress bar from the tqdm package. Default value, True. Only takes effect when mode is set to ‘apply_by_multiprocessing’.
- copy : boolean, optional
Whether to make a copy of the DataFrame. Default value, True.
- verbose : int, optional
Verbosity mode (prints log messages). Default value, 0.
Examples
>>> from melusine.utils.transformer_scheduler import TransformerScheduler
>>> MelusineTransformer = TransformerScheduler(
...     functions_scheduler=[
...         (my_function_1, (argument1, argument2), ['return_col_A']),
...         (my_function_2, None, ['return_col_B', 'return_col_C']),
...         (my_function_3, (), ['return_col_D']),
...     ])
- Attributes
- function_scheduler, mode, n_jobs, progress_bar
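To make the scheduling idea concrete, here is a minimal sketch of a class that follows the same contract: a list of (function, args, cols) steps applied in order along axis=1, exposed through fit/transform. The class name MiniScheduler is hypothetical; it only covers mode='apply' and copy, it assumes a step's output is discarded when no column is declared, and it does not inherit from scikit-learn (fit_transform is written out by hand). It is not melusine's implementation.

```python
import pandas as pd


class MiniScheduler:
    """Hypothetical, simplified stand-in for TransformerScheduler (not melusine code)."""

    def __init__(self, functions_scheduler, copy=True):
        self.functions_scheduler = functions_scheduler
        self.copy = copy

    def fit(self, X, y=None):
        # Stateless transformer: nothing to learn, return self as scikit-learn expects.
        return self

    def transform(self, X):
        X_ = X.copy() if self.copy else X
        # Steps run in list order, so later functions can read columns
        # created by earlier ones.
        for func, args, cols in self.functions_scheduler:
            args = tuple(args) if args else ()
            many = bool(cols) and len(cols) > 1
            result = X_.apply(func, axis=1, args=args,
                              result_type="expand" if many else None)
            if not cols:
                continue  # assumption: output discarded when no column is declared
            if many:
                # Assumption: func returns one value per declared column.
                for position, col in enumerate(cols):
                    X_[col] = result.iloc[:, position]
            else:
                X_[cols[0]] = result
        return X_

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)
```

Inheriting from sklearn.base.TransformerMixin, as the documented class does, would supply fit_transform automatically, and BaseEstimator would add get_params/set_params.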
- static apply_dict(X_, func_, args_=None, cols_=None, **kwargs)
Apply a function to a dictionary.
- Parameters
- X_ : dict
Data on which transformations are applied.
- func_ : function
Function to apply.
- args_ : list or tuple
List of arguments of the function to apply.
- cols_ : list or tuple
List of columns created by the transformation.
- Returns
- dict
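The docstring above does not specify how the value returned by func_ is mapped onto cols_. The following is a minimal sketch under stated assumptions, not melusine's implementation: func_ is assumed to be called as func_(X_, *args_, **kwargs), a single declared column receives the return value directly, and several declared columns are assumed to receive one element each of a returned tuple. The name apply_dict_sketch is hypothetical.

```python
def apply_dict_sketch(X_, func_, args_=None, cols_=None, **kwargs):
    """Hypothetical re-implementation of apply_dict (assumed semantics)."""
    args_ = tuple(args_) if args_ else ()
    result = func_(X_, *args_, **kwargs)
    if not cols_:
        # No output column declared: the return value is discarded
        # (func_ may still have modified X_ in place).
        return X_
    if len(cols_) == 1:
        X_[cols_[0]] = result
    else:
        # Several output columns: assume func_ returns one value per column.
        X_.update(zip(cols_, result))
    return X_
```

For example, under these assumptions a step declared as (add_prefix, ('> ',), ['quoted']) would turn {'body': 'hello'} into {'body': 'hello', 'quoted': '> hello'}.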
- static apply_pandas(X_, func_, args_=None, cols_=None, **kwargs)
Apply a function to a pandas DataFrame.
- Parameters
- X_ : pandas.DataFrame
Data on which transformations are applied.
- func_ : function
Function to apply.
- args_ : list or tuple
List of arguments of the function to apply.
- cols_ : list or tuple
List of columns created by the transformation.
- Returns
- pandas.DataFrame
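The DataFrame case can be sketched with pandas' DataFrame.apply(axis=1), which passes each row as a Series, followed by any extra positional arguments given through args=. The function name apply_pandas_sketch and the mapping of return values onto cols_ (one value for a single column, one tuple element per column otherwise) are assumptions for illustration, not melusine's code.

```python
import pandas as pd


def apply_pandas_sketch(X_, func_, args_=None, cols_=None, **kwargs):
    """Hypothetical DataFrame version of a row-wise apply (assumed semantics)."""
    args_ = tuple(args_) if args_ else ()
    if not cols_:
        # No output column declared: func_ is still called on every row,
        # but its result is discarded.
        X_.apply(func_, axis=1, args=args_, **kwargs)
        return X_
    if len(cols_) == 1:
        X_[cols_[0]] = X_.apply(func_, axis=1, args=args_, **kwargs)
    else:
        # result_type="expand" turns each returned tuple into one column per value.
        expanded = X_.apply(func_, axis=1, args=args_, result_type="expand", **kwargs)
        for position, col in enumerate(cols_):
            X_[col] = expanded.iloc[:, position]
    return X_
```

Assigning the expanded columns by position, rather than through a single multi-column assignment, keeps the sketch independent of how a given pandas version handles setting several new columns at once.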
Multiprocessing melusine.utils.multiprocessing
- melusine.utils.multiprocessing.apply_by_multiprocessing(df, func, **kwargs)
Apply a function along an axis of the DataFrame using multiprocessing.
- Parameters
- df : pd.DataFrame
DataFrame to which the function is applied.
- func : function
Function to apply.
- Returns
- pd.DataFrame
Returns the DataFrame with the function applied.
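A common way to parallelize a row-wise apply is to split the DataFrame into contiguous chunks, process each chunk in a separate worker with multiprocessing.Pool, and concatenate the results. The sketch below follows that pattern; the function name apply_by_multiprocessing_sketch and its workers parameter are hypothetical, and nothing here is taken from melusine's source.

```python
import multiprocessing
from functools import partial

import pandas as pd


def _apply_chunk(chunk, func, kwargs):
    # Runs inside a worker process: an ordinary row-wise apply on one slice.
    return chunk.apply(func, axis=1, **kwargs)


def apply_by_multiprocessing_sketch(df, func, workers=1, **kwargs):
    """Hypothetical split/map/concat version of a parallel row-wise apply."""
    if workers <= 1 or len(df) < 2:
        # Nothing to parallelize: fall back to a plain DataFrame.apply.
        return _apply_chunk(df, func, kwargs)

    n_chunks = min(workers, len(df))
    size = -(-len(df) // n_chunks)  # ceiling division
    chunks = [df.iloc[start:start + size] for start in range(0, len(df), size)]

    # func must be picklable (defined at module level, not a lambda).
    with multiprocessing.Pool(processes=len(chunks)) as pool:
        results = pool.map(partial(_apply_chunk, func=func, kwargs=kwargs), chunks)

    # Chunks keep their original index labels, so concatenation restores order.
    return pd.concat(results)
```

Whenever more than one worker is used, func must be picklable, and on platforms whose default start method is spawn (Windows, macOS) the call should sit under an if __name__ == "__main__": guard.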