Utils subpackage `melusine.utils`

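The `melusine.utils` subpackage gathers helper utilities: a scheduler for building scikit-learn compatible transformers, a multiprocessing helper for DataFrames, a streamer module, and colored printing functions.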

List of submodules

TransformerScheduler: `melusine.utils.transformer_scheduler`
Multiprocessing: `melusine.utils.multiprocessing`
Streamer: `melusine.utils.streamer`
PrintParts: `melusine.utils.printer`

TransformerScheduler `melusine.utils.transformer_scheduler`

A class for building a custom transformer from a list of functions that are applied, in a given order, to each row of a DataFrame (axis=1).

It is compatible with the scikit-learn API (it provides fit and transform methods).

class melusine.utils.transformer_scheduler.TransformerScheduler(functions_scheduler, mode='apply', n_jobs=1, progress_bar=True, copy=True, verbose=0)

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

This class provides a convenient way to define a custom transformer. It takes a list of functions and applies them, in the order given, to each row of a DataFrame (axis=1). The resulting transformer is compatible with the scikit-learn API (it provides fit and transform methods).

Parameters

functions_scheduler (list of tuples (function, tuple, list)): List of functions to apply, in order. Each element is defined as follows: (function, argument(s) passed to the function (optional), name(s) of the returned column(s) (optional)).
mode (str, {'apply', 'apply_by_multiprocessing'}, optional): How the functions are applied along rows (axis=1). Default: 'apply'. If set to 'apply_by_multiprocessing', the computation is parallelized with multiprocessing.
n_jobs (int, optional): Number of cores used for computation. Default: 1.
progress_bar (boolean, optional): Whether to display a tqdm progress bar. Default: True. Only takes effect when mode is 'apply_by_multiprocessing'.
copy (boolean, optional): Whether to work on a copy of the DataFrame. Default: True.
verbose (int, optional): Verbosity level for log output. Default: 0.

Examples

>>> from melusine.utils.transformer_scheduler import TransformerScheduler

>>> # my_function_1..3, argument1 and argument2 are placeholders to define beforehand
>>> MelusineTransformer = TransformerScheduler(
...     functions_scheduler=[
...         (my_function_1, (argument1, argument2), ['return_col_A']),
...         (my_function_2, None, ['return_col_B', 'return_col_C']),
...         (my_function_3, (), ['return_col_D'])
...     ])
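
The entry format used here, `(function, arguments, column names)` with the last two elements optional, can be illustrated without Melusine. The following is a minimal sketch, not the library's implementation: `normalize_entry`, `run_schedule`, and the sample functions are hypothetical names, a plain dict stands in for a DataFrame row, and it assumes each function receives the row followed by its arguments.

```python
def normalize_entry(entry):
    """Pad a scheduler entry to (function, args, cols). Hypothetical helper."""
    func = entry[0]
    args = entry[1] if len(entry) > 1 and entry[1] is not None else ()
    cols = entry[2] if len(entry) > 2 and entry[2] is not None else []
    return func, tuple(args), list(cols)


def run_schedule(row, schedule):
    """Apply each scheduled function to a copy of `row`, in order."""
    row = dict(row)  # work on a copy so the caller's dict is untouched
    for entry in schedule:
        func, args, cols = normalize_entry(entry)
        result = func(row, *args)
        if len(cols) == 1:
            row[cols[0]] = result
        else:
            # Assumption: a function declaring several columns returns
            # one value per column, in the same order.
            for col, value in zip(cols, result):
                row[col] = value
    return row


def lowercase(row, column):
    return row[column].lower()


def count_words_and_chars(row):
    text = row["clean_body"]
    return len(text.split()), len(text)


schedule = [
    (lowercase, ("body",), ["clean_body"]),
    (count_words_and_chars, None, ["n_words", "n_chars"]),
]
result = run_schedule({"body": "Hello World"}, schedule)
```

Because the functions run in order, the second function can read the `clean_body` value written by the first.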

Attributes

functions_scheduler, mode, n_jobs, progress_bar

static apply_dict(X_, func_, args_=None, cols_=None, **kwargs)

Apply a function on a dictionary.

Parameters

X_ (dict): Data on which transformations are applied.
args_ (list or tuple): Arguments passed to the function.
cols_ (list or tuple): Names of the columns created by the transformation.
func_ (function): Function to apply.

Returns

dict

static apply_pandas(X_, func_, args_=None, cols_=None, **kwargs)

Apply a function on a pandas DataFrame.

Parameters

X_ (pandas.DataFrame): Data on which transformations are applied.
args_ (list or tuple): Arguments passed to the function.
cols_ (list or tuple): Names of the columns created by the transformation.
func_ (function): Function to apply.

Returns

pandas.DataFrame
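
Applying a function row by row on a DataFrame (axis=1) corresponds to `DataFrame.apply` in pandas. The snippet below is a minimal sketch of that pattern; it is not Melusine's code, and `body_length` and the column names are hypothetical.

```python
import pandas as pd


def body_length(row, column):
    """Return the length of the text stored in `column` for this row."""
    return len(row[column])


df = pd.DataFrame({"body": ["hello", "thanks a lot"]})

# axis=1 passes each row to the function as a Series;
# `args` forwards extra positional arguments after the row.
df["body_length"] = df.apply(body_length, axis=1, args=("body",))
```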

static apply_pandas_multiprocessing(X_, func_, args_=None, cols_=None, n_jobs=1, progress_bar=False, **kwargs)

fit(X, y=None): Unused method, defined only for compatibility with the scikit-learn API.

transform(X)

Apply the functions defined in the functions_scheduler parameter.

Parameters

X (pandas.DataFrame): Data on which transformations are applied.

Returns

pandas.DataFrame
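
Since `fit` does nothing and `transform` does all the work, the class behaves like a stateless transformer. The sketch below illustrates that contract, together with the effect of a `copy` flag, on a list of row dicts. `StatelessTransformer` and `add_length` are hypothetical stand-ins; the code imports neither scikit-learn nor Melusine.

```python
from copy import deepcopy


class StatelessTransformer:
    """Hypothetical transformer: nothing is learned in fit()."""

    def __init__(self, functions, copy=True):
        self.functions = functions
        self.copy = copy

    def fit(self, X, y=None):
        # Nothing to learn; return self so calls can be chained.
        return self

    def transform(self, X):
        # With copy=True the caller's data is left unmodified.
        rows = deepcopy(X) if self.copy else X
        for func in self.functions:
            for row in rows:
                func(row)
        return rows


def add_length(row):
    row["length"] = len(row["body"])


original = [{"body": "hello"}]
transformed = StatelessTransformer([add_length]).fit(original).transform(original)
```

With `copy=False`, the same call would add the `length` key to `original` in place.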

Multiprocessing `melusine.utils.multiprocessing`

melusine.utils.multiprocessing.apply_by_multiprocessing(df, func, **kwargs)

Apply a function along an axis of the DataFrame using multiprocessing.

Parameters

df (pd.DataFrame): DataFrame to which the function is applied.
func (function): Function to apply.

Returns

pd.DataFrame: The DataFrame with the function applied.
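
A common way to parallelize a row-wise function is to split the data into chunks, process each chunk in a worker, and concatenate the results in order. The sketch below shows that pattern on a list of row dicts using only the standard library. It illustrates the general technique under stated assumptions and does not describe how `apply_by_multiprocessing` is implemented; `split_rows`, `apply_chunk`, and `apply_in_chunks` are hypothetical names. The `mapper` argument defaults to the built-in `map`, so the example runs serially.

```python
from itertools import chain


def split_rows(rows, n_chunks):
    """Split `rows` into at most `n_chunks` contiguous, near-equal chunks."""
    size, extra = divmod(len(rows), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        stop = start + size + (1 if i < extra else 0)
        if stop > start:  # skip empty chunks when there are few rows
            chunks.append(rows[start:stop])
        start = stop
    return chunks


def apply_chunk(task):
    """Worker: unpack (func, chunk) and apply func to every row of the chunk."""
    func, chunk = task
    return [func(row) for row in chunk]


def apply_in_chunks(rows, func, n_chunks=2, mapper=map):
    """Split, apply per chunk through `mapper`, then concatenate in order."""
    tasks = [(func, chunk) for chunk in split_rows(rows, n_chunks)]
    return list(chain.from_iterable(mapper(apply_chunk, tasks)))


def word_count(row):
    return len(row["body"].split())


rows = [{"body": "a b"}, {"body": "c"}, {"body": "d e f"}]
counts = apply_in_chunks(rows, word_count, n_chunks=2)
```

To run the chunks in separate processes, `mapper` could be the `map` method of a `multiprocessing.Pool`, provided `func` is defined at module level so that it can be pickled.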

melusine.utils.multiprocessing.apply_df(input_args)

Streamer `melusine.utils.streamer`

PrintParts `melusine.utils.printer`

melusine.utils.printer.print_color(text, part=None): Select the color to use when printing, according to the tag.

melusine.utils.printer.print_color_mail(structured_body)

Highlight the tagged sentences.

Parameters

structured_body: A structured body produced by process_sent_tag.

Returns

Prints the mail sentence by sentence.
